Data in the real world – is it ready for science and engineering?

That’s a key question in oil and gas right now, and it brought 30 of us from different backgrounds together in a joint PPDM and WADSIH meeting to answer it at one of the many DataScienceWeek events unfolding across Perth.

Firstly, you’ve got to narrow the scope of this question if you want to reach any kind of meaningful answer in a 2.5-hour session. With an audience covering stalwarts through to start-ups, we took analytics as the test case for discussion. But wait, what do we mean by this? ‘Analytics’ covers a range from the descriptive ‘what happened’ (a yes vote from the data managers in the room), through the predictive ‘when will it happen next’ (a yes from the data analysts), to the cognitive ‘what can we learn from it’ (the data scientists).

With an idea of how we plan to use these data, we addressed quality dimensions using parameters that will be familiar to many. Is our data complete, unique, timely, valid, accurate and consistent? Which of these matters? Long story short… it’s whatever lets you use data confidently to support the business challenge you’re trying to resolve. Which is where the metadata discussion kicked off: what’s the data description you need to understand your data, and how do open schemas such as Dublin Core compare to the application-native options we typically use? Many of us in the room had experience of forcing data into various vendor models, then reloading these same data from source as we cycled through products from different vendors – which I think provides the answer! Resolving this is part of the premise behind OSDU, which has the potential to be a step-change for how we work with data in our industry.
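To make those dimensions a little more concrete, here’s a minimal sketch of how a small well-header table might be scored against them. The field names, the example records and the checks are my own illustration, not a vendor model or the OSDU schema:

```python
# A minimal sketch of scoring a small well-header table against the quality
# dimensions discussed above. Field names, records and checks are
# illustrative only - not a vendor model or the OSDU schema.
from datetime import date

wells = [
    {"uwi": "W-001", "spud_date": date(2019, 3, 2), "td_m": 3150.0, "crs": "GDA94"},
    {"uwi": "W-002", "spud_date": date(2021, 7, 14), "td_m": None, "crs": "GDA94"},
    {"uwi": "W-001", "spud_date": date(2019, 3, 2), "td_m": 3150.0, "crs": "GDA2020"},
]
required = ["uwi", "spud_date", "td_m", "crs"]

# Complete: every mandatory field populated on every record.
complete = sum(all(w.get(f) is not None for f in required) for w in wells) / len(wells)
# Unique: one record per well identifier.
unique = len({w["uwi"] for w in wells}) / len(wells)
# Consistent: a single coordinate reference system in use across the table.
consistent = len({w["crs"] for w in wells}) == 1
# Valid: total depth, where present, is a positive number.
valid = all(w["td_m"] is None or w["td_m"] > 0 for w in wells)

print(f"complete={complete:.0%} unique={unique:.0%} consistent={consistent} valid={valid}")
```

Timeliness and accuracy need a reference to check against – the acquisition report, the master register – which is exactly where the metadata conversation comes in.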

We took a quick sidestep: what do you do when metadata isn’t available to support the data you want to use? Risk it and gain insight at lower confidence, or play safe and maybe miss the extra benefit? Again, it depends on what you want to do – with this nicely demonstrated against an O&G value chain starting at Exploration (low-quality data, high-quality data? Gimme it all!) and finishing with Decommissioning (how do we make this safe AND convince others that’s what we’ve done?). All reasonable, but then how do you manage the chain of custody of data to ensure it isn’t used for purposes for which it is not fit? Appropriate metadata is the answer.
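As a rough illustration of what ‘appropriate metadata’ could look like in practice, here’s a sketch that pairs Dublin Core-style descriptive elements with an assumed ‘fit_for’ extension, and checks an intended use against it before the data is released. The extension and the check are inventions for the example, not an industry standard:

```python
# A hedged sketch of using descriptive metadata to keep data inside its
# fitness envelope. The element names (title, creator, date, coverage) follow
# Dublin Core; the "fit_for" extension and the check are assumptions made for
# this example, not part of any standard.
legacy_survey = {
    "title": "1998 2D seismic reprocess, Basin X",
    "creator": "Acquisition contractor (processing sequence unknown)",
    "date": "1998-11-01",
    "coverage": "Permit ABC-1",
    "fit_for": {"regional screening", "analogue studies"},  # documented uses
}

def check_use(metadata: dict, intended_use: str) -> None:
    """Raise if the intended use falls outside the recorded fitness envelope."""
    if intended_use not in metadata.get("fit_for", set()):
        raise ValueError(
            f"'{intended_use}' is outside the documented fitness of "
            f"'{metadata['title']}' - review before proceeding."
        )

check_use(legacy_survey, "regional screening")  # passes silently
# check_use(legacy_survey, "well placement")    # would raise, prompting a review
```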

Break-out sessions tested these quality dimensions across scenarios including legacy and new G&G data types, healthcare and safety case documentation. Common to all of these was the need to balance an upfront investment in time and effort against uncertain future use cases for the data, with the party realising the benefit not always being the same one taking the initial cost hit. Lessons from outside of O&G were particularly acute, with the cost of acquiring one data type being a small fraction of the cost of its subsequent storage – the opposite of our expensive well and seismic data.

With the relative knowns of quality and metadata addressed, we moved on to the unknown biases that hide in data and interpretations, ready to catch you out the moment you forget about them. In an industry that looks for the few formations and structures containing hydrocarbons, we love to eliminate outliers through summary statistics such as means and medians. With limited availability of data, training datasets and sampling are not always representative. (Hint: lobby your regulator to open up their data store if they haven’t already done so.) Data that fits our interpretation is always ‘right’ and exceptions get excluded… What’s the lesson here? As a data manager, I need to accurately and precisely deliver the data as it was acquired; as a data scientist, I need to challenge the data that is presented to me.
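A toy example of that first point: the mean and median of a porosity log sit comfortably in the background, while the single anomaly, the value an explorer actually cares about, only shows up if you deliberately look for exceptions. The numbers and the cut-off below are invented for illustration:

```python
# A toy illustration of how summary statistics smooth away the anomalies an
# explorer is looking for: ten background porosity readings and one sweet spot.
# Values and the 3x cut-off are invented for the example.
porosity = [0.06, 0.07, 0.05, 0.06, 0.08, 0.07, 0.06, 0.05, 0.07, 0.06, 0.24]

mean = sum(porosity) / len(porosity)
median = sorted(porosity)[len(porosity) // 2]  # odd-length list, so the exact median
print(f"mean={mean:.3f}, median={median:.3f}")  # both sit near the 0.05-0.08 background

# Flagging values far from the median surfaces the exception instead of
# averaging it away - here, anything beyond 3x the median absolute deviation.
mad = sorted(abs(p - median) for p in porosity)[len(porosity) // 2]
outliers = [p for p in porosity if abs(p - median) > 3 * mad]
print(f"flagged anomalies: {outliers}")  # [0.24] - the one we care about
```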

A lot of the workshop content was familiar, but that’s no criticism – it’s healthy to revise and continually test our views as the business challenges we’re supporting and the technology available to help us evolve. Jess Kozman did a great job facilitating and there was active participation across the board. However, the number of attendees was on the low side – lingering COVID caution, or perhaps a sign of the reduced number of data professionals still working in oil and gas, and a reluctance among those with jobs to take time out from their role to learn and grow? This must be addressed by an industry with an ever-increasing focus on data to help it respond to external pressures. Improved recovery factors, lower lifting costs and exploration efforts focused on established basins mean we need to access, understand and work our existing and new data extensively. Data professionals and the support we provide have a valuable ‘multiplier’ effect on the efforts of the consumers we serve, whether they are geoscientists, engineers, analysts or algorithms.

Back to the original question – is our data fit for purpose? The only way we can answer that is by inverting the question: to what purpose does our data need to be fit? This requires continued engagement with the business and technical consumers of data; awareness, adoption and use of technology; and ongoing assessment and extension of our value proposition. We can no longer predict all the ways in which our oil and gas data will be used, but that’s no excuse not to try!

Thanks to Jess Kozman for facilitating, the PPDM West Australia leadership team volunteers for arranging and to CORE Innovation Hub for providing a great workshop facility.

About the author

Neil Constantine is a Business Unit Director and data specialist at Sword, based in Perth. Neil has 25 years’ experience working in both operator and service environments. Sword specialises in data, IT and digital support to a number of industries including Energy. For further information or to get in contact, please email neil.constantine@sword-group.com or call directly on +61 (0)426 240702.
