DataOps: The Fuel Injectors For Your Transformation Engine?

Data – everyone agrees it’s the fuel for the fires of innovation and optimisation. The industrial world is blessed with an abundance of rich, objective (being machine-generated) data, so it should be well-equipped to seek new advantages from it. Too often, though, the first efforts an industrial firm makes to harness its machine and process data for new reporting or advanced analysis involve simple use cases and outputs that mask what it takes to support a mix of different needs in a scalable and supportable way. Data Ops practices provide a way of systematically addressing the steps needed to ensure that your data can be made available in the right places, at the right times, and in the right formats for all the initiatives you’re pursuing.


Industrial data (or OT data) poses particular challenges that your Data Ops strategy will address:

  • It can be generated at a pace that challenges traditional enterprise (or even cloud-layer) data collection and data management systems – to say nothing of the ingestion and processing costs typical of cloud platforms when reporting and analysis run against them.
  • The data for functionally identical assets or processes is often not generated with a consistent structure and schema.
  • OT data generally does not have context established around each data point – making it difficult to understand what it represents, let alone the meaning inherent in the actual values (the sketch after this list shows the difference context makes).
  • Connecting to a mix of asset types with different automation types and communications protocols is often necessary to get a complete data set relevant to the reporting or analytics you’re pursuing.
  • A wide array of uses demands different levels of granularity for some data points, and a breadth of collection points significantly wider than many individual stakeholders may appreciate.
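
To make the context gap concrete, here is a minimal sketch of the same reading before and after a descriptive model is wrapped around it. The tag address, asset names and model fields are hypothetical, and the structures are illustrative rather than any particular historian’s format:

```python
# Illustrative only: the tag address, asset names and model fields below are
# hypothetical, not taken from any particular historian or PLC programme.

# What typically lands in an OT data store: an address, a value, a timestamp.
raw_point = {
    "tag": "PLC07.DB12.W4",
    "value": 182.0,
    "timestamp": "2024-05-01T06:32:10Z",
}

# The same point once a descriptive model has been wrapped around it.
contextualised_point = {
    "asset": "Filler 3",
    "asset_type": "Form Fill & Seal Machine",
    "variable": "Speed",
    "unit": "packs/min",
    "site": "Plant A",
    "line": "Packaging Line 2",
    "value": 182.0,
    "timestamp": "2024-05-01T06:32:10Z",
}
```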

These challenges are why, in many firms, the engineering team ends up becoming the “data extract/Excel team” – their familiarity with the underlying technology means they can take snapshots and do the data cleansing necessary to make the data useful. But that’s not scalable, and data massaging is a far less impactful use of their time – they should be engaged with the broader team interpreting and acting on the data!


Data Ops – Quick Definition

There’s no one way to “do” Data Ops. In the industrial world, it’s best thought of as a process involving:

  • Determining the preferred structures and descriptions (models) for OT data, so it may serve the uses the organisation has determined will be valuable.
  • Assessing what approaches to adding such models can be adopted by your organisation.
  • Choosing the mix of tools needed to add model structures to a combination of existing and new data sources.
  • Establishing the procedures to ensure that model definitions don’t become “stale” if business needs change.
  • Establishing the procedures to ensure that new or changing data sources are brought into the model-based framework promptly.


A Rough Map is Better Than No Map.

Take a first pass at capturing all the intended uses of your OT data. What KPIs, what reports, what integration points, and what analytics are people looking for? Flesh out those user interests with an understanding of what can feed into them:

  1. Map the different stakeholders’ data needs in terms of how much they come from common sources, and how many needs represent aggregations, calculations or other manipulations of the same raw data.
  2. Flesh out the map by defining the regularity with which data needs to flow to suit the different use cases. Are some uses based on by-shift or daily views of some data? Are other uses based on feeding data in real time between systems to trigger events or actions?
  3. Now consider what data could usefully be “wrapped around” raw OT data to make its meaning and context available to all (a minimal sketch of these layers follows this list). Assess what value can come from:
    1. Common descriptive models for assets and processes – a “Form Fill & Seal Machine” with variables like “Speed” and “Web Width” (etc.) is a far easier construct for many people to work with than a database presenting a collection of rows reflecting machines’ logical addresses, with a small library of cryptically structured variables associated with each one.
    2. An enterprise model to help understand the locations and uses of assets and processes. The ISA-95 standard offers useful guidance in establishing such a model.
    3. Additional reference data to flesh out the descriptive and enterprise models (e.g. the make and model of common asset types sourced from many vendors, or information about a location such as latitude or elevation). Be guided by what kind of additional data would be helpful in comparing, contrasting and investigating differences in outcomes that need to be addressed.
  4. Now assess what data pools are accumulating already – and how much context is accumulating in those pools. Can you re-use existing investments to support these new efforts, rather than creating a parallel set of solutions?
  5. Finally, inventory the OT in use where potentially useful data is generated, but not captured or stored; particularly note connectivity options.
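
As a rough illustration of the three layers in point 3, the sketch below expresses a descriptive asset model, an ISA-95-style location hierarchy and a pocket of reference data as plain Python structures. Every class name, field and value is a hypothetical example, not a standard schema – a real implementation would live in whatever modelling tool or platform you adopt.

```python
from dataclasses import dataclass, field

@dataclass
class AssetModel:
    """Common descriptive model for a class of functionally identical assets."""
    asset_type: str            # e.g. "Form Fill & Seal Machine"
    variables: dict            # variable name -> engineering unit

@dataclass
class EnterpriseLocation:
    """ISA-95-style equipment hierarchy used to locate an asset."""
    enterprise: str
    site: str
    area: str
    line: str
    unit: str

@dataclass
class AssetInstance:
    """One physical asset: its model, its place in the enterprise, and
    reference data useful when comparing outcomes across assets."""
    asset_id: str
    model: AssetModel
    location: EnterpriseLocation
    reference: dict = field(default_factory=dict)   # e.g. make, model, install year

ffs_model = AssetModel(
    asset_type="Form Fill & Seal Machine",
    variables={"Speed": "packs/min", "Web Width": "mm"},
)

filler_3 = AssetInstance(
    asset_id="FFS-003",
    model=ffs_model,
    location=EnterpriseLocation("Acme Foods", "Plant A", "Packaging", "Line 2", "Filler"),
    reference={"make": "ExampleCo", "model": "FS-500", "commissioned": 2019},
)
```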

Avoiding A Common Trap

“Data for Analytics” means different things at different stages. A data scientist looking to extract new insights from OT data may need very large data sets in the data centre or cloud, where they can apply machine learning or other “big data” tools to a problem. A process optimisation team deploying a real-time analytic engine to make minute-by-minute use of the outputs of the data scientists’ work may only need small samples across a subset of data points for their part of the work. Data Ops thinking will help you ensure that both of these needs are met appropriately.
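
One way to picture that split, assuming a pandas-based workflow (the values and intervals are invented for illustration): the same underlying series can be handed to the data scientist at full granularity and to the real-time team as a compact, aggregated view.

```python
import pandas as pd

# Hypothetical one-second line-speed samples; in practice these would come
# from a historian export or a streaming pipeline.
idx = pd.date_range("2024-05-01 06:00", periods=3600, freq="s")
speed = pd.Series(180.0, index=idx, name="speed_packs_per_min")

# Data science use: keep the full-granularity history for model training.
training_set = speed.to_frame()

# Real-time optimisation use: a per-minute summary is often all that's needed.
minute_view = speed.resample("1min").agg(["mean", "min", "max"])
```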


Map’s Done – Now How About Going Somewhere?

The work that comes next is really the “Ops” part of Data Ops – with the rough map of different uses of OT data at hand, and the view of whether each use needs granular data, aggregated data, calculated derivations (like KPIs), or some kind of combination, you’ll be able to quickly determine where generating desired outputs requires new data pools or streams, or where existing ones can be used. And for both, your data modelling work will guide what structures and descriptive data need to be incorporated.

At this point, you may find that some existing data pools lend themselves to having asset and descriptive models wrapped around the raw data at the data store level – i.e. centrally. It’s a capability offered in data platform solutions like GE’s Proficy Historian. This approach can make more sense than extracting data sets simply to add model data and then re-writing the results to a fresh data store. Streaming/real-time sources typically offer more choice in how best to add the model around the raw data – and there are solutions, like HighByte’s Intelligence Hub, that allow the model information to be added at the “edge” – the point where the data is captured in the first place. With the model definitions included at this point, you can set up multiple output streams – some feeding more in-the-moment views or integration points, some feeding data stores. In both cases, imposing the model data at the edge makes it easier for the ultimate user of the data to understand the context and meaning of what’s in the stream.
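
To make the edge pattern concrete, here is a rough sketch of contextualising a raw sample at the point of capture and fanning it out to two streams. This is not HighByte’s or GE’s actual API – the binding table, topic names and the publish stand-in (which simply prints) are all assumptions; a real deployment would use whatever transport and configuration the chosen edge platform provides.

```python
import json

# Hypothetical binding of a raw tag address to its model context.
MODEL_BINDING = {
    "PLC07.DB12.W4": {
        "asset": "Filler 3",
        "asset_type": "Form Fill & Seal Machine",
        "variable": "Speed",
        "unit": "packs/min",
    },
}

def publish(topic: str, payload: dict) -> None:
    """Stand-in for the real transport (MQTT, Kafka, REST, etc.)."""
    print(f"{topic}: {json.dumps(payload)}")

def on_edge_sample(tag: str, value: float, timestamp: str) -> None:
    """Wrap a raw sample in its model context, then fan it out."""
    context = MODEL_BINDING.get(tag)
    if context is None:
        publish("dataops/unmodelled", {"tag": tag, "value": value, "timestamp": timestamp})
        return
    enriched = {**context, "value": value, "timestamp": timestamp}
    publish("plant-a/line-2/realtime", enriched)    # in-the-moment views, integrations
    publish("plant-a/line-2/historian", enriched)   # long-term data store

on_edge_sample("PLC07.DB12.W4", 182.0, "2024-05-01T06:32:10Z")
```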



Edge Tools vs Central

Realistically, you’re likely to need both, and the driving factor will not necessarily be technical.

Edge works better when:

  1. You have a team that deals well with spreading standardised templates.
  2. Data sources are subject to less frequent change (utility assets are a good example of this).
  3. The use cases require relatively straightforward “wrapping” of raw data with model information.

Central works well when:

  1. The skills and disciplines to manage templates across many edge data collection footprints are scarce.
  2. The mix of ultimate uses of the data is more complex – requiring more calculations, derivations or modelling of relationships between different types of data sources.
  3. Change in underlying data sources is frequent enough that some level of dedicated and/or systematised change detection and remediation is needed.


Regardless of which tools are applied, the model definitions established earlier and applied consistently ensure that different reports, calculations and integration tools can be developed more easily – and adapted more easily as individual data sources under the models are inevitably tweaked, upgraded or replaced. As new automation or sensors come in, their unique data structures simply need to be bound to the models representing them, and the “consumers” of their outputs will continue to work. So, while tools will be needed, ultimately the most valuable part of “doing” Data Ops is the thinking that goes into deciding what needs to be wrapped around raw data for it to become the fuel for your digital journey.
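
As a final, hypothetical illustration of that binding idea (all tag addresses, names and the nominal speed are invented), the consumer-side calculation below is written against model variable names only, so an equipment upgrade changes nothing but the binding table:

```python
# Device-specific bindings: model variable name -> raw tag address.
OLD_PLC_BINDING = {"Speed": "PLC07.DB12.W4", "Web Width": "PLC07.DB12.W6"}
NEW_PLC_BINDING = {"Speed": "ns=2;s=Filler3.LineSpeed", "Web Width": "ns=2;s=Filler3.WebWidth"}

def read_variables(raw_values: dict, binding: dict) -> dict:
    """Translate a device-specific snapshot into model variable names."""
    return {variable: raw_values[tag] for variable, tag in binding.items()}

def speed_utilisation(variables: dict) -> float:
    """Consumer code: written once against the model, never against raw tags."""
    return variables["Speed"] / 200.0 * 100.0   # % of an assumed 200 packs/min nominal

# Before the upgrade:
print(speed_utilisation(read_variables(
    {"PLC07.DB12.W4": 182.0, "PLC07.DB12.W6": 310.0}, OLD_PLC_BINDING)))

# After the upgrade, only the binding changes; the KPI code is untouched:
print(speed_utilisation(read_variables(
    {"ns=2;s=Filler3.LineSpeed": 185.0, "ns=2;s=Filler3.WebWidth": 310.0}, NEW_PLC_BINDING)))
```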