With over 30 billion connected devices collectively producing gigabytes of data every second, most businesses don’t worry about having enough data, yet many struggle to unlock its value.
To start, how do you define data optimization?
Data optimization isn’t just about creating more data; it’s about organizing existing, disparate data sources into a single source of truth. Many companies invest in industrial systems that produce huge amounts of data. For example, battery energy blocks deployed in the field can generate 2,000-3,000 data points at 1 Hz (one sample per second), all of which need to be captured. At this capture rate, a production-level battery farm can generate tens of gigabytes of data per day. This data, however, remains largely inaccessible and fragmented, trapped behind organizational and system boundaries. Optimization is about freeing that data in order to create significant new capabilities across the enterprise.
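The Python below is a back-of-envelope calculation that puts these volumes in perspective. It assumes that each data point is stored uncompressed as an 8-byte timestamp plus an 8-byte floating-point value. This is our assumption, not a figure from a real deployment. The 10-block farm is also hypothetical.

```python
# Back-of-envelope estimate of raw battery-telemetry volume.
# ASSUMPTIONS (not from a real deployment): every data point is stored
# uncompressed as an 8-byte timestamp plus an 8-byte float64 value, and the
# 10-block farm below is hypothetical.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 samples per channel per day at 1 Hz
BYTES_PER_POINT = 8 + 8         # assumed: timestamp + float64 value


def daily_bytes(points_per_sample: int, blocks: int = 1, hz: float = 1.0) -> float:
    """Raw bytes per day for `blocks` energy blocks, each emitting
    `points_per_sample` data points per sample at `hz` samples per second."""
    return points_per_sample * blocks * hz * SECONDS_PER_DAY * BYTES_PER_POINT


def to_gb(n_bytes: float) -> float:
    """Convert bytes to decimal gigabytes (1 GB = 10**9 bytes)."""
    return n_bytes / 1e9


if __name__ == "__main__":
    for points in (2000, 3000):
        per_block = to_gb(daily_bytes(points))
        farm = to_gb(daily_bytes(points, blocks=10))  # hypothetical 10-block farm
        print(f"{points} points/sample: {per_block:.1f} GB/day per block, "
              f"{farm:.1f} GB/day for 10 blocks")
```

Under these assumptions, a single block produces roughly 2.8-4.1 GB of raw data per day. A farm of around ten blocks therefore lands in the tens of gigabytes per day. Compression, storage format and per-point metadata will move these numbers in either direction.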
Why is this so important for industrial data?
In a word, speed. Whatever industry you work in, the pace of innovation has never been faster. A company that doesn’t embrace data optimization will simply get left behind as its competitors develop new products faster and target entirely new markets. For example, product development teams with access to R&D, manufacturing and field telemetry data can reduce development time and manufacturing complexity. They can also increase product robustness by designing out common field failures. They can only do this with ready access to the relevant data.
What pitfalls have you seen when companies try to aggregate their data into a single source of truth?
One of the biggest issues our customers face is the accuracy and completeness of their existing or historical data. Many believe their data is in good shape, but when they begin pulling it together, they start to see the holes and inaccuracies. Missing data can arise for several reasons, such as multiple process revisions or the introduction of new processes. Each process revision may require a different set of parameters to be captured. If the older data isn’t transformed to fit the new revision, it loses value and, in some cases, is lost entirely.
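One way to keep older data usable is to migrate each record to the current revision’s shape as it is ingested. The sketch below is illustrative only. The two schema revisions, the field names (`oven_temp_f`, `oven_temp_c`, `humidity_pct`) and the `schema_rev` tag are all hypothetical. Parameters that the old revision never captured are set to `None` explicitly, so the gap stays visible instead of the record being discarded.

```python
from typing import Any, Dict, Optional

# HYPOTHETICAL schemas for illustration only:
#   revision 1 logged oven temperature in °F;
#   revision 2 logs it in °C and also captures humidity, which rev 1 never did.
REV2_FIELDS = ("batch_id", "oven_temp_c", "humidity_pct", "schema_rev")


def migrate_rev1_to_rev2(record: Dict[str, Any]) -> Dict[str, Any]:
    """Reshape a revision-1 record into the revision-2 layout.

    Parameters that revision 1 never captured are set to None explicitly,
    so the gap is visible downstream instead of the record being dropped.
    """
    temp_f: Optional[float] = record.get("oven_temp_f")
    return {
        "batch_id": record["batch_id"],
        "oven_temp_c": None if temp_f is None else round((temp_f - 32) * 5 / 9, 2),
        "humidity_pct": None,  # not recorded under revision 1
        "schema_rev": 2,
    }


def to_current(record: Dict[str, Any]) -> Dict[str, Any]:
    """Bring a record up to revision 2; untagged records are treated as rev 1."""
    rev = record.get("schema_rev", 1)
    if rev == 2:
        return record
    if rev == 1:
        return migrate_rev1_to_rev2(record)
    raise ValueError(f"unknown schema revision: {rev!r}")


if __name__ == "__main__":
    legacy = {"batch_id": "B-001", "oven_temp_f": 356.0}
    print(to_current(legacy))
```

Chaining one small function per revision step keeps each transformation easy to review with the teams who own the data.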
Examples of data inaccuracies we’ve seen include referring to a single chemical by several inconsistent names rather than by its CAS (Chemical Abstracts Service) identifier, and recording a ‘range’ where a ‘float’ (a single numeric value) is expected. These inaccuracies can require significant effort to correct.
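Below is a minimal Python sketch of both fixes. The synonym table is a hypothetical, hand-curated example; a real one would be built from the names found in the customer’s own records. The CAS numbers shown for water, ethanol and sodium chloride are the real ones. The check-digit test follows the CAS Registry Number format. Splitting a range into a `(low, high)` pair of floats is one possible design choice, not the only one.

```python
import re
from typing import Optional, Tuple

# HYPOTHETICAL synonym table; in practice it would be built from the names
# actually found in the customer's records. The CAS numbers themselves are real.
CAS_BY_NAME = {
    "water": "7732-18-5",
    "h2o": "7732-18-5",
    "ethanol": "64-17-5",
    "ethyl alcohol": "64-17-5",
    "etoh": "64-17-5",
    "sodium chloride": "7647-14-5",
    "nacl": "7647-14-5",
}

_CAS_RE = re.compile(r"(\d{2,7})-(\d{2})-(\d)")
_RANGE_RE = re.compile(r"\s*(-?\d+(?:\.\d+)?)\s*(?:-|to)\s*(-?\d+(?:\.\d+)?)\s*")


def is_valid_cas(cas: str) -> bool:
    """Check CAS format and check digit: the digits before the check digit,
    weighted 1, 2, 3, ... from the right, must sum to the check digit mod 10."""
    m = _CAS_RE.fullmatch(cas.strip())
    if not m:
        return False
    digits = m.group(1) + m.group(2)
    total = sum(i * int(d) for i, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(m.group(3))


def normalize_chemical(name: str) -> Optional[str]:
    """Map a free-text chemical name (or a valid CAS number) to a CAS number.
    Returns None for unknown names so they can be routed to manual review."""
    text = name.strip()
    if is_valid_cas(text):
        return text
    return CAS_BY_NAME.get(text.lower())


def parse_numeric(value: str) -> Tuple[float, float]:
    """Turn '22.5' into (22.5, 22.5) and '20-25' or '20 to 25' into (20.0, 25.0)."""
    m = _RANGE_RE.fullmatch(value)
    if m:
        low, high = float(m.group(1)), float(m.group(2))
        if low > high:
            raise ValueError(f"inverted range: {value!r}")
        return low, high
    number = float(value)  # raises ValueError for anything non-numeric
    return number, number
```

For example, `normalize_chemical("Ethyl Alcohol")` and `normalize_chemical("EtOH")` both return `"64-17-5"`. An unrecognized name returns `None` rather than being guessed.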
Once data has been aggregated into a single source, we’ve often found that several further rounds of engagement with the customer are needed to tweak the ingested data. Object consolidation, naming, previously unnoticed inaccuracies and object serialization schemes may all need further grooming. As with resolving the earlier inaccuracies, transforming the data after ingestion requires close dialogue with, and participation from, the customer (and every team that manages the data). Depending on the complexity of the required changes, it can take months.
How can we overcome these challenges?
If you’re looking to move along the digitization path:
Take an inventory of existing data, noting various file formats, locations and any inconsistencies that may have to be fixed later on.
Describe the relationships between the various pieces of data. In performing this exercise you may start to see associations that weren’t initially thought about.
Think about what’s wrong with the way the data is currently stored and accessed. Determine what you would like to achieve.
Once the data has been digitized, think about what other operational, quality, or performance questions you would like the data to answer.
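The first step, taking an inventory, can be partly automated. This Python sketch walks a directory tree and records, for each file extension, how many files exist, their total size, where they live, and whether the extension appears with inconsistent letter case. The root path is an assumption; real data may also live in databases, historians or cloud storage. The sketch covers only file-based data and does not describe relationships between data, so the later steps still need people who know the data.

```python
import tempfile
from collections import Counter, defaultdict
from pathlib import Path
from typing import Dict, Set, Union


def inventory(root: Union[str, Path]) -> Dict[str, dict]:
    """Summarize the files under `root` by (lower-cased) extension: file count,
    total bytes, the directories they live in, and the raw suffix spellings seen
    (more than one spelling, e.g. '.CSV' and '.csv', hints at inconsistency)."""
    root = Path(root)
    counts: Counter = Counter()
    sizes: Counter = Counter()
    locations: Dict[str, Set[str]] = defaultdict(set)
    variants: Dict[str, Set[str]] = defaultdict(set)
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        ext = path.suffix.lower() or "<none>"
        counts[ext] += 1
        sizes[ext] += path.stat().st_size
        locations[ext].add(str(path.parent.relative_to(root)))
        variants[ext].add(path.suffix or "<none>")
    return {
        ext: {
            "files": counts[ext],
            "bytes": sizes[ext],
            "locations": sorted(locations[ext]),
            "suffix_variants": sorted(variants[ext]),
        }
        for ext in sorted(counts)
    }


if __name__ == "__main__":
    # Tiny demo on a throwaway directory; point `inventory` at real data instead.
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "line1").mkdir()
        (base / "line1" / "run_001.CSV").write_text("t,v\n")
        (base / "run_002.csv").write_text("t,v\n")
        for ext, info in inventory(base).items():
            print(ext, info)
```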
What’s the #1 lesson you learned in completing this process for your customers?
Do not short-change the data acquisition tasks. Treat data acquisition as a key part of the process, because it seeds all future data creation. It is better to spend extra time up front ensuring that the chosen data model will not only work in the near term but will also accommodate future growth, such as expanding product lines, business acquisitions or changing processes.
What kind of outcomes have you seen from customers who have successfully completed data optimization?
Customers typically see three immediate benefits:
Consistency: creating a single source of truth enables data consistency and improved data access for all teams.
Accuracy: once data has been fit to a model, it can be validated against the expected input type and either rejected if outside the norm or flagged as an anomaly that requires further investigation.
Data recall and data insights: running queries against a consistent view of the data can provide insights that were previously impossible. For example, analyzing past experiments can reveal why an experiment failed, or highlight parts of an experiment that outperformed the expected results.
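To illustrate the accuracy benefit, the Python below validates a value against a field’s expected type and two sets of limits. Values of the wrong type, or outside hard limits, are rejected. Values that fall within the hard limits but outside the normal (soft) range are flagged for investigation. The field names and limits are hypothetical and chosen only for illustration. The reject/flag split is one reasonable way to implement the behavior described above, not a prescribed method.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict


class Verdict(Enum):
    ACCEPT = "accept"
    FLAG = "flag"      # plausible but unusual: keep it, mark it for investigation
    REJECT = "reject"  # wrong type or impossible value: do not ingest


@dataclass(frozen=True)
class FieldSpec:
    expected_type: type
    hard_min: float  # outside [hard_min, hard_max] -> REJECT
    hard_max: float
    soft_min: float  # inside hard limits but outside [soft_min, soft_max] -> FLAG
    soft_max: float


# HYPOTHETICAL model for one battery-telemetry record; limits are illustrative.
SPECS: Dict[str, FieldSpec] = {
    "cell_voltage_v": FieldSpec(float, 0.0, 5.0, 2.5, 4.2),
    "cell_temp_c": FieldSpec(float, -40.0, 120.0, 0.0, 60.0),
}


def _has_expected_type(value: Any, expected: type) -> bool:
    if expected is float:
        # Integers are acceptable where a float is expected; booleans are not.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return isinstance(value, expected)


def validate(field: str, value: Any) -> Verdict:
    """Classify one value against its field spec. Unknown fields raise KeyError."""
    spec = SPECS[field]
    if not _has_expected_type(value, spec.expected_type):
        return Verdict.REJECT  # e.g. a text range like "20-25" where a float belongs
    # Written as `not (a <= x <= b)` so that NaN, which fails every comparison, is rejected.
    if not (spec.hard_min <= value <= spec.hard_max):
        return Verdict.REJECT
    if not (spec.soft_min <= value <= spec.soft_max):
        return Verdict.FLAG
    return Verdict.ACCEPT
```

With a separate FLAG outcome, unusual but possible readings stay in the single source of truth for investigation instead of being silently dropped.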