The number of sophisticated sensors in the IoT and Industrial Internet worlds will keep growing over the next few decades, with no ceiling in sight. The data infrastructure that most companies have in place today is just not ready to find, access and process the enormous amount of real-time data that emanates from complex machines. We’re talking about machines like turbines, industrial motors, aircraft, offshore oil rigs, autonomous vehicles, and power utility plants.

One critical business imperative of this decade (and the next) is making good use of data assets, especially those that have “gone dark.” There’s a risk that sensor data, if not properly curated by a data platform, will fall into the void and never be heard from again. That’s a big risk when that aggregated data might teach your engineers something about how to innovate or improve on the next version of whatever they’re making.

We’ve moved beyond the old storage paradigms based on archiving. Today, if engineers can’t find and access files that are essential to a current design process, the business loses time and money. Moreover, with business models shifting to performance-based (“power-by-the-hour”) contracts, data that helps predict equipment and asset performance is critical to the enterprise.

The Digital Dossier

The Digital Dossier is a collection of data streams tied to a serialized product or products. We think of it as a “health record” for industrial things or equipment. It rests on a common data access platform and represents a huge leap forward in connecting the engineering team with “crown-jewel” data sets like geometry, simulation and telemetry. It can also be used in a consumer setting to organize sensor data that has a common thread (for example, data that shares a common source or category of sources).

A Digital Dossier is useful in any area where baseline equipment and asset performance need to be understood. For example, it identifies and aggregates all critical information in the enterprise linked to a serialized piece of equipment. It continues to track and organize that data from the day the piece of equipment is created to the day it is taken out of service. That information might include a bill of materials, geometry schematics, simulations, and stored real-time data coming from sensors in the field – but there is really no limitation put on data types or size. Within the Dossier, there could even be a link to a “digital twin,” or a virtual representation of a part or machine tied to a physical counterpart’s sensors. Digital twins are useful in the pre-manufacturing process for testing and for in-the-field analysis of performance.

A comprehensive Digital Dossier is incredibly useful for predictive maintenance initiatives, design processes, and other business-critical applications. Dossiers might be easier to create in 2016 for new assets that have accessible designs and simulation data, but what about the equipment that was deployed 20 years ago? Equipment that’s already in the field might have its associated data buried in the enterprise because that data has been shuffled around several times by the IT department after multiple technology refreshes. In some cases, the data’s original creators have left the company, and the knowledge of that data’s location left with them. Many customers I work with are surprised to learn that there are mission-critical files from two, five, 10 or even 20 years ago that they can no longer access!

Unfortunately, none of the Digital Dossier’s benefits can be fully realized if a company does not have a solid data access strategy based on a data infrastructure robust enough to deal with petabytes of unstructured data. A data access platform connects data sets that might be physically distant (and sometimes forgotten), and it does so without much pain to the end-user. The best data access platforms have “find” functionality that allows a user to thread a single parameter (e.g. serial number, or aircraft tail number) across many disparate data sets.
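To make the “find” idea concrete, here is a minimal sketch in Python of threading one parameter across several data sets. Everything in it is hypothetical and for illustration only: the Record type, the find function, the paths and the serial numbers are assumptions, not the interface of any particular data access platform.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One file or stream known to the platform (hypothetical shape)."""
    dataset: str                      # e.g. "geometry", "simulation", "telemetry"
    location: str                     # made-up physical location
    tags: dict = field(default_factory=dict)


def find(records, key, value):
    """Return every record, from any data set, whose tag `key` equals `value`."""
    return [r for r in records if r.tags.get(key) == value]


# Three data sets that live in different places but share a serial number tag.
records = [
    Record("geometry", "site-a/cad/turbine_v3.step", {"serial_number": "SN-1001"}),
    Record("simulation", "site-b/cfd/run_042.h5", {"serial_number": "SN-1001"}),
    Record("telemetry", "site-c/sensors/2016-03.csv", {"serial_number": "SN-2002"}),
]

hits = find(records, "serial_number", "SN-1001")
for r in hits:
    print(r.dataset, r.location)
```

A real platform would index these tags rather than scan a list, but the point is the same: one key pulls together geometry, simulation and telemetry that would otherwise sit in separate silos.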


A new approach to data access is an architecture that creates a unified namespace for large unstructured data sets. Even data sets that are highly distributed across storage media and across the globe can be managed from a single virtual space that does not change over time. This solves the problem of losing track of important data sets because of lost tribal knowledge and tech refresh cycles.
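One way to picture a unified namespace is as a stable logical path that resolves to wherever the data physically lives today. The sketch below is a simplified illustration under that assumption; the Namespace class, its methods, and the storage locations are all hypothetical and do not describe a specific product.

```python
class Namespace:
    """Maps stable logical paths to current physical locations (illustrative only)."""

    def __init__(self):
        self._locations = {}

    def register(self, logical_path, physical_location):
        """Record where a logical path currently lives."""
        self._locations[logical_path] = physical_location

    def migrate(self, logical_path, new_location):
        """Point an existing logical path at new storage, e.g. after a tech refresh."""
        if logical_path not in self._locations:
            raise KeyError(logical_path)
        self._locations[logical_path] = new_location

    def resolve(self, logical_path):
        """Return the current physical location for a logical path."""
        return self._locations[logical_path]


ns = Namespace()
path = "/dossier/SN-1001/telemetry"          # the name engineers keep using

ns.register(path, "old-nas:/vol3/telemetry/sn1001")
before = ns.resolve(path)

# The IT department moves the data; the logical path does not change.
ns.migrate(path, "new-object-store:/telemetry/sn1001")
after = ns.resolve(path)

print(before)
print(after)
```

Because engineers only ever use the logical path, a storage move changes the mapping and nothing else, which is why the data’s location no longer depends on anyone’s memory.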

Now that integrating complex sensors into new designs has become a desired goal, it’s not hard to predict that entire business strategies will hinge on what is done with the mountains of data those sensors produce. Winners and losers in the marketplace will be determined by who is able to best use digital assets to create the next design innovation. The real winner is the company that can use unstructured and structured data gathered from an innovative machine’s sensors and create an even better innovation the next time around.

This post originally appeared on