One of the biggest data challenges for the battery manufacturing industry is the sheer volume of units going through factory doors. Let’s look at the numbers. As of 2020, the US had about 1 GWh of grid-connected, battery-based energy storage, with typically 200–250 MWh of new installations every year¹. Quarterly EV sales are around 150,000 in the US alone², with vehicle storage typically in the range of 40 kWh (2021 base-model Nissan Leaf) to 95 kWh (2022 Tesla Model S Long Range). Often these systems are built up from very small units, like the 2170 cell, which has a capacity of roughly 20 Wh, with typically 4,000–7,500 of these small units per car. So as an order-of-magnitude estimate (with the simplifying approximation that everything is some flavor of Li-ion, which it isn’t), we are looking at about 3 billion batteries for vehicles and around 15 million batteries for grid storage each year, in the United States alone.
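The estimate above is simple arithmetic, and it can help to see the steps laid out. The sketch below annualizes the quarterly EV sales figure by multiplying by four and assumes the same roughly 20 Wh cell applies to grid storage; the inputs are the ranges quoted in the text, and the constant and function names are our own.

```python
# Order-of-magnitude estimate of US battery cell volumes per year.
# Assumptions: quarterly EV sales annualized by multiplying by four, and
# ~20 Wh per cell applied to grid storage as well as vehicles.

QUARTERLY_EV_SALES = 150_000
CELLS_PER_CAR = (4_000, 7_500)      # low and high estimates quoted above
GRID_MWH_PER_YEAR = (200, 250)      # new grid installations per year
WH_PER_CELL = 20                    # approximate energy of a 2170 cell


def vehicle_cells_per_year(cells_per_car: int) -> int:
    """Cells consumed by one year of US EV sales."""
    return QUARTERLY_EV_SALES * 4 * cells_per_car


def grid_cells_per_year(mwh_per_year: float) -> float:
    """Cells needed for one year of new grid storage (1 MWh = 1,000,000 Wh)."""
    return mwh_per_year * 1_000_000 / WH_PER_CELL


low_ev, high_ev = (vehicle_cells_per_year(n) for n in CELLS_PER_CAR)
low_grid, high_grid = (grid_cells_per_year(m) for m in GRID_MWH_PER_YEAR)

print(f"Vehicle cells per year: {low_ev:,} to {high_ev:,}")
print(f"Grid cells per year:    {low_grid:,.0f} to {high_grid:,.0f}")
```

This gives roughly 2.4 to 4.5 billion vehicle cells and 10 to 12.5 million grid cells per year, consistent at the order-of-magnitude level with the figures above.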

Many business leaders we speak to are eager to use software analytics to introduce transformative improvements, such as predictive capabilities powered by machine learning. But before this can happen, there is an important process that I call data stewardship. As manufacturing ramps up, often very quickly, data stewardship becomes even more important, and ideally it should begin near the inception of a manufacturing line. Let’s take a closer look at what this involves.


Deciding what data to collect

When working with a battery customer, we have to decide collectively what data we actually want to capture. For some operations, it’s not economical to try to capture everything, and even if we decide to, what does “everything” really mean? Do we, for instance, configure our manufacturing line with tomography that generates a 3D image of each cell? Do we capture the results of every formation program for every cell, or just sample a few? What tools should we use to characterize the weld between the battery lead and the electrodes, and should those images be captured? The list of considerations goes on.

A tomography scan is one type of data that can be collected during battery manufacturing.
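One way to make these decisions explicit is to write them down as a capture policy that the line software reads. The sketch below is hypothetical: the data-source names, the `CapturePolicy` class, and the sampling rates are illustrative placeholders, not settings from any particular product. Sampling is keyed on a hash of the cell serial number rather than a random draw, so the decision for a given cell is repeatable.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class CapturePolicy:
    """Hypothetical capture rule for one data source."""
    mode: str                  # "all", "sample", or "none"
    sample_rate: float = 1.0   # fraction of cells kept when mode == "sample"


# Illustrative policy; source names and rates are placeholders.
POLICY = {
    "plc_output": CapturePolicy("all"),
    "formation_cycling": CapturePolicy("sample", sample_rate=0.05),
    "ct_tomography": CapturePolicy("sample", sample_rate=0.01),
    "weld_ir_image": CapturePolicy("none"),
}


def _bucket(serial: str) -> float:
    """Map a serial number to a stable pseudo-random value in [0, 1)."""
    digest = hashlib.sha256(serial.encode("utf-8")).digest()
    return int.from_bytes(digest[:6], "big") / 2**48


def should_capture(source: str, serial: str) -> bool:
    """Decide whether to store data from `source` for cell `serial`."""
    policy = POLICY.get(source, CapturePolicy("none"))  # unknown sources: off
    if policy.mode == "all":
        return True
    if policy.mode == "sample":
        return _bucket(serial) < policy.sample_rate
    return False
```

With this policy, `should_capture("plc_output", serial)` is always true, while roughly 1% of serials pass for `"ct_tomography"`. Because every source shares the same hash bucket, those scanned cells are a subset of the roughly 5% whose formation data is kept, which makes the sampled data easier to compare later.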


Evaluating the data’s utility

Once we decide on the appropriate amount of data to capture, we configure the analytics software to collect the maximum amount of available data, while still allowing for cases where we choose not to collect a particular type at all. For example, we might be evaluating tomography or the infrared imager on the electrode weld. The data’s utility is then continuously evaluated with a streaming analytics approach: the software performs anomaly detection using the standard data we gather from the manufacturing line, such as the output of the PLCs and battery test equipment. On top of this, we perform feature extraction and anomaly detection on the optional data. Comparing the two allows us to evaluate the utility of the additional data.
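The comparison can be sketched with two streaming detectors running side by side: one that sees only the standard line data and one that also sees features extracted from the optional data. This is a minimal sketch, assuming each cell arrives as a dict of already-extracted features with a `serial` key, and that a set of known-bad serials (for example, cells that later failed end-of-line testing) is available for scoring. The per-feature z-score detector using Welford’s running mean and variance is a deliberately simple stand-in for whatever models are used in practice, and all names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Welford's online mean and variance for a single feature."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def zscore(self, x: float, warmup: int) -> float:
        """|x - mean| / sample std; 0 until `warmup` values have been seen."""
        if self.n < max(warmup, 2):
            return 0.0
        std = math.sqrt(self.m2 / (self.n - 1))
        return 0.0 if std == 0 else abs(x - self.mean) / std


class StreamingDetector:
    """Flags a cell if any feature is more than `threshold` standard
    deviations from the running statistics of the cells seen before it."""

    def __init__(self, features, threshold=3.0, warmup=30):
        self.stats = {f: RunningStats() for f in features}
        self.threshold = threshold
        self.warmup = warmup

    def process(self, record: dict) -> bool:
        # Score against history first, then fold the new values in.
        flagged = any(
            s.zscore(record[f], self.warmup) > self.threshold
            for f, s in self.stats.items()
        )
        for f, s in self.stats.items():
            s.update(record[f])
        return flagged


def compare_utility(records, baseline_features, extra_features, known_bad):
    """Run a baseline detector and one that also sees the optional
    features, and score both against a set of known-bad serials."""
    baseline = StreamingDetector(baseline_features)
    extended = StreamingDetector(list(baseline_features) + list(extra_features))
    base_hits, ext_hits = set(), set()
    for rec in records:  # records arrive in production order
        if baseline.process(rec):
            base_hits.add(rec["serial"])
        if extended.process(rec):
            ext_hits.add(rec["serial"])
    return {
        "baseline_caught": len(base_hits & known_bad),
        "extended_caught": len(ext_hits & known_bad),
        "extra_false_alarms": len(ext_hits - base_hits - known_bad),
    }
```

If the extended detector catches known-bad cells that the baseline misses without adding many false alarms, the optional data source is earning its keep; if not, it may be a candidate for sampling or switching off.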

We talked specifically about what goes into preparing a data pipeline for anomaly detection in a previous issue of Battery Smarts, including the important work of determining the origin of the data, transforming it using processors and executors, and determining the ideal location for the newly scrubbed data for easiest access. Because the analytics software streams the datasets through the analytics pipeline while still threading the results back to the same serial number, questions about the overall data utility can be answered rigorously.
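Threading results back to a serial number amounts to keying every event, whether raw measurement or derived analytics result, on that serial. The in-memory sketch below assumes each station or analytics step emits events with a `serial`, a `station` name, and a `data` dict; that event shape and the station names are hypothetical, and a production pipeline would do this join in a database or stream processor rather than a Python dict.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def thread_by_serial(events: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Merge events from stations and analytics steps into one record per cell.

    Each event is assumed (hypothetically) to look like
    {"serial": "C-001", "station": "weld", "data": {"peak_c": 181.2}}.
    Keys are prefixed with the station name so two stations reporting a
    field with the same name do not overwrite each other.
    """
    cells: Dict[str, Dict[str, Any]] = defaultdict(dict)
    for event in events:  # events may arrive in any order
        prefix = event["station"]
        for key, value in event["data"].items():
            cells[event["serial"]][f"{prefix}.{key}"] = value
    return dict(cells)


events = [
    {"serial": "C-001", "station": "weld", "data": {"peak_c": 181.2}},
    {"serial": "C-002", "station": "weld", "data": {"peak_c": 179.8}},
    {"serial": "C-001", "station": "formation", "data": {"capacity_ah": 4.8}},
    {"serial": "C-001", "station": "anomaly", "data": {"score": 0.7}},
]
cells = thread_by_serial(events)
print(cells["C-001"])
# {'weld.peak_c': 181.2, 'formation.capacity_ah': 4.8, 'anomaly.score': 0.7}
```

Because the anomaly score lands in the same record as the weld and formation measurements, any question about a cell can be answered from one place.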


Applying analytics to the data

Data science takes center stage at this point, now that your data is optimized and in an easily accessible format. In manufacturing, analytics solutions can span a number of areas, but we’ll focus here on quality and predictive maintenance.

Our battery and energy customers usually look first to apply analytics to the manufacturing process to maximize throughput and minimize variability. To start, we need standard metrics like throughput, yield, and variation for each of the steps that make up a customer’s process. These must be readily available to view in the software, both in real time and as a report. Reviewing this data can often reveal how certain materials or process steps are affecting overall efficiency. For example, is there a manual data-entry step that could be automated using sensors?

Data populated directly from sensors on the manufacturing line. Automation ensures data hygiene, so relevant data is captured and continuously updated.
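Once per-unit records exist, the standard metrics take only a few lines to compute. This is a minimal sketch, assuming each record carries a step name, a cycle time in seconds, and a pass/fail flag; the record layout and step names are hypothetical. We use the coefficient of variation (standard deviation divided by mean) as the variation metric because it puts steps with very different cycle times on one unitless scale.

```python
from collections import defaultdict
from statistics import mean, stdev


def step_metrics(records, window_hours):
    """Per-step throughput, yield and cycle-time variation.

    `records` is an iterable of (step, cycle_time_s, passed) tuples, one
    per unit processed during a window of `window_hours` hours.
    """
    by_step = defaultdict(list)
    for step, cycle_time_s, passed in records:
        by_step[step].append((cycle_time_s, passed))

    metrics = {}
    for step, rows in by_step.items():
        times = [t for t, _ in rows]
        good = sum(1 for _, ok in rows if ok)
        avg = mean(times)
        metrics[step] = {
            "throughput_per_h": good / window_hours,   # good units per hour
            "yield": good / len(rows),                 # fraction passing
            "cycle_time_mean_s": avg,
            # Coefficient of variation: std / mean, unitless.
            "cycle_time_cv": stdev(times) / avg if len(times) > 1 else 0.0,
        }
    return metrics


records = [
    ("coating", 10.0, True), ("coating", 12.0, True), ("coating", 14.0, False),
    ("stacking", 20.0, True), ("stacking", 20.0, True),
]
for step, m in step_metrics(records, window_hours=1.0).items():
    print(step, m)
```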


The global industrial sensors market reached USD 19.48 billion in 2020 and is expected to grow at a CAGR of 9.1% through 2028³. As sensor data becomes more pervasive, it is invaluable for detecting anomalies (using models such as PCA-T², one-class SVM, autoencoders, and logistic regression) and diagnosing failure modes (using SVM, random forest, decision trees, and neural networks). Analytics can also be extremely useful in predicting time to failure (TTF), which combines techniques including survival analysis, lagging, curve fitting, and regression analysis⁴. We covered manufacturing quality in depth in a previous issue. Deep analytics expertise paired with an understanding of the battery industry is a rare combination, and one that is often not practical to find in-house (see our previous issue for more discussion on this topic).
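Of the anomaly detectors listed, PCA-T² (Hotelling’s T² computed on principal-component scores) is among the simplest to sketch. The version below assumes NumPy and a matrix of sensor readings from a period when the line was known to be in control. It standardizes the features, keeps the top `n_components` principal components, and scores new rows by T² = Σ tₖ²/λₖ, where tₖ is a row’s score on component k and λₖ is that component’s variance. As a simplification, the control limit is an empirical quantile of the training scores rather than the F-distribution limit often used in statistical process control; the function name, defaults, and synthetic data are illustrative.

```python
import numpy as np


def fit_pca_t2(X_train: np.ndarray, n_components: int, quantile: float = 0.99):
    """Fit PCA on in-control data; return a T^2 scorer and a control limit."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0, ddof=1)
    Z = (X_train - mu) / sigma                       # standardize features

    # Right singular vectors of Z are the principal axes of the
    # correlation matrix; s**2 / (n - 1) are its eigenvalues.
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    loadings = vt[:n_components].T                   # shape (features, k)
    eigvals = s[:n_components] ** 2 / (len(Z) - 1)   # variance of each score

    def t2(X: np.ndarray) -> np.ndarray:
        scores = ((X - mu) / sigma) @ loadings       # project onto top-k PCs
        return np.sum(scores ** 2 / eigvals, axis=1) # Hotelling's T^2

    limit = float(np.quantile(t2(X_train), quantile))  # empirical limit
    return t2, limit


# Synthetic in-control data: two correlated sensors and one independent one.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
X = np.column_stack([a, a + 0.1 * rng.normal(size=500), rng.normal(size=500)])

t2, limit = fit_pca_t2(X, n_components=2)
print(t2(np.array([[0.0, 0.0, 0.0], [8.0, 8.0, 0.0]])) > limit)
```

Rows whose T² exceeds `limit` would be flagged for review. T² only measures unusual variation within the retained components; departures from the normal correlation structure show up instead in the residual (SPE or Q) statistic, which this sketch omits.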

A more interesting challenge is whether the software can help us identify recommendations for process improvements. These recommendations may be straightforward, such as “today we are limited by step three, but if we purchase two more tools for $350,000 each, we will increase annual throughput by two million batteries.” Or they may be more subtle. For instance, we had a customer with an air quality problem in their facility. Linking the manufacturing software to external data sources, such as air quality monitoring, allowed real-time yield to be correlated with these external data streams to improve the process.
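The straightforward kind of recommendation falls out of capacity arithmetic for a serial line: annual output is limited by the step with the lowest capacity (tools × units per tool per year). The sketch below assumes a purely serial line with no buffers or downtime, and its step names and per-tool rates are made up to mirror the example in the text.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Step:
    name: str
    tools: int
    units_per_tool_per_year: int   # hypothetical per-tool capacity

    @property
    def capacity(self) -> int:
        return self.tools * self.units_per_tool_per_year


def line_throughput(steps) -> int:
    """A serial line can only ship as many units as its slowest step."""
    return min(s.capacity for s in steps)


def bottleneck(steps) -> Step:
    return min(steps, key=lambda s: s.capacity)


def gain_from_tools(steps, step_name: str, extra_tools: int) -> int:
    """Change in annual throughput if `extra_tools` are added at one step."""
    upgraded = [
        replace(s, tools=s.tools + extra_tools) if s.name == step_name else s
        for s in steps
    ]
    return line_throughput(upgraded) - line_throughput(steps)


# Illustrative line; step three (stacking) is the bottleneck.
line = [
    Step("mixing", tools=2, units_per_tool_per_year=7_500_000),      # 15M
    Step("coating", tools=2, units_per_tool_per_year=6_000_000),     # 12M
    Step("stacking", tools=4, units_per_tool_per_year=2_500_000),    # 10M
    Step("formation", tools=10, units_per_tool_per_year=1_300_000),  # 13M
]
TOOL_COST_USD = 350_000

limiting = bottleneck(line)
gain = gain_from_tools(line, limiting.name, extra_tools=2)
print(f"Bottleneck: {limiting.name}; +{gain:,} units/yr for ${2 * TOOL_COST_USD:,}")
```

With these made-up numbers, two more stacking tools raise output from 10 to 12 million units per year, after which coating becomes the limit, so a third stacking tool would add nothing. That cap is why a recommendation like this needs the capacity of every step, not just the current bottleneck.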

This insight wasn’t especially challenging from a mathematical perspective; no convolutional neural networks were involved, for example. The improvements came from making important decisions about what data to collect, normalizing it into a readily accessible format, and practicing good data hygiene by collecting, threading, and visualizing data at scale with the bottom-line business KPIs top of mind.

¹ Data Source: Center for Sustainable Systems, University of Michigan. 2020. “U.S. Grid Energy Storage Factsheet.” Pub. No. CSS15-17
² Data Source: California Energy Commission (2021)