Industry 4.0 for manufacturers is about combining data from many different sources in the factory with new tools like Machine Learning and Artificial Intelligence to create new classes of business value. It’s about finding the hidden links between complex systems that lead to huge ROIs by increasing throughput, improving yield, or reducing downtime. Electronics manufacturers generate incredible amounts of data, some of which has been collected and used in their operations for a long time. With Industry 4.0, there is a growing, evidence-based belief that combining data from existing silos with new sources and modern machine learning tools will yield massive ROIs.

Yet, despite their efforts, many organizations have discovered unexpected practical challenges and pitfalls in making this process work at scale. Primarily, this is because the process is often exploratory and you don’t know whether any specific effort will pan out. You just know that hidden somewhere in the data about your factory are industry-moving insights, and you want to find them before your competitors do. Which leads to the question: how quickly can you move?

You know that, at the very least, you need to:

  1. Collect data about every part of your operation
  2. Apply modern AI/ML tools to find insights
  3. Put the insights into production to achieve ROI
  4. Repeat for the next ROI … and the next.

This sounds like a clear process but it’s missing a couple of stages between steps 1 and 2:

  • Combine parts of that data together in one place
  • Clean and structure the relevant data so that it can be fed into data science tools.

In many cases, the limiting factor for how quickly you can solve problems with data is how quickly you can put together a clean, unified dataset, which makes these two steps the key bottleneck to tackle. Once you have clean data, it’s straightforward to get value from it. This is where the hidden cost of data silos comes from.

The first thing we often hear from new partners is “we have the data, what we need is data science to solve our problem.” When we dig in, though, we tend to find that while they have indeed collected a lot of data, it’s stored in many different systems for different purposes. These are data silos. It sounds like it should be easy to just “put all the data together” and then do the data science, but it’s not. It’s a classic case of a problem that’s easy to solve when it’s small but increasingly tricky as it gets larger.
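To make the two missing stages concrete, here is a minimal sketch of combining and cleaning records from two silos. Everything in it is an assumption made for illustration: the field names, the sample rows, and the helpers `normalize_serial`, `to_float`, and `unify` are hypothetical and do not describe any particular MES or machine log format.

```python
import re

# Hypothetical records from two silos that describe the same boards.
mes_rows = [
    {"serial": "SN-00123", "line": "Line 1", "result": "PASS"},
    {"serial": "SN-00124", "line": "Line 1", "result": "FAIL"},
]
machine_log_rows = [
    {"board_id": "sn00123 ", "placement_time_s": "41.7"},
    {"board_id": "SN_00124", "placement_time_s": "39.2"},
    {"board_id": "SN-00999", "placement_time_s": "n/a"},
]


def normalize_serial(raw):
    """Reduce a serial number to uppercase letters and digits only."""
    return re.sub(r"[^A-Z0-9]", "", raw.upper())


def to_float(raw):
    """Parse a numeric field, returning None when it is not a number."""
    try:
        return float(raw)
    except ValueError:
        return None


def unify(mes, machine_log):
    """Join the two sources on the normalized serial number."""
    by_serial = {normalize_serial(r["serial"]): r for r in mes}
    unified, unmatched = [], []
    for row in machine_log:
        key = normalize_serial(row["board_id"])
        mes_row = by_serial.get(key)
        if mes_row is None:
            # No counterpart in the other silo: set aside for a human to review.
            unmatched.append(row)
            continue
        unified.append({
            "serial": key,
            "line": mes_row["line"],
            "result": mes_row["result"],
            "placement_time_s": to_float(row["placement_time_s"]),
        })
    return unified, unmatched


unified, unmatched = unify(mes_rows, machine_log_rows)
print(len(unified), "rows unified,", len(unmatched), "left over for manual review")
```

Even in this toy case, most of the code handles inconsistent keys and unparseable values rather than analysis, and every new pair of silos tends to need its own version of these rules.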

To understand why, let’s look at a familiar analogy that SMT experts will recognize: the pick-and-place (PnP) machine. This workhorse of the SMT industry is a modern marvel, capable of placing tens of thousands of components per hour and assembling circuit boards in seconds. In almost all ways, it far surpasses the old way of hand soldering circuit boards. There is, however, one challenge that a modern PnP machine struggles with: one-off designs with loose components. Say you have a PnP machine that can place 50,000 components per hour and a customer wants you to make a single board with 10 resistors on it. They give you a printout diagram and a little ESD baggie of the resistors. The customer wants to know how long it would take to build this one board with your state-of-the-art PnP machine. As everyone knows, the answer is not 10 resistors / (50k components/hour) = less than a second. The PnP machine requires two prerequisites before it can begin its work:

  1. The assembly drawing in a machine-readable format that can be used to program the machine
  2. All the components, carefully organized onto reels fed into known locations on the PnP machine in a standard way.

For a single board with loose components, it’s probably faster to just solder it by hand than to spend an hour programming the PnP machine and then spend about a second placing components on the board. In fact, because your customer provided the components in a little baggie, you couldn’t even use them: it would cost far more to load them onto a reel than to just buy a reel of the same part and use that instead. The key takeaway is that the prerequisites that enable a PnP machine to work so quickly are:

  • a standardized problem description in a machine-readable format it can understand
  • standardized components on carefully configured reels or trays to minimize the component-to-component variability during pickup.
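The arithmetic behind the single-board example can be sketched as follows. The placement rate, component count, and one hour of programming come from the example above; treating programming as the only setup cost is a simplifying assumption, and the function names are hypothetical.

```python
PLACEMENT_RATE_PER_HOUR = 50_000  # from the example in the text
COMPONENTS_ON_BOARD = 10          # from the example in the text
SETUP_HOURS = 1.0                 # "an hour programming", assumed to be the only setup cost


def placement_seconds(components, rate_per_hour):
    """Time the machine spends actually placing parts."""
    return components / rate_per_hour * 3600


def total_seconds(components, rate_per_hour, setup_hours):
    """Setup time plus placement time for a one-off board."""
    return setup_hours * 3600 + placement_seconds(components, rate_per_hour)


place_s = placement_seconds(COMPONENTS_ON_BOARD, PLACEMENT_RATE_PER_HOUR)
total_s = total_seconds(COMPONENTS_ON_BOARD, PLACEMENT_RATE_PER_HOUR, SETUP_HOURS)
print(f"placement: {place_s:.2f} s, total with setup: {total_s:.2f} s")
print(f"setup share of total: {SETUP_HOURS * 3600 / total_s:.2%}")
```

With these numbers, placement takes roughly 0.72 seconds while setup takes an hour, so nearly all of the elapsed time goes to meeting the prerequisites rather than to placing parts.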

Without those two conditions, a PnP machine can’t work and you’re better off hand soldering the board, because a human can understand a printed-out assembly diagram and our hands can manipulate tweezers to place bulk components from little baggies. Nowadays, it’s not hard to find any component you want preloaded onto a reel for a PnP machine. The industry has had many decades to build careful standardization around this process because it’s critical for reliable, high-speed manufacture of all circuit boards.

For Industry 4.0, the equivalent of the PnP machine is a machine learning (ML) algorithm such as a deep neural network. Just like the PnP machine, it has two prerequisites to function:

  1. a standardized, machine-readable problem description that it can understand
  2. standardized and sanitized data inputs that have very specific regular formats.

The Industry 4.0 equivalent of hand soldering is an expert with Excel and a few SQL databases. Unfortunately, the industry hasn’t had decades to standardize all its data into “data reels” that can be fed into ML tools. In many ways, we are still trying to feed in loose baggies of data and finding we’re not satisfied with the results. Sometimes the baggies are very large, like an MES database, and other times they’re very small, like a log file from a single machine. It’s not the size of the baggie that’s important, it’s the fact that it’s still in a baggie and not on a reel.

Just like the PnP machine, ML tools struggle to deal with tiny variations from one data entry to another, variations that humans perceive as inconsequential. The key limiting factor is that a PnP machine has no understanding of how to build a circuit board; it just knows where you told it to move its arm to pick up parts and where it’s supposed to drop them. Similarly, advanced ML tools don’t understand your problem; they just know how to combine lots of latent features from the standardized data you feed them into an answer. This is why data variability is so problematic for ML. It can’t tell the difference between the tiny variations it should ignore (because they’re just noise) and the ones you’re asking it to pull together into a giant ROI. In both cases, if the instructions are not precise or there is too much variability in the inputs, the process breaks down and you’re better off working manually, allowing the human mind to power through the ambiguity.
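As a small illustration of how variations a human would shrug off can look like distinct inputs to a tool, consider the sketch below. The fault labels are invented, and `naive_vocabulary` is a hypothetical stand-in for the kind of categorical encoding many ML pipelines apply; it is not the behavior of any specific library.

```python
# Hypothetical fault labels as they might appear in raw logs from different machines.
raw_labels = ["Nozzle Clog", "nozzle clog", "NOZZLE_CLOG ", "Feeder Jam", "feeder-jam"]


def naive_vocabulary(labels):
    """Assign one integer id per distinct string, as a simple encoder would."""
    vocab = {}
    for label in labels:
        vocab.setdefault(label, len(vocab))
    return vocab


def normalize_label(label):
    """Trim whitespace, lowercase, and unify separators."""
    return label.strip().lower().replace("_", " ").replace("-", " ")


raw_vocab = naive_vocabulary(raw_labels)
clean_vocab = naive_vocabulary([normalize_label(label) for label in raw_labels])
print(len(raw_vocab), "categories before cleaning,", len(clean_vocab), "after")
```

A person reading the raw list sees two kinds of fault. The naive encoder sees five unrelated categories until the labels are normalized.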

The question for the SMT industry now is how to standardize its data sources into data reels that are ready to be fed into the workhorse machine learning tools of Industry 4.0, and scale is what makes this hard. Solving a problem on a small scale turns out to be very different from solving it on a massive scale. On a small scale, you can deal with edge cases manually. For example, if you have 100 rows in your spreadsheet and 1% have issues, it’s not a problem to just fix that one row. However, a 1% issue rate on a spreadsheet of 10 million rows results in 100,000 problems. Fixing those by hand is not only inefficient, it’s likely cost-prohibitive. In this light, the real hidden cost of data silos comes into focus. It’s not that you can’t get data out of them or that it’s impossible to combine data from multiple silos to solve a problem; it’s that it’s a nontrivial amount of work to do so, every single time. So you naturally stop doing it unless you’re very sure it’s going to be worth your while. As a result, many problems never get solved, even if the ROI could be large, because the cost of finding out if you’re right is just too high.
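The scaling argument is simple arithmetic, sketched here. The 1% issue rate and the two row counts come from the text; the two minutes per manual fix is purely an assumption for illustration, and the function names are hypothetical.

```python
ISSUE_RATE_PERCENT = 1       # from the text
MINUTES_PER_MANUAL_FIX = 2   # assumption for illustration only


def rows_with_issues(total_rows, issue_rate_percent=ISSUE_RATE_PERCENT):
    """Number of rows expected to need attention."""
    return total_rows * issue_rate_percent // 100


def manual_hours(total_rows, minutes_per_fix=MINUTES_PER_MANUAL_FIX):
    """Hours of hand-fixing under the assumed time per fix."""
    return rows_with_issues(total_rows) * minutes_per_fix / 60


for total in (100, 10_000_000):
    issues = rows_with_issues(total)
    hours = manual_hours(total)
    print(f"{total:>10,} rows -> {issues:>7,} issues, about {hours:,.1f} hours by hand")
```

Under that assumed fix time, the small spreadsheet costs a couple of minutes while the large one costs thousands of hours, which is why manual cleanup stops being an option long before the data stops being valuable.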

Reducing the cost of unifying data across silos pays dividends throughout the organization because it reduces all data problem-solving barriers. That realization is a key step toward unlocking the factory of the future.

  • Arch has developed a factory data exchange, ArchFX, a radical new utility to manage data and eliminate data silos and their hidden cost. ArchFX empowers manufacturers to profitably capture Industry 4.0 use cases previously thought to be financially unattainable.

Tim Burke is co-founder and Chief Technology Officer of Arch Systems where he works to accelerate Industry 4.0 by standardizing connectivity and data gathering across the factory. He has broad expertise in industrial communication protocols as well as the challenges of working with the diverse set of machines found in electronics and semiconductor factories. Tim’s published work on the device physics of organic photovoltaics has been cited by thousands of researchers.