Preparing Manufacturing PLC IoT Data for Exploratory Analysis

IoT data from PLCs is the key to finding insights on the manufacturing floor. PLCs and the sensors wired to them are data-creating machines (pun intended). They hold valuable information on the entire manufacturing process, and that information can point the way to increased efficiency, decreased downtime, and improved business operations.

In this discussion, I will walk you through some key data preparation steps I take whenever I'm analyzing PLC data.


Step 1: The Right Tools

Manufacturing data is challenging to analyze. Process engineers and analysts struggle when looking at PLC data. This is especially true if you are analyzing PLC data from multiple lines or plants and merging it with other sources.

If you work for a manufacturer, chances are you have tried Excel, a BI tool, or some custom code to analyze this data. You know the downsides: You maxed out Excel. You didn't have the statistical formulas you needed in your BI tool. You didn't have the coding skills to analyze the data the way you wanted.

I strongly recommend analyzing PLC / IoT data with tooling built for data science, machine learning, or statistical analysis.

Our favorite tool for this type of analysis is Dataiku. It allows for both visual and code-based data preparation. It also lets you scale your analysis to machine learning and other advanced techniques in the same tool. It's extremely powerful.

The screenshots in this blog will be from Dataiku, but the steps remain the same no matter what tool you use. These data preparation activities will save you a lot of headaches in the future.


Step 2: Data Preparation

As a Citizen Data Scientist, I get to be the first to analyze this extremely valuable data. Here are some basic Preparation Recipe steps I start with when looking at this type of data:

Column Definitions - What does this mean?

PLC data, and IoT data in general, often arrives with columns that aren’t clear to the end user. It's important to define what each column means. In the example below, replacing “A354” with “Temperature” will alleviate confusion later in the analysis. Make sure to include the unit of measure (degrees Fahrenheit, lbs, seconds) in the column name.

[Image: Dataiku column rename]


Another column name that often requires clarity is "Date". Date of what? Give dates and times a specific name, such as “Washer Start Time” instead of “Date.”
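If you prefer code over a visual step, the rename can be sketched in a few lines of plain Python. This is a minimal, tool-agnostic illustration and not Dataiku's API. "A354" and "Date" come from the text above; the "B017" tag, the sample values, and the new names with unit suffixes are made up for the example.

```python
# Hypothetical mapping from raw PLC tags to descriptive names with units.
COLUMN_NAMES = {
    "A354": "Temperature_F",
    "Date": "Washer_Start_Time",
}


def rename_columns(rows, mapping):
    """Return new rows with keys renamed; unmapped keys are kept as-is."""
    return [
        {mapping.get(key, key): value for key, value in row.items()}
        for row in rows
    ]


# Made-up sample rows standing in for a PLC export.
raw_rows = [
    {"Date": "2023-01-02 06:00:00", "A354": 141.2, "B017": 12},
    {"Date": "2023-01-02 06:01:00", "A354": 142.0, "B017": 13},
]

clean_rows = rename_columns(raw_rows, COLUMN_NAMES)
print(list(clean_rows[0].keys()))
```

Keeping the mapping in one dictionary also gives you a simple data dictionary to review with the business. Any tag that is still unmapped, like "B017" here, stands out as a column that needs a definition.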


Confirm the data origin - What machine is this data from?

If there are multiple lines or systems in the analysis, clearly identify in the column name which system and line each sensor belongs to. Many sensors may share the same name but sit on different lines. If you are focusing on a single system or piece of equipment, this is a good time to use a Split or Filter Recipe to remove the unnecessary data. If using a Filter Recipe, set the sampling method to “No sampling (whole data)” so the filter is evaluated against every row.

[Image: Dataiku sample setting]
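For readers working in code, here is a rough sketch of the same idea in plain Python: filter the whole dataset down to one line, then prefix each sensor column with the line it came from. The "Line" and "Timestamp" column names, the line IDs, and the readings are assumptions made for illustration.

```python
def filter_line(rows, line_id):
    """Keep only the rows recorded on the given line (uses every row, no sampling)."""
    return [row for row in rows if row["Line"] == line_id]


def tag_with_line(row, line_id, keep=("Line", "Timestamp")):
    """Prefix sensor columns with the line ID so same-named sensors stay distinct."""
    return {
        (key if key in keep else f"{line_id}_{key}"): value
        for key, value in row.items()
    }


# Made-up readings from two lines that share a sensor name.
rows = [
    {"Timestamp": "2023-01-02 06:00:00", "Line": "L1", "Temperature_F": 140.0},
    {"Timestamp": "2023-01-02 06:00:00", "Line": "L2", "Temperature_F": 151.5},
    {"Timestamp": "2023-01-02 06:01:00", "Line": "L1", "Temperature_F": 141.0},
]

line1_rows = [tag_with_line(row, "L1") for row in filter_line(rows, "L1")]
print(line1_rows[0])
```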


Review the control limits for extreme outliers - Is this irrelevant or telling?

Whether or not there are existing control limits, review your data for outliers. Review these outliers with the business to determine whether the machine was truly operating outside of its limits or whether the data comes from a period that isn’t relevant, such as maintenance or warm-up time.

Pro tip: Ask for hard limits to identify data that may need to be removed. For example: “Is it possible for this machine to make more than X parts per hour?”

[Image: data outliers]
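A hard limit translates naturally into a simple check. The sketch below, in plain Python, separates rows that fall inside the limits from rows that need review instead of deleting anything outright. The limit of 500 parts per hour, the "Parts_Per_Hour" column name, and the sample values are hypothetical; the real number should come from the people who run the machine.

```python
# Hypothetical hard limit; confirm the real value with the business.
MAX_PARTS_PER_HOUR = 500


def split_by_hard_limit(rows, column, lower, upper):
    """Split rows into (kept, flagged) based on physically possible values."""
    kept, flagged = [], []
    for row in rows:
        value = row[column]
        if value is not None and lower <= value <= upper:
            kept.append(row)
        else:
            # Out-of-range or missing readings go to review, not the trash.
            flagged.append(row)
    return kept, flagged


# Made-up hourly production counts.
rows = [
    {"Hour": 1, "Parts_Per_Hour": 420},
    {"Hour": 2, "Parts_Per_Hour": 0},
    {"Hour": 3, "Parts_Per_Hour": 9999},
    {"Hour": 4, "Parts_Per_Hour": 480},
    {"Hour": 5, "Parts_Per_Hour": None},
]

kept, flagged = split_by_hard_limit(rows, "Parts_Per_Hour", 0, MAX_PARTS_PER_HOUR)
print(len(kept), "rows kept;", len(flagged), "rows flagged for review")
```

Keeping the flagged rows in their own list makes it easy to walk through them with the business before deciding whether they are errors or real events.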

Review for known issues - What are we missing?

Scheduled downtime is common in machine data. Review typical plant closure times (evenings / weekends) with the business. Pay special attention to large gaps in the data. They could indicate longer closures such as vacations or maintenance weeks. Depending on the process, data may not be collected consistently, and there may be periods of missing data that need to be reviewed. Document these findings.

[Image: missing data blocks]
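One way to surface these gaps in code is to sort the timestamps and compare each reading with the one before it. The plain-Python sketch below assumes readings normally arrive about once a minute and treats anything longer than five minutes as a gap; both the sample timestamps and the threshold are assumptions you would tune to your own process.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_gap):
    """Return (gap_start, gap_end, duration) for every gap longer than max_gap."""
    ordered = sorted(timestamps)
    gaps = []
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > max_gap:
            gaps.append((previous, current, current - previous))
    return gaps


# Made-up readings: Friday afternoon, then nothing until Monday morning.
readings = [
    datetime(2023, 1, 6, 16, 0),
    datetime(2023, 1, 6, 16, 1),
    datetime(2023, 1, 6, 16, 2),
    datetime(2023, 1, 9, 6, 0),
    datetime(2023, 1, 9, 6, 1),
]

gaps = find_gaps(readings, max_gap=timedelta(minutes=5))
for start, end, duration in gaps:
    print(f"No data from {start} to {end} ({duration})")
```

The resulting list is a handy starting point for the conversation with the business: each gap is either a known closure or something that needs an explanation.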


Known formulas - How do you monitor today?

Many metrics regularly monitored in PLC IoT data are calculated from several data inputs. In the example below, Efficiency is the "Run_Hours" variable divided by "Total_Hours." By creating this formula, we are documenting valuable business formulas and creating new features (aka data points) from existing data points.

In Dataiku, create new formulas as a new step in a Prep Recipe. Work with the business or quality control to identify what metrics they are using to monitor the health of the process.  

[Image: efficiency formulas]
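Outside of Dataiku, the same Efficiency formula can be sketched in plain Python. "Run_Hours" and "Total_Hours" come from the example above; the "Shift" column, the sample values, and the choice to leave Efficiency empty when "Total_Hours" is zero are my assumptions.

```python
def add_efficiency(rows):
    """Add Efficiency = Run_Hours / Total_Hours to each row."""
    result = []
    for row in rows:
        total = row["Total_Hours"]
        # Avoid dividing by zero when a shift has no scheduled hours.
        efficiency = row["Run_Hours"] / total if total else None
        result.append({**row, "Efficiency": efficiency})
    return result


# Made-up shift summaries.
shifts = [
    {"Shift": "A", "Run_Hours": 6.0, "Total_Hours": 8.0},
    {"Shift": "B", "Run_Hours": 0.0, "Total_Hours": 0.0},
]

for row in add_efficiency(shifts):
    print(row["Shift"], row["Efficiency"])
```

How to treat zero-hour shifts is itself a business decision, so it is worth confirming with quality control before settling on a rule.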



If you want to unlock the insights hidden in your PLC IoT data, focus on data cleaning first. By working closely with machine operators and the business, you can make data-driven decisions that help increase efficiency, reduce energy costs, and reduce scrap.

In this post, we’ve covered the types of tools you need and the basic preparation steps to take when analyzing PLC IoT data. These steps will set you up for success in preparing your data for everything from descriptive statistics to machine learning.