I’ve noticed that all vendors of predictive maintenance products and services make the same claims when it comes to preventing breakdowns. So, how can you determine what’s real and what’s not? In a series of blog posts, I’ll point out what’s important when choosing the right solution for a manufacturing organization.
First and Foremost: Accuracy, Early Warnings and False Alarms
Almost all predictive maintenance solutions use only one mechanism for detecting failure — anomaly detection. Aspen Mtell® uses two mechanisms: anomaly and precise failure detection. Where anomaly detection can work, precise failure detection works earlier and more accurately and is essential for the best performing predictive maintenance solution.
However, the quality of data processing is paramount for a full, effective solution. You’ve heard the adage, “garbage in, garbage out,” and it is a real problem for many solutions. A number of factors contribute to solution competence. First is data preparation: how the data are conditioned so that all of the good data, and only the good data, enter the training set for analysis. While establishing normal behavior from archived data, some leading solutions throw away up to 70% of incoming data without knowing whether it is good or bad.
Examining Outliers in the Data
A user can visually identify simple outliers by examining all the trend lines for one machine on a multivariable chart, which may have 50 to 100 individual sensor trends. Data occurring around an outlier could be good or bad. For example, if a large spike in wind turbine gearbox torque occurred when a gear tooth sheared off, the “outlier” was actually a real failure and very valuable information. The simplest but most error-prone method uses a rubber-band cursor to “stripe” and remove all the data from all the sensor trends in an arbitrarily chosen time period before and after the outlier. The user does this without realizing that the removed span also contains good data. Consequently, the reduced training data seriously limits accuracy, increases false alarms, and shortens the warning lead time before impending failures.
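To make the cost of naive striping concrete, here is a minimal sketch in Python. The function name `stripe_naive`, the sample values, and the margin are all hypothetical and chosen only for illustration; they do not come from any particular product.

```python
def stripe_naive(rows, outlier_index, margin):
    """Drop every row within `margin` samples of the outlier, for all sensors at once."""
    lo = max(0, outlier_index - margin)
    hi = min(len(rows), outlier_index + margin + 1)
    return rows[:lo] + rows[hi:]


# Hypothetical data: 10 samples of (torque, temperature); sample 5 is a torque spike.
rows = [(100.0, 60.0)] * 5 + [(400.0, 60.0)] + [(100.0, 60.0)] * 4

kept = stripe_naive(rows, outlier_index=5, margin=2)
removed = len(rows) - len(kept)
print(f"removed {removed} of {len(rows)} samples to exclude 1 outlier")
```

In this toy case, five samples are discarded to exclude a single spike, and four of them were ordinary readings. With 50 to 100 sensors per machine, every stripe discards that window from every sensor.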
Superior solutions condition all archived training data algorithmically to establish the quality of the combined data streams at every point in time. The process amalgamates all the data streams (say, 50 to 100) into a single-line “probability waveform” that shows the probability of failure at every point in the training data. Under normal behavior, the trend line in such a chart hovers very close to zero. Where behavior deviates from normal conditions, the trend line departs visibly from zero.
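The idea of collapsing many sensor trends into one line can be sketched as follows. This is not the algorithm used by Aspen Mtell or any other product; it is a simple stand-in, assuming a root-mean-square of per-sensor z-scores squashed into the range 0 to 1. The function name `deviation_waveform` and the sample data are hypothetical, and the result is a deviation score rather than a calibrated probability of failure.

```python
import math
from statistics import mean, pstdev


def deviation_waveform(streams):
    """Collapse equal-length sensor streams into one score in [0, 1) per time step."""
    # Per-sensor baseline; fall back to 1.0 if a sensor is constant (std of 0).
    # Simplification: the baseline is computed over all samples, excursions included.
    stats = [(mean(s), pstdev(s) or 1.0) for s in streams]
    waveform = []
    for t in range(len(streams[0])):
        # Root-mean-square of the per-sensor z-scores at time t.
        z2 = [((s[t] - m) / sd) ** 2 for s, (m, sd) in zip(streams, stats)]
        rms = math.sqrt(sum(z2) / len(z2))
        # Dead band of one standard deviation so ordinary noise maps to zero.
        excess = max(0.0, rms - 1.0)
        waveform.append(1.0 - math.exp(-0.5 * excess ** 2))
    return waveform


# Hypothetical readings for two sensors; both deviate together at index 15.
torque = [100, 101, 99, 100, 102, 98, 100, 101, 99, 100,
          100, 101, 99, 100, 102, 160, 100, 99, 101, 100]
temp = [60, 61, 59, 60, 60, 61, 59, 60, 60, 61,
        59, 60, 61, 60, 59, 75, 60, 61, 59, 60]

w = deviation_waveform([torque, temp])
peak = max(range(len(w)), key=w.__getitem__)
print(f"peak at index {peak}, score {w[peak]:.3f}")
```

The single line stays at zero while both sensors behave normally and rises sharply where they deviate together, which is the property the striping step relies on.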
By introducing the timestamps of failure work orders, a user can precisely match the behavior excursions to actual failures, measure the exact time-to-failure, and remove those excursions for separate processing. Some remaining excursions do not match any failure; they reflect errant, perhaps unexplained, manufacturing process conditions that compromise good data. Removing these as well ensures the calculation of normal behavior is not compromised by errant process behavior.
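The matching step described above can be sketched in Python. Everything here is an assumption made for illustration: the function names `find_excursions` and `label_excursions`, the 0.5 threshold, the 7-day matching window, and the sample dates. A real system would read work orders from the EAM system rather than a hard-coded list.

```python
from datetime import datetime, timedelta


def find_excursions(times, waveform, threshold=0.5):
    """Return (start, end) timestamps of contiguous runs where the waveform exceeds threshold."""
    excursions, start, prev = [], None, None
    for t, p in zip(times, waveform):
        if p > threshold and start is None:
            start = t
        elif p <= threshold and start is not None:
            excursions.append((start, prev))
            start = None
        prev = t
    if start is not None:
        excursions.append((start, prev))
    return excursions


def label_excursions(excursions, work_orders, max_lead):
    """Split excursions into failure precursors (with time-to-failure) and unexplained ones."""
    failures, unexplained = [], []
    for start, end in excursions:
        # First work order that falls within max_lead after the excursion begins.
        match = next((wo for wo in sorted(work_orders)
                      if start <= wo <= start + max_lead), None)
        if match is not None:
            failures.append((start, end, match - start))
        else:
            unexplained.append((start, end))
    return failures, unexplained


# Hypothetical daily waveform with two excursions and one failure work order.
times = [datetime(2024, 1, 1) + timedelta(days=i) for i in range(20)]
waveform = [0.02] * 20
waveform[4:6] = [0.8, 0.8]          # Jan 5 to Jan 6
waveform[12:15] = [0.9, 0.9, 0.9]   # Jan 13 to Jan 15
work_orders = [datetime(2024, 1, 18)]

failures, unexplained = label_excursions(
    find_excursions(times, waveform), work_orders, max_lead=timedelta(days=7))
for start, end, lead in failures:
    print(f"failure precursor starting {start:%b %d}, time-to-failure {lead.days} days")
for start, end in unexplained:
    print(f"unexplained excursion {start:%b %d} to {end:%b %d}")
```

Both groups are then removed from the data used to define normal behavior, while the matched excursions are kept aside as failure patterns.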
So, now you can see that intelligent striping across the pre-processed data in the probability waveform is very different from striping raw data feeds: intelligent striping is far more accurate and retains all the good data. Removing unexplained process deviations and real failures ensures the remaining data are all near zero probability of failure and are all meaningful and usable representations of normal behavior. The probability process removes the guesses and estimates that alternative solutions rely on.
Also, because this exercise detects the precise failure patterns for a particular work order, the patterns may be easily linked to the exact root cause: the cause code in the enterprise asset management (EAM) system. Next time the alert is dispatched, you know exactly WHEN failure will happen and HOW it will happen. You can include WHAT action to take to prevent failure, whether that is adjusting a process to avoid degradation or scheduling service before failure occurs. That insight and the early warning give you the time to act safely and under full control.
Predicting Precise Failures
Most solutions do not perform precise failure detection; they only perform what’s called anomaly detection. Simply put, anomaly detection inspects incoming data, asking, “Is this normal?” Obviously, accuracy depends on how precisely normal behavior is defined; a poor definition means many false alarms, especially during early application commissioning, when inferior predictive maintenance solutions simply block errant alarms without improving detection accuracy. In fact, such endless indiscriminate “tuning” of alarm suppression thresholds desensitizes the solution so much that it will suppress real alerts.
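The desensitizing effect of threshold tuning can be illustrated with a toy example. The scores and thresholds below are invented for illustration, and `alerts` is a hypothetical helper, not part of any product.

```python
def alerts(scores, threshold):
    """Return the indices whose anomaly score exceeds the alarm threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


# Hypothetical anomaly scores: indices 1 and 3 are false alarms caused by a
# poorly defined normal; index 5 is a real developing failure.
scores = [0.1, 0.35, 0.1, 0.4, 0.1, 0.55, 0.1]

print(alerts(scores, 0.3))  # [1, 3, 5]
print(alerts(scores, 0.6))  # []
```

Raising the threshold silences the two false alarms, but it silences the real alert as well, because the underlying definition of normal has not improved.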
While anomaly detection is actually a good technique to detect deviations from normal, it is not as early or as accurate as precise failure pattern detection. It is extremely useful to indicate and send alarms for abnormal deviations. The intrinsic issue is how to decide if the deviation is an actual failure or a change in the process and what to do in either case.
Consequently, after an anomaly alarm, most solutions require intense scrutiny by experts to understand whether the alarm reflects a new normal or a real failure, a process that may take days or weeks. Worse, those solutions have no simple mechanism to adjust the data processing so that the same false alarms do not recur. Manufacturing processes always change over time. You can get process drift, changes in ambient temperature, new product and process campaigns, winter and summer modes, and so on. Select an accurate early warning solution in which you can simply and easily maintain accuracy: for example, one where a simple point-and-click adjustment keeps the application consistent with a changing manufacturing process and keeps false alarms to a minimum.