Traditional applied statistics, machine learning, artificial intelligence, empirical modeling – they all share a common task: preprocessing data. And every practitioner I’ve ever met has the same warm and fuzzy feelings. It’s grunt work. It’s boring. It’s tedious, frustrating, maddening and consumes 50% or more of the time spent doing analysis. There is no glory in data preprocessing.
The range of issues is broad: extracting data from storage systems, dealing with missing and incorrect values, selecting the appropriate set of variables for analysis, segmenting data for machine learning, merging data across systems. The list goes on. When our best and brightest minds are slogging through this, it feels to me like we are hooking racehorses to hay carts. Solving this problem offers tremendous rewards as we free up those bright minds to focus on more financially rewarding tasks.
As luck would have it, we’re at a point where technology can have, and is having, an impact on streamlining this most despised workflow. There is no cookie-cutter approach to preprocessing, as domain knowledge is critical to producing effective data. Artificial intelligence, when embedded in the software used by statisticians, process engineers, data scientists and others, merges domain knowledge with analysis, eliminating the drudgery, speeding up the process and up-leveling user effectiveness.
The latest version of our predictive maintenance solution has an example of this type of innovation. A set of capabilities uses AI to help produce better datasets for training Aspen Mtell® machine learning agents. The AI embedded in the solution is focused on tasks unique to machine learning. Training these models requires many of the same preprocessing steps as any other analysis, such as eliminating bad or missing values, but it also has unique steps, such as segmenting the data into training and testing regions and creating the holdback data for final validation. Newly released Aspen Maestro™ automation reduces the time, effort and frustration in data preprocessing, while making non-data scientists more effective. It’s routinely cutting the data wrangling time in half!
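To make those machine-learning-specific steps concrete, here is a minimal sketch in plain Python. It is not how Aspen Maestro or Aspen Mtell work internally; the function names, the sample readings and the 60/20/20 split are all assumptions chosen for illustration. It drops bad or missing values, then cuts the time-ordered data into training, testing and holdback regions.

```python
import math

def clean(readings):
    """Drop readings whose value is missing (None) or not a finite number."""
    return [
        (t, v) for t, v in readings
        if v is not None and math.isfinite(v)
    ]

def split_chronologically(readings, train_pct=60, test_pct=20):
    """Split time-ordered readings into training, testing and holdback regions.

    The percentages are illustrative assumptions. Whatever remains after the
    training and testing regions becomes the holdback data for final validation.
    """
    ordered = sorted(readings, key=lambda r: r[0])
    n = len(ordered)
    train_end = n * train_pct // 100
    test_end = train_end + n * test_pct // 100
    return ordered[:train_end], ordered[train_end:test_end], ordered[test_end:]

# Hypothetical sensor readings as (timestamp, value) pairs.
raw = [(0, 10.1), (1, None), (2, 10.4), (3, float("nan")), (4, 10.9),
       (5, 11.2), (6, 11.0), (7, 10.8), (8, 11.5), (9, 11.7),
       (10, 11.9), (11, 12.0)]

cleaned = clean(raw)
train, test, holdback = split_chronologically(cleaned)
print(len(cleaned), len(train), len(test), len(holdback))
```

The split is by time rather than random, so the testing and holdback regions always come after the training region. That is a common choice for equipment data, where the goal is to predict future behavior from past behavior.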
Beyond data preprocessing, the embedded AI helps create specialized machine learning models that address more complex problems, like multiple failure modes that share causes or multiple operating states that result in similar outcomes. One of the most powerful ways it does that is by incorporating domain knowledge in the form of first-principles equations that describe complex phenomena like foaming in an amine column or long-cycle problems like hydrate formation. This capability is especially vital in situations with sparse data, where the equations can help fill in the gaps.
Machine learning can be significantly improved by incorporating feature engineering, where data are combined in ways that represent fundamental engineering concepts that cannot be violated in the model. For example, if the pressure difference between two points is a key concept, we can create a pseudo variable for that differential pressure rather than including both pressure variables and letting the learning algorithm sort it out. While feature engineering is a powerful tool, it can take some iteration to select the most effective features. Maestro automates feature selection, resulting in less iteration and better-performing Agents. This is just the tip of the iceberg in terms of what’s possible by embedding AI in engineering tools as a way for current staff to take advantage of the power of data science.
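The differential pressure example can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Maestro's feature selection; the function name, the column names (`p_in`, `p_out`, `delta_p`) and the sample values are hypothetical.

```python
def add_differential_pressure(rows, upstream="p_in", downstream="p_out",
                              name="delta_p"):
    """Replace two pressure readings with a single differential pressure.

    Each row is a dict of sensor values. The two raw pressure columns are
    dropped so the model sees only the engineered pseudo variable.
    """
    engineered = []
    for row in rows:
        new_row = {k: v for k, v in row.items()
                   if k not in (upstream, downstream)}
        new_row[name] = row[upstream] - row[downstream]
        engineered.append(new_row)
    return engineered

# Hypothetical readings from two pressure sensors and one temperature sensor.
rows = [
    {"p_in": 12.0, "p_out": 10.5, "temp": 80.0},
    {"p_in": 12.5, "p_out": 10.0, "temp": 82.0},
]

features = add_differential_pressure(rows)
print(features)
```

Handing the model the difference directly means it does not have to discover from data that only the gap between the two pressures matters.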
Learn more about Aspen Maestro and other V12 enhancements in this video.