(189b) Big Data Process Modelling with Parallel Graphics

Authors: 
Mahoney, A. W., Process Plant Computing Limited
Brooks, R. W., Process Plant Computing Limited
The scale of data collected across a typical process plant, thousands of points at sub-minute frequencies, for decades, completely swamps engineers’ ability to use conventional analysis and graphs. Among the solutions considered are big data approaches that have produced results in other fields. But process plants present big challenges for big data. Big data techniques are focused on detecting small correlations among a mass of largely random data, while process data has extensive correlations due to mass balance and physical relationships. Predictive analytic techniques based on big data provide generalized answers through simplification: ignoring data by choosing a subset of variables, averaging data, and neglecting fundamental complexities and nonlinearity. This destroys the richness of the data and reinforces preconceptions.

Process engineers are skilled at data reduction. When investigating a problem, a typical approach is to draw as small a boundary as possible around the suspected equipment, select a few variables known to be important and a few more that might be related to the problem, limiting the analysis to 10 to 20 of the hundreds of variables that may be available, and then focus on just a few time periods believed to be significant. In this way, only a very small fraction of the applicable data is ever used, and very little new understanding can be generated. The drawback is that multi-way interactions grow exponentially in complexity with the number of variables, so it can take prohibitively long to investigate and understand the data, or to discover new relations beyond the known and expected physical correlations.

The parallel coordinate graph removes this data analysis limitation. By providing a graph that allows continuous data to be viewed across hundreds of variables simultaneously, engineering analysis and visualization can proceed roughly linearly with the number of variables considered, dramatically increasing the amount of data available to the engineer and bringing the limit closer to the actual physical memory limit of the computer system used. Parallel graphs with queries can be used to link and compare operating envelopes from final product quality variables across hundreds of process operating values, allowing discovery and utilization of data not previously considered.
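The idea can be sketched in a few lines with standard tooling. The following is a minimal illustration, not the authors' software: it plots synthetic process samples on parallel coordinates, coloured by a product-quality band so that operating envelopes for different quality outcomes can be compared visually. All tag names (`feed_flow`, `reactor_T`, and so on) and the data itself are invented for the example.

```python
# Minimal parallel-coordinates sketch with pandas/matplotlib.
# Tag names and data are illustrative, not from any real plant.
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

rng = np.random.default_rng(0)
n = 500
# Hypothetical historian export: each row a time sample, each column a tag.
df = pd.DataFrame(
    rng.normal(size=(n, 6)),
    columns=["feed_flow", "reactor_T", "reactor_P", "reflux", "purity", "duty"],
)
# Band each sample by product quality; the bands colour the lines, so the
# envelope of "high" quality operation stands out against the rest.
df["quality"] = pd.cut(df["purity"], bins=3, labels=["low", "mid", "high"])

ax = parallel_coordinates(df, class_column="quality", alpha=0.3)
ax.figure.savefig("envelopes.png")
```

In practice the same plot scales to hundreds of axes; each additional variable adds one axis rather than multiplying the number of pairwise charts to inspect.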

Extending this approach to geometric modelling, over one hundred variables can be used in a single process operating model built from the same historical data. The use of the parallel coordinate plot allows operators to monitor the process and detect changes in the relationships between the variables in real time. Unlike traditional models, which are practically limited to 10 to 20 variables, this model takes account of far more historical data and variable relationships.
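To make the monitoring idea concrete, here is a deliberately simplified sketch of an operating-envelope check: per-variable bounds are derived from a window of historical data, and a live sample is flagged if any variable leaves its bounds. A real geometric model captures the joint relationships between variables, not just independent ranges; this sketch, with invented function names and data, shows only the envelope-check concept scaled to a hundred variables.

```python
# Simplified per-variable operating envelope built from historical data.
# A production model would capture joint variable relationships; this
# illustrates only the independent-bounds check. Names are illustrative.
import numpy as np

def build_envelope(history: np.ndarray, margin: float = 0.0):
    """Per-variable [lo, hi] bounds over the historical window."""
    lo = history.min(axis=0) - margin
    hi = history.max(axis=0) + margin
    return lo, hi

def violations(sample: np.ndarray, lo: np.ndarray, hi: np.ndarray):
    """Indices of variables where the current sample leaves the envelope."""
    return np.flatnonzero((sample < lo) | (sample > hi))

rng = np.random.default_rng(1)
history = rng.uniform(0.0, 1.0, size=(1000, 100))  # 1000 samples, 100 variables
lo, hi = build_envelope(history)

inside = history[0]        # a historical sample lies inside by construction
outside = inside.copy()
outside[42] = 2.0          # push one variable out of its historical range
```

Checking a sample costs one comparison per variable, so the work grows linearly with the number of variables, matching the linear scaling the parallel graph provides for visualization.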

As an example, an ethylene refrigeration system will be considered. Here one of the operational issues is keeping the system far enough from compressor surge that the anti-surge system does not operate. By including many more variables, a much more sensitive event prediction model can be built, allowing more optimal operation while still giving warning of required operator action.