(38c) Analysing Big Data in Dairy Processing, By Throwing Most of It Away

Depree, N., University of Auckland
Young, B. R., University of Auckland
Prince-Pike, A., University of Auckland
Wilson, D. I., AUT University
Dairy processors in New Zealand seek to reduce out-of-specification production of value-added milk powders. While the process is tightly controlled and such events are rare, detection of some faults takes up to three days post-production, leading to significant downgrades given the very high production rates. Grading tests indicate whether powders are fit for release, but they are pass/fail judgements rather than measurements well suited to data analysis. It is therefore hard to identify the plant operating conditions that lead to failure, and furthermore, the underlying physical mechanisms are not always well understood.

The examination of multiple industrial plants spread over geographically separate sites, all producing similar products, requires careful management of significant amounts of data. Ideally this “Big Data” approach would underpin models describing the process, which could be used to predict key functional properties of the milk powder and deliver early warning signals before product slips out of specification. Extensive work was done on the very difficult task of collecting and aligning several years of process and quality data from plants around New Zealand, representing a range of designs, ages, geographical locations, process control schemes, and data storage types.

However, as recent publications have shown, and this study reinforced, the creation of the dataset is typically a much larger job than the actual modelling and analysis, which at times can seem as trivial as sending the data to a regression function. It is becoming increasingly evident that advanced technical knowledge is not simply waiting to be unlocked by the “Big Data Revolution”. Our early approaches led to a range of models that suffered from poor predictability and were of little practical use. The poor performance appeared to stem from a combination of factors: measurements needed to discern the key physical phenomena were not available (or possible), and there was not enough data in the main regions of interest, owing to the rarity of failure.

Whilst adding new instruments is difficult and expensive, substantial improvements in predictability were made not by collecting even more data, but by “throwing much of it away”. Changes in plant operations, equipment, and products mean that only “small data” models can be applied to subsets of the data. It was necessary to go back to first principles and apply detailed operational knowledge to find ways to improve the time alignment of process data and quality samples. Statistical resampling of data sets was applied to give more weight to rare occurrences, and categorical modelling was used to find these occurrences.
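The resampling idea can be illustrated with a minimal sketch: rare out-of-spec batches are resampled with replacement up to parity with in-spec batches before a categorical model is fitted, so the classifier is not swamped by the common case. The label name, class balance, and record structure here are assumptions for illustration only.

```python
import random

def oversample_minority(records, label_key="in_spec", seed=0):
    """Balance a binary-labelled data set by oversampling the minority class.

    records: list of dicts, each carrying a boolean label under `label_key`.
    Returns a new list in which the minority class is resampled with
    replacement until it matches the majority class size.
    """
    rng = random.Random(seed)
    pos = [r for r in records if r[label_key]]
    neg = [r for r in records if not r[label_key]]
    minority, majority = sorted((pos, neg), key=len)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra
```

Such oversampling only re-weights the few failure examples that exist; it cannot manufacture information about operating regions that were never observed, which is the deeper limitation discussed below.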

This approach, and the need for fundamental engineering information and careful examination of the physical process, was reinforced by the investigation of a different big data set from cream cheese production. The data was analysed to give an overview of the behaviour of the plant, but progress in modelling and prediction required significant work and low-level calculations to dissect the data. Rather than multivariate data analysis alone, this required genuine engineering knowledge: how the plant is automatically controlled, how the operators physically conduct manual tasks, and the vagaries of how the different instruments measure and are recorded.

This paper will describe two examples of how “Big Data” - that is, the collection and preparation of the data - is not a solution to our problems, but rather a tool that still requires careful investigation and engineering knowledge to solve engineering problems. Unfortunately, even the biggest data sets often do not contain enough information, either because key quantities are not measured, or because there is not enough data in the regions of real interest. In particular, models intended to predict certain failures need substantial data that exhibits these failures. We describe this problem and how we were able to address it.