(305f) Data Science Applications for Process Improvement at DuPont | AIChE

DuPont is a Fortune 500 company headquartered in Wilmington, DE (USA) that develops and manufactures materials for protective garments, construction, water purification, imaging, printing, microchip fabrication, and electronic devices. The company seeks to continually improve its manufacturing processes for greater efficiency and quality. One element of this effort has been growing a team to identify, develop, and deploy data science solutions at DuPont plants. This abstract summarizes lessons we've learned over the past five years and 15 projects.

The most foundational results of our experience are a deep familiarity with the data sources at our plants and a profile of valuable data science applications in manufacturing. A modern manufacturing plant has numerous software systems: a process historian (PH), an enterprise resource planning (ERP) system, a manufacturing execution system (MES), a distributed control system (DCS), and a laboratory information management system (LIMS). Our projects almost always begin with large-scale data queries, aggregations, and joins to compile a master data table. The project objective is often to train models for one or more product properties or process yields of relevance.
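As an illustration, compiling a master data table from two such systems often reduces to a keyed join. The sketch below uses pandas with hypothetical batch-level extracts; the column names and values are illustrative, not actual plant schemas.

```python
import pandas as pd

# Hypothetical extract from a process historian (PH): one row per batch.
historian = pd.DataFrame({
    "batch_id": ["B001", "B002", "B003"],
    "avg_reactor_temp_C": [182.5, 185.1, 179.8],
    "mix_speed_rpm": [120, 118, 125],
})

# Hypothetical extract from a LIMS: the measured product property.
lims = pd.DataFrame({
    "batch_id": ["B001", "B002", "B003"],
    "viscosity_cP": [420.0, 455.0, 398.0],
})

# Join on the shared batch identifier to build the master table that
# downstream property or yield models train on.
master = historian.merge(lims, on="batch_id", how="inner")
print(master.shape)  # (3, 4)
```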

The first class of opportunities we've identified is predictive modeling. Predictive modeling is best applied to a process step that requires a human decision. In this application, a process operator can view or trigger a prediction from a model to guide this process decision. She interfaces with the model through a web application with a graphical interface and a code-based back end that accesses live data from the manufacturing process and performs the model prediction. An example of our predictive modeling applications is a model to predict the weight of monomer required to produce the desired polymer viscosity in an imbalanced stoichiometric polymerization process. The weight of monomer required varies due to batch-to-batch variations in raw material quality and operator behavior. Process variables such as current viscosity, viscosity history within the batch, reactor temperature, and mixing speed all have predictive value for the viscosity response. Our models in this application have decreased the number of required monomer additions by 20% and batch times by 20%.
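A minimal sketch of the back end behind such an application, assuming synthetic data and invented feature names; the production web app would call a function like the hypothetical `predict_monomer_weight` below against live process readings.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Synthetic training data: columns stand in for current viscosity,
# reactor temperature, and mixing speed (values are not plant data).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = 10.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

model = ElasticNet(alpha=0.01).fit(X, y)

def predict_monomer_weight(live_reading):
    """Back-end call a web app would make against live process data."""
    return float(model.predict(np.asarray(live_reading).reshape(1, -1))[0])

print(predict_monomer_weight([0.2, -0.1, 0.5]))
```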

The second class of opportunities we've identified is root-cause analysis. Root-cause analysis is best paired with a specific product quality issue or equipment failure mode. In this application, we compile a large set of historical data, identify a set of plausible predictor variables in collaboration with process experts, train models, gather descriptive information from the models, and organize the findings in a report. We train the model in this application specifically for the descriptive information it can provide, so we choose methods that supply coefficients or SHAP values. One root-cause analysis of a key product defect at DuPont suggested interventions that reduced the defect incidence by 65%.
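A sketch of the coefficient-based variant of this workflow, on synthetic data with illustrative variable names; in this toy setup only the first candidate variable actually drives the defect, so its standardized coefficient should dominate the ranking in the report.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.preprocessing import StandardScaler

# Synthetic historical data: three candidate root-cause variables
# nominated by process experts (names are invented for illustration).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
# Only the first variable truly influences the defect response here.
y = 0.3 * X[:, 0] + rng.normal(scale=0.05, size=200)

# Standardize so coefficient magnitudes are comparable across variables.
X_std = StandardScaler().fit_transform(X)
model = ElasticNet(alpha=0.01).fit(X_std, y)

# Standardized coefficients rank the candidate causes.
for name, coef in zip(
        ["raw_material_moisture", "line_speed", "oven_temp"], model.coef_):
    print(f"{name}: {coef:+.3f}")
```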

Other opportunities we've pursued are anomaly detection, fingerprinting from resonance or chromatography spectra, and quality control charts. We rely on traditional data-driven methods for these applications instead of machine learning: principal component analysis for the first two and statistical process control for the charts. However, we have had success implementing these solutions using the modern data science toolbox of cloud tools, database queries, dashboards, and web applications.
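A minimal sketch of PCA-based anomaly detection using the squared prediction error (Q statistic); the data and the 99th-percentile control limit are synthetic choices for illustration, not a plant configuration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic normal operating data: three strongly correlated process
# variables plus measurement noise.
rng = np.random.default_rng(2)
base = rng.normal(size=(300, 1))
X_train = np.hstack([base, 2 * base, -base]) \
    + rng.normal(scale=0.1, size=(300, 3))

# One component captures the normal correlation structure.
pca = PCA(n_components=1).fit(X_train)

def spe(x):
    """Squared prediction error (Q statistic) against the PCA model."""
    recon = pca.inverse_transform(pca.transform(x.reshape(1, -1)))
    return float(np.sum((x - recon) ** 2))

# Empirical control limit from the training data.
limit = np.percentile([spe(row) for row in X_train], 99)

# A sample that breaks the learned correlation should exceed the limit.
anomalous = np.array([5.0, -5.0, 5.0])
print(spe(anomalous) > limit)
```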

The suite of machine learning methods we evaluate for our projects can vary, but it always includes two preferred methods. The first is elastic net [1]. Elastic net is a straightforward method to train, and the data scientist can easily confirm training success with a simple plot. Because elastic net is a linear model, it produces a set of model coefficients with clear descriptive value. The second is XGBoost [2], a popular boosted regression tree model. Although more difficult to train than elastic net, it can fit more sophisticated nonlinear and interaction effects with good prediction accuracy, offers a moderate amount of descriptive information, and is less expensive to train than deep learning models.
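The trade-off can be sketched on synthetic data containing a deliberate interaction effect that a linear model cannot capture. Note that scikit-learn's `GradientBoostingRegressor` stands in for xgboost here to keep the example dependency-free; it is a similar boosted-tree method, not the library named in the text.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import train_test_split

# Synthetic data: the response includes an interaction term (x0 * x1)
# that the linear elastic net cannot represent.
rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(400, 2))
y = X[:, 0] + X[:, 0] * X[:, 1] + rng.normal(scale=0.05, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

enet = ElasticNet(alpha=0.01).fit(X_tr, y_tr)
gbt = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)

# The boosted trees should score higher by capturing the interaction.
print(f"elastic net R^2:   {enet.score(X_te, y_te):.2f}")
print(f"boosted trees R^2: {gbt.score(X_te, y_te):.2f}")
```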

Data preparation and feature engineering are important steps for developing machine learning models, and both benefit from domain expertise. We've developed a variety of guidelines and strategies for these steps, informed by statistics and chemical engineering. One of the greatest challenges is modeling data produced by batch processes with data-driven methods. Because the state of a batch process depends on its history, we must either engineer features as integrals of process variables or of a proposed rate equation, or employ a machine learning method that includes a numerical integration, like neural ODEs [3].
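For example, a time-integral feature can summarize a within-batch trace as a single scalar for a batch-level model. The trace below is a toy temperature ramp, and the trapezoidal sum is written out explicitly to keep the sketch dependency-light.

```python
import numpy as np

# Synthetic within-batch time series: a reactor temperature ramp
# sampled once per minute (toy values, not plant data).
t = np.arange(0, 60, 1.0)   # minutes
temp = 150 + 0.5 * t        # degrees C

# Engineer an integral feature: the time-integral of temperature, a
# proxy for total heat exposure over the batch, via the trapezoid rule.
heat_exposure = float(np.sum((temp[1:] + temp[:-1]) / 2 * np.diff(t)))
print(heat_exposure)  # 9720.25
```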

We conclude by describing how cloud tools for data pipelining, data storage, compute, dashboards, and web applications can comprise a complete data science solution for manufacturing applications.

References

  1. Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67, 301–320.
  2. Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
  3. Chen, R. T. Q. (2021). torchdiffeq (Version 0.2.2) [Computer software]. https://github.com/rtqichen/torchdiffeq