(416a) Global Systems Analysis Using Deep Learning on Industrially Relevant Large Datasets | AIChE

Authors 

Sin, G. - Presenter, Technical University of Denmark
Aouichaoui, A., Technical University of Denmark
Global sensitivity analysis (GSA) is a thriving field of applied statistics used in the assessment of complex simulation models. Among its other uses, GSA is widely employed to generate insights into the contributions of individual model inputs, or sub-groups of inputs, to the variation in the output of a mechanistic model (Saltelli et al., 2019). However, when such models fall short of explaining a particular process phenomenon, the sensitivity indices derived from them become unreliable. One good example is nitrous oxide (N2O) emissions from wastewater treatment plants (WWTPs), for which the mechanistic understanding is still in its infancy (Daelman et al., 2015; Sin and Al, 2021). The recent breakthroughs in deep learning (DL), on the other hand, offer an exciting possibility to shed new light on such poorly understood process phenomena. To this end, we present a new framework, named deepGSA, that combines well-established variance-decomposition-based global sensitivity analysis methods, such as Sobol sensitivity indices, with plant data-driven deep learning modeling techniques.

deepGSA aims to enable non-specialist practitioners to leverage deep learning/machine learning models for GSA applications. To this end, the tool builds on an earlier GSA framework of the authors, easyGSA (Al et al., 2019), and on a recently proposed framework for DL-based and big data-driven process modeling (Hwangbo et al., 2020). By using these two frameworks, deepGSA streamlines a number of tasks into a deep learning pipeline: data cleaning and preparation, model building and discrimination, model validation, Monte Carlo simulations, Sobol sensitivity analysis, derivative-based global sensitivity analysis, and effective visualization of GSA results. The capabilities of the tool are highlighted with a case study from WWTPs concerning emissions of nitrous oxide (N2O), a potent greenhouse gas. For that purpose, a one-year-long dataset collected from four of the biological reactors of the Avedøre WWTP in Copenhagen, Denmark (Chen et al., 2019) was used to train DL-based models. Using the tool, a number of candidate DL network topologies were evaluated, and a DL model with satisfactory predictive performance (R² on the test set > 0.90) was obtained. Using this model in a parallelized Monte Carlo simulation procedure, the Sobol sensitivity indices were calculated to identify the underlying factors (process disturbances and conditions) related to the emissions of N2O. Several sensitivity analysis techniques were applied to understand and explain which inputs drive the greenhouse gas emissions. As regards the variance decomposition methods, the results indicate that the Sobol total sensitivity index (STi) is an appropriate metric for comparing the importance of inputs/factors. Furthermore, accounting for dependency among the inputs influences only the main-effect (Si) calculations and not STi. Hence, STi is recommended as a robust measure of the relative importance of the inputs for this particular study.
One disadvantage of variance-based decomposition methods is that they do not indicate the direction of the effect of an input on the output (namely, positive or negative influence): Si and STi are both ratios of a conditional-variance term of the output, given the inputs, to the total output variance, and are therefore non-negative by construction.
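The Monte Carlo calculation of Si and STi described above can be sketched as follows. This is a minimal illustration, not the deepGSA implementation: it assumes independent inputs sampled uniformly on [0, 1], uses the Saltelli (2010) estimator for Si and the Jansen estimator for STi, and substitutes a hypothetical `toy_model` for the trained DL model. The function name `sobol_indices` and all parameter names are likewise illustrative.

```python
import numpy as np

def sobol_indices(model, n_inputs, n_samples=20000, seed=0):
    """Estimate first-order (S) and total (ST) Sobol indices by Monte Carlo.

    Assumes independent inputs, each uniform on [0, 1] (an assumption of
    this sketch, not of the original study).
    """
    rng = np.random.default_rng(seed)
    A = rng.random((n_samples, n_inputs))   # first independent sample matrix
    B = rng.random((n_samples, n_inputs))   # second independent sample matrix
    yA, yB = model(A), model(B)
    var_y = np.var(np.concatenate([yA, yB]))  # total output variance

    S = np.empty(n_inputs)
    ST = np.empty(n_inputs)
    for i in range(n_inputs):
        ABi = A.copy()
        ABi[:, i] = B[:, i]                 # A with column i taken from B
        yABi = model(ABi)
        # Saltelli (2010) estimator of the first-order partial variance
        S[i] = np.mean(yB * (yABi - yA)) / var_y
        # Jansen estimator of the total-effect partial variance
        ST[i] = 0.5 * np.mean((yA - yABi) ** 2) / var_y
    return S, ST

def toy_model(X):
    """Hypothetical stand-in for the trained DL model: the third input is inert."""
    return X[:, 0] + 2.0 * X[:, 1] + 0.0 * X[:, 2]

S, ST = sobol_indices(toy_model, n_inputs=3)
print("S  =", np.round(S, 3))
print("ST =", np.round(ST, 3))
```

For this additive toy model the analytical values are Si = STi = 0.2, 0.8 and 0 for the three inputs, so the estimates should fall close to these numbers. Both indices are non-negative in expectation, which illustrates why they cannot convey the sign of an effect.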

Derivative-based sensitivity analysis, also known as local elementary effects in reference to Morris elementary effects, can reveal the sign of the contribution (negative versus positive) of each input to the model outputs, which is valuable information in engineering and scientific studies. When performed in a global setting (e.g. Sobol and Kucherenko, 2010), properties of the distribution of these effects can be used to study and rank the importance of factors. In this study, the distribution of the values of these derivative functions revealed a pattern characterized by significantly heavy (fat) tails on both the negative and the positive side. The normal distribution, an assumption often used in the literature, failed to describe the tails. This presents an interesting challenge, as standard inference statistics cannot be used to interpret these results. For a proper interpretation, we turn to extreme value theory and statistics, which study the tails of distributions. The question addressed here is how many Monte Carlo simulations (N) are needed to ensure convergence of such methods. To address this question, maximum-to-sum (M/S) plots, used in applied statistics to study the existence of moments of a distribution, are examined and discussed. We also study the Pareto distribution function and the estimation of its tail index using Hill estimators. All these analyses confirm the presence of a heavy-tailed distribution. We conclude with a new methodology for performing a global systems analysis of industrial datasets, particularly those subject to fat-tailed distributions of their inputs.
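The two tail diagnostics mentioned above, the Hill estimator of the tail index and the maximum-to-sum ratio, can be sketched as follows. This is a minimal illustration under stated assumptions: the synthetic Pareto sample `x` (tail index 1.5) merely stands in for the derivative values of the study, and the function names, the number of upper order statistics `k`, and the moment orders `p` are illustrative choices rather than the authors' settings.

```python
import numpy as np

def hill_tail_index(x, k):
    """Hill estimate of the tail index alpha from the k largest |x| values.

    Requires 0 < k < len(x) and a strictly positive (k+1)-th largest |x|.
    """
    z = np.sort(np.abs(x))[::-1]               # descending order statistics
    gamma = np.mean(np.log(z[:k]) - np.log(z[k]))
    return 1.0 / gamma

def max_to_sum_ratio(x, p):
    """Running ratio max(|x|^p) / sum(|x|^p) as the sample size grows.

    If the p-th moment exists the ratio tends to zero; if the ratio keeps
    jumping and does not settle near zero, that moment likely does not exist.
    """
    a = np.abs(x) ** p
    return np.maximum.accumulate(a) / np.cumsum(a)

# Synthetic heavy-tailed sample (assumption): Pareto with alpha = 1.5,
# drawn by inverse-transform sampling, so the mean exists but the variance does not.
rng = np.random.default_rng(0)
alpha_true = 1.5
x = (1.0 - rng.random(50000)) ** (-1.0 / alpha_true)

alpha_hat = hill_tail_index(x, k=2000)
ms1 = max_to_sum_ratio(x, p=1)   # first moment exists: ratio decays towards zero
ms2 = max_to_sum_ratio(x, p=2)   # second moment does not exist: ratio stays erratic
print("Hill tail index estimate:", round(alpha_hat, 3))
print("final M/S ratio, p=1:", ms1[-1], " p=2:", ms2[-1])
```

Plotting `ms1` and `ms2` against the sample size gives the M/S plot. An estimated tail index below 2 signals that the sample variance does not converge as N grows, which is why normal-theory inference is unsuitable for such data.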

References

Al, R., Behera, C.R., Zubov, A., Gernaey, K. V., Sin, G., 2019. Meta-modeling based efficient global sensitivity analysis for wastewater treatment plants – An application to the BSM2 model. Comput. Chem. Eng. 127, 233–246. https://doi.org/10.1016/j.compchemeng.2019.05.015

Chen, X., Mielczarek, A.T., Habicht, K., Andersen, M.H., Thornberg, D., Sin, G., 2019. Assessment of Full-Scale N2O Emission Characteristics and Testing of Control Concepts in an Activated Sludge Wastewater Treatment Plant with Alternating Aerobic and Anoxic Phases. Environ. Sci. Technol. 53, 12485–12494. https://doi.org/10.1021/acs.est.9b04889

Daelman, M.R.J., van Voorthuizen, E.M., van Dongen, U.G.J.M., Volcke, E.I.P., van Loosdrecht, M.C.M., 2015. Seasonal and diurnal variability of N2O emissions from a full-scale municipal wastewater treatment plant. Sci. Total Environ. 536, 1–11. https://doi.org/10.1016/j.scitotenv.2015.06.122

Hwangbo, S., Al, R., Sin, G., 2020. An integrated framework for plant data-driven process modeling using deep-learning with Monte-Carlo simulations. Comput. Chem. Eng. 143, 107071. https://doi.org/10.1016/j.compchemeng.2020.107071

Saltelli, A., Aleksankina, K., Becker, W., Fennell, P., Ferretti, F., Holst, N., Li, S., Wu, Q., 2019. Why so many published sensitivity analyses are false: A systematic review of sensitivity analysis practices. Environ. Model. Softw. 114, 29–39. https://doi.org/10.1016/j.envsoft.2019.01.012

Sin, G., Al, R., 2021. Activated sludge models at the crossroad of artificial intelligence – A perspective on advancing process modeling. npj Clean Water 4(1), 1–7.