(87c) Text Data Feature Extraction Via NLP Embeddings Methods: Robustness and Power Assessment

Conference

AIChE Spring Meeting and Global Congress on Process Safety

Year

2023

Proceeding

2023 Spring Meeting and 19th Global Congress on Process Safety

Group

Industry 4.0 Topical Conference

Session

Emerging Technologies in Data Analytics

Time

Tuesday, March 14, 2023 - 2:30pm to 3:00pm

Authors

Castillo, I. - Presenter, Dow Inc.

Strelet, E., University of Coimbra

Wang, Z., Dow Inc.

Peng, Y., The Dow Chemical Co

Rendall, R., University of Coimbra

Chin, S. T., The Dow Chemical Company

Reis, M., University of Coimbra

A wide variety of sensors and measurement instruments is available today in the Chemical Processing Industries (CPIs). With this broad spectrum of sensor technology, it is possible to measure or infer crucial process parameters for monitoring and control purposes [1]–[3]. However, the coverage of relevant process information is still limited. Even with the variety of instrumentation available, sensing instruments are physically constrained to a sample, a given section or area of the process, or a reduced set of physical quantities. In addition, the existing instrumentation is sometimes insufficient to measure or estimate new parameters of interest or to detect certain abnormal phenomena. For example, leaks, corrosion, insulation degradation, and unplanned events usually cannot be measured with existing sensor technology.

Although the diversity of measurement instrumentation is increasing, sensors are not the only data sources in CPI databases. Text data from reports, alarms, process tags, etc. are a potentially valuable and diverse source of information. These data can capture relevant aspects that sensors cannot. Proper handling of process text data can therefore provide additional information for process diagnosis, monitoring, and control.

With the recent advances in Natural Language Processing (NLP) [4], new methods are available that extract features from text data beyond simple frequency counting. The semantics, i.e., the meaning of the text, can also be encoded as a structured numerical feature that can be used for process analysis. However, NLP models remain difficult to interpret and are essentially used as black boxes. Additionally, the power and robustness of this kind of model have not yet been explored in the CPI context. Therefore, we explore several NLP models for the text embedding task, in the scope of a real process, in order to perform an exploratory analysis of the information content and its potential value for process tuning [5]. Dimension reduction [6] and clustering [7] methods were used to assess the embedding methods and to derive several robustness and power metrics.
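As a rough illustration of what scoring an embedding for robustness and power could look like, the sketch below uses only the Python standard library. Everything in it is an assumption made for illustration. The `embed` function is a toy hashed character-trigram embedding standing in for a real NLP model, the log entries and their labels are invented, and the two scores (`robustness` and `separation`) are hypothetical simplifications. It does not reproduce the models, the dimension reduction [6], the clustering [7], or the metrics used in this work.

```python
import hashlib
import math
from itertools import combinations

DIM = 256  # size of the toy embedding (arbitrary assumption)


def embed(text, dim=DIM):
    """Toy stand-in for an NLP embedding: hashed character-trigram counts, L2-normalised."""
    vec = [0.0] * dim
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        gram = padded[i:i + 3]
        digest = hashlib.sha256(gram.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def robustness(pairs):
    """Hypothetical robustness score: mean similarity between a text and a perturbed copy."""
    return sum(cosine(embed(a), embed(b)) for a, b in pairs) / len(pairs)


def separation(labelled):
    """Hypothetical power score: mean within-label minus mean between-label similarity."""
    within, between = [], []
    for (t1, l1), (t2, l2) in combinations(labelled, 2):
        (within if l1 == l2 else between).append(cosine(embed(t1), embed(t2)))
    return sum(within) / len(within) - sum(between) / len(between)


# Invented operator-log entries with invented labels (not real plant data).
logs = [
    ("pump P-101 seal leak observed near flange", "leak"),
    ("small leak found at pump P-102 discharge flange", "leak"),
    ("external corrosion on reactor R-201 nozzle", "corrosion"),
    ("corrosion under insulation found on line to R-201", "corrosion"),
]

# The same entries with typos and abbreviations, as free-text records often contain.
perturbed = [
    "pump P101 seal leek obsrved near flange",
    "sml leak found at pump P102 disch flange",
    "ext corrosion on reactor R201 nozle",
    "corrosion under insul. found on line to R201",
]

pairs = list(zip([text for text, _ in logs], perturbed))
robustness_score = robustness(pairs)
separation_score = separation(logs)

print(f"robustness (mean similarity under perturbation): {robustness_score:.3f}")
print(f"separation (within minus between label similarity): {separation_score:.3f}")
```

Replacing `embed` with a call to an actual embedding model would leave the two scoring functions unchanged, which is the reason the sketch keeps the embedding behind a single function.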

References

[1] C. H. Goh, "Representing and reasoning about semantic conflicts in heterogeneous information systems", Thesis, Massachusetts Institute of Technology, 1997. Accessed: October 23, 2019. [Online]. Available: https://dspace.mit.edu/handle/1721.1/10713

[2] V. Sheokand and V. Singh, "Modeling Data Heterogeneity Using Big DataSpace Architecture", in Advanced Computing and Communication Technologies, vol. 452, R. K. Choudhary, J. K. Mandal, N. Auluck, and H. A. Nagarajaram, Eds. Singapore: Springer Singapore, 2016, pp. 259–268.

[3] M. S. Reis, R. D. Braatz, and L. H. Chiang, "Big Data - Challenges and Future Research Directions", Chemical Engineering Progress, no. Special Issue on Big Data (March), pp. 46–50, 2016.

[4] D. Antons, E. Grünwald, P. Cichy, and T. O. Salge, "The application of text mining methods in innovation research: current state, evolution patterns, and development priorities", R & D Management, vol. 50, no. 3, pp. 329–351, Jun. 2020, doi: 10.1111/radm.12408.

[5] K. Lu, A. Grover, P. Abbeel, and I. Mordatch, "Pretrained Transformers as Universal Computation Engines". arXiv, June 30, 2021. Accessed: September 1, 2022. [Online]. Available: http://arxiv.org/abs/2103.05247

[6] L. McInnes, J. Healy, and J. Melville, "UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction", arXiv:1802.03426 [cs, stat], 2018. Accessed: October 12, 2020. [Online]. Available: http://arxiv.org/abs/1802.03426

[7] L. McInnes and J. Healy, "Accelerated Hierarchical Density Clustering", in 2017 IEEE International Conference on Data Mining Workshops (ICDMW), Nov. 2017, pp. 33–42. doi: 10.1109/ICDMW.2017.12.

Topics

New Research Areas
