(560cl) Exploratory Textual Data Analysis for Understanding the Research Development of Oxygen Reduction Reaction | AIChE


Authors 

Li, Z. - Presenter, Virginia Polytechnic Institute and State University
Xin, H., Virginia Tech
The oxygen reduction reaction (ORR) has received growing interest over the last decade in both computational and experimental studies, driven by sustained attention to sustainable power generation technologies such as proton exchange membrane fuel cells (PEMFCs). Electrocatalytic materials with novel structures and compositions have shown great promise for catalyzing the ORR, and the discovery of less expensive and more abundant catalysts is becoming a central theme of modern research. The overwhelming amount of unstructured text in the articles available today provides a rich source of information for ORR catalysis studies. However, it is very challenging and time-consuming to gain a comprehensive understanding of the current state of ORR catalyst development by manually reading through thousands of scholarly articles on the web. Therefore, a semi-automatic, web-based search engine that can rapidly process thousands of articles for valuable insights is attractive.

In this work, we demonstrate a web-based search engine for exploring the vast body of literature on the oxygen reduction reaction. Its core component is a set of machine learning algorithms based on task-specific Natural Language Processing (NLP) techniques, which are key to the success of the search engine. The search engine starts with an HTML web scraper that systematically extracts unstructured textual data from the available websites. Then, a text normalization process filters out the non-informative components of the articles, such as HTML syntax, punctuation, and stop words. In the next step, a series of pre-trained surrogate models (categorical topic classifiers, a named-entity recognizer, and a sentiment analyzer) processes each article and highlights the most informative keywords for describing it, such as the types of catalytic material, the research scope, and the primary techniques. In addition, document-level information such as publication time and impact factor is extracted for further analysis. Finally, we perform extensive exploratory analysis of the extracted keywords associated with the documents to draw out insights. The exploratory analysis includes histograms of the investigated catalytic materials, rankings of research topics by year, and similar summaries. The analytical summary produced by our textual search engine has the potential to guide future research studies and strategic decisions.
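
The normalization and exploratory counting steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word list, the `MATERIAL_TERMS` lexicon, and the function names are hypothetical, and a simple keyword lookup stands in for the pre-trained named-entity recognizer mentioned in the abstract.

```python
import string
from collections import Counter
from html.parser import HTMLParser

# Assumption: a tiny illustrative stop-word list, not a complete one.
STOP_WORDS = {"a", "an", "and", "as", "for", "in", "is", "of", "on", "the", "to", "with"}

# Assumption: a hypothetical lexicon mapping keywords to material classes.
# It stands in for the pre-trained named-entity recognizer.
MATERIAL_TERMS = {
    "platinum": "Pt-based",
    "pt": "Pt-based",
    "graphene": "carbon-based",
    "perovskite": "oxide",
}

# Translation table that replaces every ASCII punctuation mark with a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops tags (script/style content is not filtered)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def normalize(html):
    """Strip HTML tags, punctuation, and stop words; return lowercase tokens."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower().translate(_PUNCT_TO_SPACE)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def material_classes(tokens):
    """Return the set of material classes mentioned in one document."""
    return {MATERIAL_TERMS[tok] for tok in tokens if tok in MATERIAL_TERMS}


def histogram(documents):
    """Count how many documents mention each material class."""
    counts = Counter()
    for html in documents:
        counts.update(material_classes(normalize(html)))
    return counts
```

Calling `histogram` on a list of scraped HTML strings returns a `Counter` of document counts per material class, which can be passed to any plotting tool to produce the kind of histogram the abstract describes. Counting each class at most once per document is a design choice of this sketch, so that a single long article does not dominate the distribution.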