(2ao) Optimal Design of Soft Matter Via Simulation, Machine Learning and Large Language Models | AIChE


Authors 

Shi, J. - Presenter, University of Notre Dame
Research Interests:

Soft matter is integral to modern life, appearing in clothing, food, transportation, construction, and healthcare. One measure of the impact of these essential materials is global sales, which exceeded $600 billion for polymers and $140 billion for liquid crystal displays. Despite this impact, the current pace of technological innovation, particularly in the development of next-generation soft matter, is slow. This slow pace is increasingly problematic given the public health crisis and the escalating environmental issues we face, including climate change and plastic waste. It is essential to accelerate material innovation, in soft matter and beyond, to meet the pressing demands of our rapidly changing world.

The combination of machine learning and statistical physics-based molecular modeling enables us to explore larger and more complex systems, propelling the discovery and optimal design of novel soft matter. Importantly, large language models, acclaimed for their exceptional performance in text processing, can significantly bolster research in soft matter informatics by facilitating the development of comprehensive, high-quality datasets, which form the foundation of data-driven projects in soft matter. Therefore, my research group aims to integrate molecular simulations, statistical physics, machine learning, data mining, and large language models to accelerate the discovery and optimal design of soft matter. Specifically, my research group will focus on (1) the optimal design of complex polymer-surface interactions using simulation and artificial intelligence, (2) integrating modeling and machine learning for the optimal design of polymers for sustainability, and (3) accelerating soft matter informatics via large language models.

Research Experience

My research background uniquely positions me to lead a group focused on an integrated framework of physics-based molecular simulation, machine learning, and large language models aiming to address complex challenges in the optimal design of soft matter.

From 2017 to 2022, I conducted my Ph.D. research at the University of Notre Dame with Prof. Jonathan Whitmer, where I employed machine learning techniques, molecular simulation methods, and advanced sampling algorithms to investigate the free energy landscapes of materials and facilitate the inverse design of materials. I used molecular simulations coupled with enhanced sampling to study the temperature dependence of the Frank–Oseen elasticity of liquid crystals in lattice and atomistic models. Moreover, I explored the structure-property relationships inherent in liquid crystals. I also employed first-principles molecular dynamics simulations and advanced sampling methods to study the temperature dependence of the stability and isomerization rates of gold clusters, offering a quantitative understanding of these clusters' dynamic structure in catalysis. Additionally, I integrated coarse-grained simulation with data-driven machine learning methods to predict polymer-surface adhesive free energies and to inverse-design functional polymers. Furthermore, to overcome the limitations posed by small datasets, I used transfer learning, which notably improved the prediction of polymer-surface adhesion strength. In addition to my Ph.D. in Chemical Engineering, I obtained a graduate minor in computer science and engineering, with systematic training in high-performance computing and machine learning.

Since spring 2022, I have been a postdoctoral researcher at MIT, working with Prof. Bradley Olsen and Dr. Debra Audus (NIST) on polymer informatics. I use optimization, machine learning, deep learning, and large language models to accelerate the development of large, functional polymer databases and the optimal design of polymers. I used earth mover's distance and graph edit distance to compute a quantitative pairwise chemical similarity score for polymers; the score is largely consistent with the chemical intuition of expert users and can be adjusted according to the relative importance of different chemical features for a given similarity problem. This work represents a vital step toward building search engines and quantitative design tools for polymer data. Additionally, to overcome the complexity and slow speed of graph edit distance, I employed graph neural networks to accelerate pairwise similarity calculations for macromolecules while preserving accuracy. I am also actively engaged in several collaborations. I employed large language models to automatically identify literature relevant to block copolymer phase diagrams, facilitating subsequent data extraction. I also participated in a hackathon focused on applications of large language models in chemistry and materials science. Given the volume of the literature, manual conversion of unstructured text to structured data is impractical. We examined the capacity of large language models to automatically transform unstructured text into structured JSON files describing organic reactions and tested the performance on the Open Reaction Database (ORD), a database of curated organic reactions. I optimized zero-shot and few-shot prompts for GPT-4 and then compared the accuracy of these optimized GPT-4 prompts with that of a fine-tuned GPT-3 model.
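To give a concrete sense of the earth mover's distance mentioned above, the following is a minimal, illustrative sketch rather than the method used in the work described here. It assumes each polymer is summarized as a histogram over a single ordered feature on a shared, evenly spaced grid (a hypothetical simplification), in which case the earth mover's distance reduces to the summed absolute difference of the cumulative distributions. The function names `emd_1d` and `similarity_from_emd`, and the normalization that turns a distance into a score between 0 and 1, are assumptions made for illustration only.

```python
from itertools import accumulate


def emd_1d(weights_p, weights_q, spacing=1.0):
    """Earth mover's distance between two histograms on the same evenly
    spaced 1D grid (toy setting; real polymer comparisons use a chemical
    ground distance between components instead of a 1D grid)."""
    if len(weights_p) != len(weights_q):
        raise ValueError("histograms must share the same bins")
    total_p, total_q = sum(weights_p), sum(weights_q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("each histogram needs positive total weight")
    # Normalize so both histograms carry unit mass.
    p = [w / total_p for w in weights_p]
    q = [w / total_q for w in weights_q]
    # In 1D, the optimal transport cost equals the area between the CDFs.
    cdf_p = accumulate(p)
    cdf_q = accumulate(q)
    return spacing * sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


def similarity_from_emd(weights_p, weights_q, spacing=1.0):
    """Hypothetical normalization: 1 means identical histograms, 0 means
    all mass sits at opposite ends of the grid."""
    n_bins = len(weights_p)
    if n_bins < 2:
        return 1.0
    max_distance = spacing * (n_bins - 1)
    return 1.0 - emd_1d(weights_p, weights_q, spacing) / max_distance


if __name__ == "__main__":
    ensemble_a = [0.5, 0.5, 0.0]  # made-up example data
    ensemble_b = [0.0, 0.5, 0.5]  # made-up example data
    print(emd_1d(ensemble_a, ensemble_b))
    print(similarity_from_emd(ensemble_a, ensemble_b))
```

In this toy setting, shifting every unit of weight by one bin costs exactly one bin width, which is the intuition behind using the distance as a measure of how much "work" separates two ensembles.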

Selected Awards:

ACS Polymeric Materials: Science and Engineering (PMSE) Future Faculty Honoree (2023)

Winner, MIT ChemE Teach-Off Competition (2023)

Forum for Early Career Scientists (FECS) Travel grant, American Physical Society March Meeting (2023)

Outstanding Paper Award, University of Notre Dame (2020)

Division of Soft Matter (DSOFT) Travel Grant, American Physical Society March Meeting (2020)

Graduate School's Professional Development Awards, University of Notre Dame (2019, 2020)

Best Poster Award, 6th Annual Notre Dame-Purdue Soft Matter & Polymers Symposium (2019)

Publications:

[1] Jiale Shi, Nathan J. Rebello, Dylan Walsh, Weizhong Zou, Michael Deagen, Bruno Salomao Leao, Debra J. Audus, Bradley D. Olsen. "Quantifying Pairwise Chemical Similarity for Polymers." Macromolecules. Minor revision.

[2] Kevin Maik Jablonka, ..., Jiale Shi, ... "14 Examples of How LLMs Can Transform Materials Science and Chemistry: A Reflection on a Large Language Model Hackathon." Digital Discovery. In review.

[3] Jiale Shi, Dylan Walsh, Xian Gao, Debra J. Audus, Bradley D. Olsen. "Earth Mover's Distance as a Metric for Calculating Polymer Ensemble Pairwise Similarity." In preparation.

[4] Jiale Shi, Debra J. Audus, Bradley D. Olsen. "Graph Neural Network for Efficient and Accurate Macromolecular Similarity Calculation." In preparation.

[5] Jiale Shi, Fahed Albreiki, Yamil J. Colón, Samanvaya Srivastava, Jonathan K. Whitmer. "Using Transfer Learning to Leverage Prior Knowledge in the Prediction of Adhesive Free Energies between Polymers and Surfaces." J. Chem. Theory Comput. 2023.

[6] Jiale Shi, Michael J. Quevillon, Pedro Henrique Amorim Valença, Jonathan K. Whitmer. "Predicting Adhesive Free Energies of Polymer-Surface Interactions with Machine Learning." ACS Appl. Mater. Interfaces, 2022, 14, 32, 37161–37169.

[7] Jiale Shi, Shanghui Huang, François Gygi, Jonathan K. Whitmer. "Free Energy Landscape and Isomerization Rates of Au4 Clusters at Finite Temperature." J. Phys. Chem. A, 2022, 126, 21, 3392-3400.

[8] Jiale Shi*, Hythem Sidky*, Jonathan K. Whitmer. "Automated determination of n-cyanobiphenyl and n-cyanobiphenyl binary mixtures elastic constants in the nematic phase from molecular simulation." Mol. Syst. Des. Eng., 2020, 5, 1131-1136. (* indicates equal contribution and co-first authorship)

[9] Jiale Shi, Hythem Sidky, Jonathan K. Whitmer. "Novel Elastic Response in Twist-bend Nematic Models." Soft Matter, 2019, 15, 8219-8226. (inside front cover)

Teaching Interests:

During my Ph.D. program, I served as a teaching assistant over four semesters for two undergraduate courses, attended by students from a diverse range of departments, and for two graduate-level courses. I also led several homework tutorial sessions. As a postdoctoral associate at MIT, I participated in the Kaufman Teaching Certificate Program (KTCP) in spring 2023, which provided systematic training in essential teaching skills: backward course design with student-centered intended learning outcomes, creating clear and transparent assessments, providing effective student feedback, and cultivating a welcoming classroom environment. Additionally, I won the 2023 MIT ChemE Teach-Off Competition, a competitive award that reflects my strong teaching and communication skills.

My education and academic training have equipped me with the qualifications to instruct a wide range of Chemical Engineering courses at both graduate and undergraduate levels. My research has afforded me a distinctive proficiency in thermodynamics, probability/statistics, statistical mechanics, molecular modeling, machine learning, data mining, and large language models. Given my unique skill set, I am ideally suited to teach ChemE core courses such as thermodynamics, mathematics, physical chemistry, separation, and reaction kinetics, and elective courses such as statistical mechanics, molecular modeling, and machine learning in chemical engineering.