(477l) Leveraging Structure and Property Information for Building Maps of Materials
AIChE Annual Meeting
2020
2020 Virtual AIChE Annual Meeting
Topical Conference: Applications of Data Science to Molecules and Materials
Innovations in Methods of Data Science
Wednesday, November 18, 2020 - 10:15am to 10:30am
The application of machine learning techniques to problems in chemistry and materials science has become increasingly common over the past few years, with many studies focusing on property prediction or structural comparisons based on mathematical representations of materials. This has been made possible by the emergence of large, diverse databases of materials and molecules that allow for statistical analyses and high-throughput searches of candidate materials for a particular application. In many cases, such analyses are facilitated by the construction of a "map" that helps visualize the similarities and differences between the structures and properties of materials. Such maps are often based on a reduced structural representation obtained through, for example, principal component analysis (PCA). However, maps constructed in this way do not necessarily reflect the property correlations between different structures. Similarly, the representation that yields the best property predictions may not be the best for highlighting structural relationships between different materials. In this work, we propose a kernel-based extension to Principal Covariates Regression (PCovR) [1], a method that combines PCA and linear regression, to create maps of a variety of materials, including molecular crystals and all-silica zeolites, that strike a balance between predicting properties accurately and emphasizing structural differences. The kernelized extension, which we call Kernel Principal Covariates Regression (KPCovR), allows us to take advantage of nonlinear relationships in the data to build representations of materials that can offer improved regression performance over linear PCovR while preserving or enhancing the apparent structural diversity among the materials studied.
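To make the idea of mixing PCA with regression concrete, the following is a minimal NumPy sketch of one common sample-space formulation of PCovR and a kernelized analogue. It is an illustration under stated assumptions, not the implementation used in this work: the function names, the mixing parameter `alpha`, the `ridge` regularization, the normalization choices, and the Gaussian kernel with width `gamma` are all hypothetical. Here `alpha = 1` reduces to PCA (or kernel PCA), and smaller values of `alpha` weight the regression term more heavily.

```python
import numpy as np


def _prep_targets(Y):
    """Center the targets and scale them to unit Frobenius norm (assumed convention)."""
    Y = np.asarray(Y, dtype=float).reshape(len(Y), -1)
    Y = Y - Y.mean(axis=0)
    return Y / np.linalg.norm(Y)


def _latent_map(G, n_components):
    """Project onto the leading eigenvectors of a symmetric PSD matrix G."""
    evals, evecs = np.linalg.eigh(G)
    order = np.argsort(evals)[::-1][:n_components]
    return evecs[:, order] * np.sqrt(np.clip(evals[order], 0.0, None))


def pcovr_map(X, Y, alpha=0.5, n_components=2, ridge=1e-8):
    """Linear PCovR-style map: mixes the Gram matrix of X with that of the
    ridge-regression prediction of Y."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)
    X = X / np.linalg.norm(X)
    Y = _prep_targets(Y)
    W = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ Y)
    Y_hat = X @ W
    G = alpha * (X @ X.T) + (1.0 - alpha) * (Y_hat @ Y_hat.T)
    return _latent_map(G, n_components)


def rbf_kernel(X, gamma=1.0):
    """Gaussian kernel; any positive semi-definite kernel could be used instead."""
    sq = (X ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)


def kpcovr_map(K, Y, alpha=0.5, n_components=2, ridge=1e-8):
    """Kernelized analogue: the kernel replaces X X^T, and kernel ridge
    regression replaces linear regression."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    K = H @ K @ H
    K = K / np.trace(K)
    Y = _prep_targets(Y)
    Y_hat = K @ np.linalg.solve(K + ridge * np.eye(n), Y)
    G = alpha * K + (1.0 - alpha) * (Y_hat @ Y_hat.T)
    return _latent_map(G, n_components)
```

A two-dimensional map would then be obtained with, for instance, `kpcovr_map(rbf_kernel(X, gamma), Y, alpha=0.5)`, where `X` holds the structural descriptors and `Y` the target properties; with a linear kernel the two functions coincide up to the sign of each component.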
Finally, we introduce a feature selection procedure based on PCovR that identifies the features that simultaneously best capture structural diversity and predictive ability, which makes it possible to achieve more accurate property predictions with fewer features than other simple feature selection methods.
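The abstract does not spell out the selection procedure, so the sketch below shows only one plausible scheme of this kind, assumed for illustration: a greedy, CUR-style selection in which each feature is scored by its weight in the leading eigenvectors of a PCovR-style modified covariance matrix, and the remaining features are orthogonalized against each selected one. The function names and the parameters `alpha`, `n_components`, and `eps` are hypothetical.

```python
import numpy as np


def pcov_feature_scores(X, Y, alpha=0.5, n_components=2, eps=1e-10):
    """Score features by their squared weights in the top eigenvectors of
    alpha * C + (1 - alpha) * C^(-1/2) X^T Yhat Yhat^T X C^(-1/2), with C = X^T X."""
    C = X.T @ X
    evals, evecs = np.linalg.eigh(C)
    keep = evals > eps * evals.max()
    # Pseudo-inverse square root of C, ignoring numerically null directions.
    C_isqrt = (evecs[:, keep] / np.sqrt(evals[keep])) @ evecs[:, keep].T
    Y_hat = X @ np.linalg.lstsq(X, Y, rcond=None)[0]
    M = C_isqrt @ X.T @ Y_hat
    C_mod = alpha * C + (1.0 - alpha) * (M @ M.T)
    vals, vecs = np.linalg.eigh(C_mod)
    top = vecs[:, np.argsort(vals)[::-1][:n_components]]
    return (top ** 2).sum(axis=1)


def select_features(X, Y, n_select, alpha=0.5, n_components=2):
    """Greedy selection: pick the highest-scoring feature, then remove its
    contribution from every column before scoring again."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)
    X = X / np.linalg.norm(X)
    Y = np.asarray(Y, dtype=float).reshape(len(Y), -1)
    Y = Y - Y.mean(axis=0)
    Y = Y / np.linalg.norm(Y)
    selected = []
    for _ in range(n_select):
        scores = pcov_feature_scores(X, Y, alpha, n_components)
        if selected:
            scores[selected] = -np.inf   # never pick the same feature twice
        j = int(np.argmax(scores))
        x = X[:, [j]]
        norm2 = (x.T @ x).item()
        if norm2 < 1e-20:                # nothing informative left to select
            break
        selected.append(j)
        X = X - x @ (x.T @ X) / norm2    # orthogonalize all columns against x
    return selected
```

In this sketch, `alpha = 1` ranks features purely by how much structural variance they carry, while `alpha = 0` ranks them purely by their relevance to the regression; intermediate values trade the two off.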
[1] S. de Jong and H. A. L. Kiers, Chemom. Intell. Lab. Syst. 14, 155-164 (1992).