(142k) Using Semi-Supervised Machine Learning to Map the Phase Diagrams of Open Materials Data Sets
AIChE Annual Meeting
2016
Computational Molecular Science and Engineering Forum
Data Mining and Machine Learning in Molecular Sciences I
Monday, November 14, 2016 - 2:48pm to 3:00pm
Such data is critically important to identifying the materials genome, as it creates the linkage between composition, structure, and property. Algorithmic approaches to automated phase diagram mapping have been an active topic in the high-throughput field for a number of years, and most studies have focused on an open FeGaPd data set that has been available for about 10 years. Here, we demonstrate a semi-supervised machine learning technique, SS-AutoPhase, which uses a two-step approach to automatically identify phases within structural data sets. In the first step, clustering analysis automatically selects a representative subset of samples to be manually analyzed by a human expert. In the second step, an AdaBoost classifier is trained on these labeled samples and used to identify the presence of the different phases in the FeGaPd diffraction data. SS-AutoPhase was used to identify the metallographic phases in 278 diffraction patterns from a FeGaPd sputtered composition spread sample. The accuracy of SS-AutoPhase was greater than 82.6% for all phases when 15% of the diffraction patterns were used for training. Furthermore, the phase diagram predicted by SS-AutoPhase was determined and compared to phase labels from a human expert and to other algorithmic approaches. This comparison showed not only that SS-AutoPhase had very high agreement with the expert phase labels, but also that it was able to detect and correctly identify a previously unreported phase. Finally, I will report on a first-of-its-kind identification of a novel ferromagnetic shape memory alloy via the data mining of an open materials database.
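The two-step procedure described above (cluster, hand-label a representative subset, then boost a classifier on those labels) can be illustrated with a minimal Python sketch. This is not the SS-AutoPhase implementation. The abstract does not name the clustering method or the weak learner, so k-means and single-feature decision stumps are assumptions here. The four-bin "diffraction patterns" are synthetic, the single binary phase label is a simplification, and every function name is hypothetical.

```python
import math
import random

def kmeans(X, k, iters=20):
    """Lloyd's k-means with evenly spaced samples as starting centroids."""
    n = len(X)
    centroids = [list(X[i * n // k]) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in X:
            nearest = min(range(k), key=lambda j: math.dist(x, centroids[j]))
            groups[nearest].append(x)
        for j, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def select_representatives(X, k):
    """Step 1: index of the sample closest to each centroid (to be hand-labelled)."""
    centroids = kmeans(X, k)
    picks = {min(range(len(X)), key=lambda i: math.dist(X[i], c)) for c in centroids}
    return sorted(picks)

def stump_predict(x, feature, threshold, sign):
    """Decision stump: +sign above the threshold, -sign otherwise."""
    return sign if x[feature] > threshold else -sign

def best_stump(X, y, w):
    """Stump with the lowest weighted error over all features and thresholds."""
    best = None
    for f in range(len(X[0])):
        vals = sorted({x[f] for x in X})
        # One threshold below every value (constant stump) plus all midpoints.
        thresholds = [vals[0] - 1.0] + [(a + b) / 2 for a, b in zip(vals, vals[1:])]
        for t in thresholds:
            for sign in (1, -1):
                err = sum(wi for wi, x, yi in zip(w, X, y)
                          if stump_predict(x, f, t, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, sign)
    return best

def adaboost_fit(X, y, rounds=10):
    """Step 2: discrete AdaBoost; y is +1 (phase present) or -1 (absent)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, f, t, sign = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, t, sign))
        if err <= 1e-10:  # a perfect stump leaves nothing to reweight
            break
        w = [wi * math.exp(-alpha * yi * stump_predict(x, f, t, sign))
             for wi, x, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_predict(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(x, f, t, s) for a, f, t, s in model)
    return 1 if score >= 0 else -1

# Synthetic stand-in data (an assumption, not the FeGaPd set): bins 0 and 1
# are strong when the phase is present, bins 2 and 3 when it is absent.
rng = random.Random(0)

def synthetic_pattern(has_phase):
    bins = [rng.gauss(0.2, 0.05) for _ in range(4)]
    for b in ((0, 1) if has_phase else (2, 3)):
        bins[b] += 1.0
    return bins

labels = [1] * 30 + [-1] * 30  # stands in for the expert's phase labels
X = [synthetic_pattern(label == 1) for label in labels]
chosen = select_representatives(X, k=9)  # at most 15% of the 60 patterns
model = adaboost_fit([X[i] for i in chosen], [labels[i] for i in chosen])
accuracy = sum(adaboost_predict(model, x) == label
               for x, label in zip(X, labels)) / len(X)
print(f"labelled {len(chosen)} of {len(X)} patterns, accuracy: {accuracy:.2f}")
```

In this sketch only the patterns listed in `chosen` would need expert labels, and the classifier is then scored on every pattern. A data set with several phases could, as one option, train one such binary classifier per phase. The synthetic classes are far better separated than real diffraction data, so the score here says nothing about the accuracy reported in the abstract.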