(299e) Solubility Data Mining and Predictive Modeling: AI+ChE
AIChE Annual Meeting
Tuesday, October 31, 2017 - 9:40am to 10:05am
Historical analysis of experimental data with the tools of data science has the potential to unlock insights and streamline future development work. For solubility prediction, data were mined from an internal catalog of automated solubility screening reports spanning dozens of projects. In all, more than 64,000 solubility measurements for more than 700 pharmaceutically relevant organic compounds in pure and mixed solvents at various temperatures have been aggregated from these reports and analyzed. Such a data set allows rapid testing of hypotheses related to solvent selection, such as correlations and synergies between solvent pairs and the effect of temperature.
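One way such a solvent-pair hypothesis might be tested is sketched below: pivot the measurements by solvent and compute the Pearson correlation of log10 solubility across the compounds that two solvents share. This is a minimal sketch, not the workflow used in the study. The record layout, the compound and solvent names, the minimum of three shared compounds, and all numerical values are hypothetical placeholders chosen for illustration.

```python
import math
from itertools import combinations

# Hypothetical placeholder records: (compound_id, solvent, solubility in mg/mL).
# These values are illustrative only and are NOT data from the study.
records = [
    ("cmpd_A", "methanol", 12.0), ("cmpd_A", "ethanol", 8.0), ("cmpd_A", "heptane", 0.05),
    ("cmpd_B", "methanol", 30.0), ("cmpd_B", "ethanol", 21.0), ("cmpd_B", "heptane", 0.4),
    ("cmpd_C", "methanol", 2.0), ("cmpd_C", "ethanol", 1.5), ("cmpd_C", "heptane", 0.9),
    ("cmpd_D", "methanol", 55.0), ("cmpd_D", "ethanol", 40.0), ("cmpd_D", "heptane", 0.02),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def solvent_pair_correlations(rows, min_shared=3):
    """Correlate log10 solubility between every solvent pair.

    Only compounds measured in both solvents contribute to a pair, and
    pairs with fewer than `min_shared` common compounds are skipped.
    """
    # Pivot to {solvent: {compound: log10(solubility)}}.
    by_solvent = {}
    for compound, solvent, solubility in rows:
        by_solvent.setdefault(solvent, {})[compound] = math.log10(solubility)

    result = {}
    for first, second in combinations(sorted(by_solvent), 2):
        shared = sorted(set(by_solvent[first]) & set(by_solvent[second]))
        if len(shared) < min_shared:
            continue
        xs = [by_solvent[first][c] for c in shared]
        ys = [by_solvent[second][c] for c in shared]
        result[(first, second)] = pearson(xs, ys)
    return result


correlations = solvent_pair_correlations(records)
for pair, r in sorted(correlations.items()):
    print(pair, round(r, 3))
```

Working in log10 units keeps a few very soluble compounds from dominating the correlation. A real analysis would also need to hold temperature fixed or model it explicitly, which this sketch ignores.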
Access to this large data set enables machine learning approaches to build quantitative structure-property relationship (QSPR) models for solubility prediction that improve on the current standard approaches. Deployed web applications allow any researcher to rapidly calculate solubility predictions by providing only a molecular structure and a single benchmark solubility measurement. This presentation aims to show how the recently available combination of algorithms (random forests, support vector machines, neural networks, etc.), data-oriented programming languages (R, Python), and cloud computing capabilities can enable chemical engineers to make better use of their data for pharmaceutical process development.
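A QSPR model of this general kind might be sketched as follows, using a random forest, one of the algorithm families named above. This is an illustration under stated assumptions, not the authors' model: it assumes scikit-learn and NumPy are available, the five "descriptors" are random placeholders standing in for features that would normally be computed from the molecular structure, and the target is a synthetic function invented so that the script runs end to end.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples = 500

# Placeholder inputs (assumptions, not data from the study):
# - five generic molecular descriptors per compound
# - one benchmark log10 solubility measured in a reference solvent
# - the temperature of the target measurement in degrees Celsius
descriptors = rng.normal(size=(n_samples, 5))
benchmark_log_s = rng.normal(loc=0.5, scale=1.0, size=n_samples)
temperature_c = rng.uniform(5.0, 60.0, size=n_samples)
X = np.column_stack([descriptors, benchmark_log_s, temperature_c])

# Synthetic target: log10 solubility in a second solvent. The functional
# form is invented purely so the example has something to learn.
y = (
    0.8 * benchmark_log_s
    + 0.3 * descriptors[:, 0]
    + 0.01 * (temperature_c - 25.0)
    + rng.normal(scale=0.1, size=n_samples)
)

# Hold out 20% of the samples to estimate predictive error.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
rmse = float(np.sqrt(np.mean((predictions - y_test) ** 2)))
print(f"Held-out RMSE (log10 units, synthetic data): {rmse:.3f}")
```

Including the benchmark measurement as a feature mirrors the idea of predicting from a structure plus a single known solubility. With real data, a random split of rows can overstate accuracy when the same compound appears in both sets, so splitting by compound would be a more conservative check.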