(299e) Solubility Data Mining and Predictive Modeling: AI+ChE

Conference

AIChE Annual Meeting

Year

2017

Proceeding

2017 Annual Meeting

Group

Pharmaceutical Discovery, Development and Manufacturing Forum

Session

Model Based Integrated Design of Pharmaceutical Drug Substance Processes I

Time

Tuesday, October 31, 2017 - 9:40am to 10:05am

Authors

Albrecht, J. - Presenter, Bristol-Myers Squibb

Qiu, J., Bristol-Myers Squibb Co.

In the synthesis of pharmaceutical drug substances, selecting the optimal solvent system as early as possible is critical to developing efficient, commercially viable processes. Process safety, environmental impact, and yield are driven in large part by the choice of solvent system. Yet early in process development, material availability limits the amount of experimental data that can be collected. Models that predict solubility enable more focused experimentation, but current approaches involve trade-offs among accuracy, solute characterization, and accessibility for process development engineers.

Historical analysis of experimental data with the tools of data science has the potential to unlock insights and streamline future development work. For solubility prediction, data were mined from an internal catalog of automated solubility screening reports spanning dozens of projects. In all, over 64,000 solubility measurements for more than 700 pharmaceutically relevant organic compounds in pure and mixed solvents at various temperatures were aggregated from these reports and analyzed. Such a data set allows rapid testing of hypotheses related to solvent selection, such as correlations and synergies between solvent pairs and temperature effects.
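
As an illustration of the kind of hypothesis such a data set could test, the sketch below flags solvent-pair synergy by comparing a measured mixed-solvent solubility with a log-linear mixing estimate built from the two pure-solvent values. The abstract does not say how synergy was defined or computed, so the log-linear baseline, the function names, and the numbers are all assumptions made for illustration only.

```python
import math


def log_linear_estimate(s1, s2, frac1):
    """Baseline solubility of a binary mixture, assuming ln(S) mixes linearly.

    s1, s2 : pure-solvent solubilities (same units, both > 0)
    frac1  : fraction of solvent 1 in the mixture (0 to 1)
    """
    return math.exp(frac1 * math.log(s1) + (1.0 - frac1) * math.log(s2))


def synergy_ratio(s_mix_measured, s1, s2, frac1):
    """Ratio of measured to baseline solubility; values above 1 suggest synergy."""
    return s_mix_measured / log_linear_estimate(s1, s2, frac1)


# Hypothetical values in mg/mL (not taken from the presentation).
pure_a = 4.0
pure_b = 25.0
mix_50_50 = 30.0

ratio = synergy_ratio(mix_50_50, pure_a, pure_b, 0.5)
print(f"synergy ratio: {ratio:.2f}")
```

With these made-up inputs the baseline is the geometric mean of the two pure-solvent values (about 10 mg/mL), so the measured mixture is roughly three times more soluble than the baseline predicts. Applied across many solute and solvent-pair combinations, a ratio like this could be ranked or correlated against solvent properties.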

Access to this large data set enables machine learning approaches to build quantitative structure-property relationship (QSPR) models for solubility prediction that improve on the current standard approaches. Deployed web applications can allow any researcher to rapidly calculate solubility predictions by providing only a molecular structure and a single benchmark solubility measurement. This presentation aims to show how the recently available combination of algorithms (random forests, support vector machines, neural networks, etc.), data-oriented programming languages (R, Python), and cloud computing capabilities can enable chemical engineers to make better use of their data for pharmaceutical process development.
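
The abstract does not describe how the single benchmark measurement enters the prediction. One plausible reading, sketched here purely as an assumption, is that a structure-based model supplies relative log-solubilities across solvents and the benchmark measurement shifts them all by a constant offset. The function name, solvent names, and values below are hypothetical, and the model output is a stand-in for whatever a trained QSPR model would return.

```python
def anchor_predictions(predicted_log_s, benchmark_solvent, benchmark_log_s):
    """Shift model predictions so they agree with one measured value.

    predicted_log_s   : dict mapping solvent name to predicted log10 solubility
    benchmark_solvent : solvent in which the compound was actually measured
    benchmark_log_s   : measured log10 solubility in that solvent
    """
    # Constant correction implied by the one available measurement.
    offset = benchmark_log_s - predicted_log_s[benchmark_solvent]
    return {solvent: value + offset for solvent, value in predicted_log_s.items()}


# Hypothetical stand-in for the output of a trained structure-based model.
model_output = {"ethanol": 1.2, "acetone": 1.6, "heptane": -0.5}

# Hypothetical benchmark: measured log10 solubility in ethanol.
anchored = anchor_predictions(model_output, "ethanol", 0.9)
for solvent, value in anchored.items():
    print(f"{solvent}: {value:.2f}")
```

The constant offset preserves the solvent ranking from the model while matching the measured value exactly in the benchmark solvent. A real deployment might instead feed the benchmark to the model as an input feature.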

Topics

Pharmaceutical Development

Computing and Systems Engineering
