(218j) Developing a Defeatured Atom-Additive Model to Predict Single Component Partition Coefficients with FT-ICR MS Data
AIChE Annual Meeting
2022
Topical Conference: Applications of Data Science to Molecules and Materials
Applications of Data Science in Molecular Sciences II
Monday, November 14, 2022 - 5:45pm to 6:00pm
In this work, we developed several machine-learned models (linear regression, random forest, gradient boosting, etc.) that predict single component partition coefficients from the data available through FT-ICR MS. Using web scraping methods, we collected a database of 25,970 data points, covering 5,514 unique molecular formulas, along with their experimental partition coefficient values. We regressed the data using multiple techniques and found that partition coefficients could be determined from minimal information. On an independent validation set of nearly 4,000 compounds, our model achieves a mean absolute error of 0.37. Combining this new regression algorithm with FT-ICR MS of complex oil-water systems provides insights into the molecular makeup and partitioning signatures of complex oils.
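As a rough illustration of what an atom-additive prediction from a molecular formula alone might look like, the sketch below parses a formula into element counts, sums per-element contributions, and scores predictions by mean absolute error. It is not the authors' model: the function names, the simple formula parser (no parentheses, charges, or isotopes), and the contribution values in the usage example are all assumptions made for illustration, and the numbers are placeholders rather than fitted or experimental values.

```python
import re
from collections import Counter

# Matches an element symbol followed by an optional count, e.g. "C6" or "O".
FORMULA_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def atom_counts(formula):
    """Count atoms of each element in a simple formula such as 'C6H6O'."""
    counts = Counter()
    for element, number in FORMULA_TOKEN.findall(formula):
        counts[element] += int(number) if number else 1
    return counts


def predict(formula, contributions, intercept=0.0):
    """Atom-additive estimate: intercept plus sum of (contribution * atom count)."""
    counts = atom_counts(formula)
    return intercept + sum(
        contributions.get(element, 0.0) * n for element, n in counts.items()
    )


def mean_absolute_error(predicted, observed):
    """Average absolute difference between predicted and observed values."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


if __name__ == "__main__":
    # Placeholder contributions, NOT fitted values from the work described above.
    toy_contributions = {"C": 0.5, "H": 0.0, "O": -1.0}
    print(atom_counts("C6H6O"))
    print(predict("C6H6O", toy_contributions, intercept=0.25))
```

In practice the per-element contributions would be obtained by regressing measured partition coefficients on the atom counts, for example with ordinary least squares or one of the tree-based methods named in the abstract.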