(186c) Bayesian Model Selection: Applying Parsimony to Build Better Molecular Models | AIChE

Authors 

Madin, O. - Presenter, University of Colorado Boulder
Shirts, M., University of Colorado Boulder
Messerly, R. A., National Renewable Energy Laboratory
Boothroyd, S., University of Colorado Boulder

Non-covalent interactions play critical roles in molecular biology. Molecular dynamics is the most common in silico method for probing these interactions, because simple force fields make it possible to reach the system sizes and timescales required to capture the large-scale interactions that drive many biological processes. The speed and power of these models come at the cost of explicit electronic interactions, which are replaced with simpler approximations. This forces the modeler to make many decisions when selecting an appropriate force field, including choices of atom typing, combination rules, polarizability, and charge models.

In the Open Force Field Initiative, we aim to develop force fields, including their non-covalent interaction terms, using data-driven techniques. To this end, we explore the use of Bayesian inference to make data-driven choices between non-covalent interaction parameters and functional forms by calculating Bayes factors, a quantitative measure of the relative evidence for competing models. Bayes factors incorporate parsimony, naturally penalizing unnecessary complexity within the Bayesian paradigm. More complex models generally have more parameters with less certain values, which puts them at a disadvantage against simpler models that are able to reproduce the same quantities of interest. To justify its additional complexity, a model must demonstrate a significant improvement in accuracy.
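
For reference, the Bayes factor can be written in its standard textbook form. The notation here (data D, models M_1 and M_2, parameters \theta_i) is assumed for illustration and does not come from the abstract itself:

```latex
p(D \mid M_i) = \int p(D \mid \theta_i, M_i)\, p(\theta_i \mid M_i)\, d\theta_i,
\qquad
B_{12} = \frac{p(D \mid M_1)}{p(D \mid M_2)}
```

Because the evidence p(D | M_i) averages the likelihood over the whole prior, an extra parameter with a broad prior dilutes the evidence unless it produces a correspondingly better fit. This is the sense in which the Bayes factor penalizes complexity without an explicit penalty term.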

We test this strategy on the 2-center Lennard-Jones plus Quadrupole (2CLJQ) model for simple fluids, leveraging analytical surrogate models from the literature to rapidly evaluate the large number of parameter sets required to compute Bayes factors. With this framework, we evaluate whether including a quadrupole is necessary when designing a model to reproduce temperature-dependent density, saturation pressure, and surface tension data for several small molecules. In most cases, we find that the inclusion of a quadrupole parameter is not justified, indicating that a simpler representation is sufficient to reproduce the quantities of interest. This process also produces parameter probability distributions for each compound, which provide valuable information about parameter uncertainty and sensitivity. This work demonstrates the utility of Bayesian inference as a tool for model selection and informs our future application of this technique to more complex decisions required in fitting biomolecular force fields.
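
The following is a minimal toy sketch of a Bayes factor calculation, not the 2CLJQ workflow or its surrogate models. It compares a one-parameter model (y = a·x) with a two-parameter model (y = a·x + b·x²) on synthetic data generated from the simpler one. Everything in it is an assumption made for illustration: the data, the uniform priors, the known noise level `SIGMA`, and the use of a brute-force grid over the prior in place of whatever sampling scheme the authors used. The extra parameter `b` plays a role loosely analogous to the optional quadrupole.

```python
import numpy as np

SIGMA = 0.1  # assumed known measurement noise (hypothetical)

def log_likelihood(y_obs, y_model):
    """Gaussian log-likelihood, summed over the last (data) axis."""
    resid = y_obs - y_model
    n = y_obs.shape[-1]
    return (-0.5 * np.sum(resid**2, axis=-1) / SIGMA**2
            - n * np.log(SIGMA * np.sqrt(2.0 * np.pi)))

def log_mean_exp(log_vals):
    """Numerically stable log of the mean of exp(log_vals)."""
    m = np.max(log_vals)
    return m + np.log(np.mean(np.exp(log_vals - m)))

# Synthetic data from the simple model, with a deterministic
# alternating perturbation of size SIGMA so the result is reproducible.
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + SIGMA * (-1.0) ** np.arange(x.size)

# Simple model: y = a*x, uniform prior a in [0, 5].
# Averaging the likelihood over a uniform grid approximates the evidence.
a_grid = np.linspace(0.0, 5.0, 2001)
log_z_simple = log_mean_exp(log_likelihood(y, a_grid[:, None] * x))

# Complex model: y = a*x + b*x^2, uniform priors a in [0, 5], b in [-5, 5].
a2 = np.linspace(0.0, 5.0, 401)
b2 = np.linspace(-5.0, 5.0, 401)
A, B = np.meshgrid(a2, b2, indexing="ij")
y_complex = A[..., None] * x + B[..., None] * x**2
log_z_complex = log_mean_exp(log_likelihood(y, y_complex))

# Positive values favor the simpler model.
log_bf = log_z_simple - log_z_complex
print(f"ln Bayes factor (simple over complex): {log_bf:.2f}")
```

In this toy setting the complex model fits the data slightly better at its best point, but its evidence is lower because the likelihood is averaged over a prior that is much wider than the data require. A grid is only practical for one or two parameters; higher-dimensional problems need a sampling-based evidence estimate.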