(476h) No More Histograms: Variational and Bayesian Approaches to Estimating Potentials of Mean Force

Shirts, M. R., University of Colorado Boulder
Ferguson, A. L., University of Illinois at Urbana-Champaign
Potentials of mean force, or free energies along a selected set of collective variables, are ubiquitous in molecular simulation. Applications include determining the rate-limiting step of a reaction, understanding the behavior of collective interactions such as hydrophobicity, and elucidating the mechanism of transport through molecular pores. Most commonly, the potential of mean force is estimated by histogramming, most often with a multiple-histogram reweighting technique such as the weighted histogram analysis method (WHAM).

However, histogramming obscures two important points. First, the empirical distribution of samples along the desired collective variable or variables is not a histogram but a sum of delta functions, and approximating it as a histogram results in a loss of information. Second, the potential of mean force is (for molecular problems) a continuous function, and representing this function as a histogram is also an approximation.

In this study, we examine how to properly relate the observed empirical distribution to the true, infinite-sampling probability distribution as a function of the collective variables, and hence determine the potential of mean force. We first show that histogramming is essentially a kernel density approximation with a "top hat" kernel, and show how this process can be generalized to other kernels that might be more useful for a given problem.
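
The correspondence between a histogram and a top-hat kernel density estimate can be illustrated with a minimal sketch. It assumes unbiased, equally weighted 1D samples (no umbrella reweighting), and the names `kde`, `tophat`, `gaussian`, and `pmf` are hypothetical helpers chosen for illustration, not part of any package named in the abstract. Evaluated at a bin center with a bandwidth equal to the bin width, the top-hat estimate reproduces the normalized histogram count for that bin; swapping in another kernel gives a smooth estimate from the same samples.

```python
import math
import random

def tophat(u):
    """Top-hat kernel with unit width and unit area."""
    return 1.0 if abs(u) <= 0.5 else 0.0

def gaussian(u):
    """Standard normal kernel, shown as one possible smooth alternative."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(samples, x, bandwidth, kernel=tophat):
    """Kernel density estimate at point x from equally weighted samples."""
    total = sum(kernel((x - s) / bandwidth) for s in samples)
    return total / (len(samples) * bandwidth)

def pmf(density, kT=1.0):
    """Potential of mean force F = -kT ln p (infinite where p is zero)."""
    return -kT * math.log(density) if density > 0.0 else math.inf

# Illustrative data only: samples from a unit Gaussian.
random.seed(1)
samples = [random.gauss(0.0, 1.0) for _ in range(1000)]
width = 0.5    # histogram bin width, reused as the kernel bandwidth
center = 0.25  # center of the bin [0.0, 0.5)

p_hist_like = kde(samples, center, width, kernel=tophat)
p_smooth = kde(samples, center, width, kernel=gaussian)
print(pmf(p_hist_like), pmf(p_smooth))
```

The top-hat estimate is only piecewise constant as the evaluation point moves, whereas the Gaussian kernel yields a differentiable density and hence a differentiable potential of mean force.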

More powerfully, we present a variational approach, in which finding the continuous probability density function along a collective variable reduces to minimizing the Kullback-Leibler divergence between a trial function and the empirical distribution. We also present a fully Bayesian approach to sampling in the space of trial functions. This Bayesian approach additionally provides several powerful tools, such as determining which trial functions are most appropriate for the problem and which are underfit or overfit. We demonstrate the application of these methods to 1D, 2D, and 3D model problems.
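
As a hedged illustration of the variational idea, the sketch below fits a trial potential of mean force F(x; theta) to unbiased 1D samples. For a trial density p(x) = exp(-F(x)) / Z, the Kullback-Leibler divergence from the empirical distribution equals the sample average of F plus ln Z, up to a constant that does not depend on theta, so minimizing that quantity minimizes the divergence. Everything specific here is an assumption made for illustration rather than the authors' implementation: the two-term basis (x and x squared), the grid quadrature for Z, reduced units with kT = 1, and the plain gradient descent with backtracking.

```python
import math
import random

def basis(x):
    """Hypothetical trial basis: F(x; theta) = theta[0]*x + theta[1]*x**2."""
    return (x, x * x)

def model_stats(theta, grid, dx):
    """Return ln Z and the model averages of each basis function."""
    neg_f = [-sum(t * b for t, b in zip(theta, basis(x))) for x in grid]
    shift = max(neg_f)  # subtract the maximum so exp() cannot overflow
    w = [math.exp(v - shift) for v in neg_f]
    total = sum(w)
    log_z = shift + math.log(total * dx)
    means = [sum(wi * basis(x)[k] for wi, x in zip(w, grid)) / total
             for k in range(len(theta))]
    return log_z, means

def fit_pmf(samples, grid, n_iter=200):
    """Minimize <F>_data + ln Z, the KL divergence up to a constant."""
    dx = grid[1] - grid[0]
    n = len(samples)
    data_means = [sum(basis(x)[k] for x in samples) / n for k in range(2)]
    theta = [0.0, 0.0]
    for _ in range(n_iter):
        log_z, model_means = model_stats(theta, grid, dx)
        loss = sum(t * d for t, d in zip(theta, data_means)) + log_z
        # Gradient of the loss: data average minus model average.
        grad = [d - m for d, m in zip(data_means, model_means)]
        g2 = sum(g * g for g in grad)
        step = 1.0
        while step > 1e-12:  # backtracking line search (Armijo condition)
            trial = [t - step * g for t, g in zip(theta, grad)]
            trial_log_z, _ = model_stats(trial, grid, dx)
            trial_loss = sum(t * d for t, d in zip(trial, data_means)) + trial_log_z
            if trial_loss <= loss - 1e-4 * step * g2:
                theta = trial
                break
            step *= 0.5
    return theta

# Illustrative data only: samples from a Gaussian with mean 0.5 and unit variance.
random.seed(0)
samples = [random.gauss(0.5, 1.0) for _ in range(2000)]
grid = [-8.0 + 16.0 * i / 400 for i in range(401)]
theta = fit_pmf(samples, grid)
print(theta)  # coefficients of the fitted potential of mean force, in units of kT
```

No binning of the samples appears anywhere in the fit: the data enter only through sample averages of the basis functions, and the result is a continuous function of x. A richer basis (for example splines) could replace `basis` without changing the objective.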