(746d) Explicit Nonlinear Collective Variables and Biased Molecular Dynamics Using Autoencoders

Authors: 
Ferguson, A. L., University of Illinois at Urbana-Champaign
Chen, W., University of Illinois at Urbana-Champaign
Macromolecular and biomolecular folding landscapes typically contain high free energy barriers that impede efficient sampling of the thermally accessible configurational space by conventional molecular dynamics simulation. Biased sampling approaches (e.g., umbrella sampling, metadynamics, adaptive biasing force) seek to improve sampling by artificially driving the simulation along some collective variables (CVs) to accelerate exploration of configurational space. The success of these methods critically depends on the availability of good CVs that are associated with the important collective dynamical transitions, span the accessible phase space, and do not contain latent free energy barriers. Nonlinear dimensionality reduction techniques (e.g., Laplacian eigenmaps, diffusion maps, Isomap, locally linear embedding) can systematically identify appropriate CVs as nonlinear combinations of molecular degrees of freedom corresponding to the collective molecular motions. A critical impediment to implementing these CVs within biased molecular dynamics simulations is the unavailability of an explicit functional relationship between the CVs and atomic coordinates, which precludes the mapping of biasing forces in the CVs to real-space forces on the atoms. Several innovative resolutions of this fundamental difficulty have been proposed: using the CVs to initialize unbiased simulations in undersampled regions [1], constructing approximate parameterizations of the CVs in basis functions centered on representative molecular configurations [2], employing restriction/lifting and coarse-grained projection between the molecular and CV descriptions [3], and performing biased sampling in proxy physical variables correlated with the collective CVs [4].

In this work, we report the use of autoassociative artificial neural networks (autoencoders) to learn low-dimensional nonlinear subspaces containing the collective dynamical motions of biomolecular simulations and furnish nonlinear CVs that are explicit functions of the atomic coordinates. We have integrated this data-driven CV discovery with umbrella sampling within the molecular simulation package OpenMM [5] to perform accelerated sampling directly in the low-dimensional subspace by propagating the CV biasing forces into real-space forces on the atoms. By interleaving successive rounds of CV discovery and biased sampling, we have established an approach to iteratively discover and refine the CVs and efficiently sample the thermally accessible phase space by parsimonious biasing along the important collective molecular motions. We describe applications of our approach to systematically discover good collective variables and efficiently calculate free energy surfaces for alanine dipeptide and the Trp-cage miniprotein.
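To illustrate why an explicit CV matters, the following is a minimal sketch of propagating an umbrella biasing force from CV space to real-space forces by the chain rule, F = -k (dξ/dx)^T (ξ(x) - ξ0). It is not the authors' implementation: the single tanh hidden layer, the random stand-in weights, the harmonic form of the restraint, the flattened coordinate vector, and all function and parameter names are assumptions made for illustration only, and no OpenMM integration is shown.

```python
import numpy as np

def encoder_cv(x, W1, b1, W2, b2):
    """Hypothetical encoder: CVs xi(x) and the Jacobian d(xi)/d(x).

    Assumes one tanh hidden layer followed by a linear output layer.
    """
    h = np.tanh(W1 @ x + b1)                   # hidden activations, shape (n_hidden,)
    xi = W2 @ h + b2                           # CV values, shape (n_cv,)
    # Chain rule: d(xi)/d(x) = W2 . diag(1 - h^2) . W1, shape (n_cv, n_coords)
    jac = W2 @ ((1.0 - h ** 2)[:, None] * W1)
    return xi, jac

def umbrella_energy_and_forces(x, params, xi0, k):
    """Harmonic restraint U = 0.5 * k * |xi(x) - xi0|^2 and forces -dU/dx."""
    xi, jac = encoder_cv(x, *params)
    delta = xi - xi0
    energy = 0.5 * k * float(delta @ delta)
    forces = -k * (jac.T @ delta)              # real-space forces, shape (n_coords,)
    return energy, forces

# Toy setup: 3 atoms (9 flattened coordinates), 5 hidden units, 2 CVs.
# Random weights stand in for a trained encoder (assumption).
rng = np.random.default_rng(0)
n_coords, n_hidden, n_cv = 9, 5, 2
params = (
    rng.normal(size=(n_hidden, n_coords)),
    rng.normal(size=n_hidden),
    rng.normal(size=(n_cv, n_hidden)),
    rng.normal(size=n_cv),
)
x = rng.normal(size=n_coords)                  # flattened atomic coordinates
xi0 = np.zeros(n_cv)                           # umbrella window center in CV space
energy, forces = umbrella_energy_and_forces(x, params, xi0, k=10.0)
```

Because the encoder is a closed-form differentiable function of the coordinates, the Jacobian is available analytically; this is what CVs defined only on sampled configurations (as in the nonlinear dimensionality reduction techniques above) do not directly provide.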

1. J. Preto and C. Clementi Phys. Chem. Chem. Phys. 16 19181 (2014)

2. B. Hashemian, D. Millán, and M. Arroyo J. Chem. Phys. 139 214101 (2013)

3. T.A. Frewen, G. Hummer, and I.G. Kevrekidis J. Chem. Phys. 131 134104 (2009)

4. A.L. Ferguson, A.Z. Panagiotopoulos, P.G. Debenedetti, and I.G. Kevrekidis J. Chem. Phys. 134 135103 (2011)

5. P. Eastman et al. J. Chem. Theory Comput. 9(1) 461-469 (2013)