(422f) Systematic Identification of Relevant Order Parameters in Biophysical Systems

Ferguson, A. L., Princeton University
Bravewolf, L. V., Princeton University
Debenedetti, P. G., Princeton University
Panagiotopoulos, A. Z., Princeton University

The systematic determination of thermodynamically
and kinetically meaningful low-dimensional embeddings of high-dimensional
data sets remains an important problem, with implications for the visualization,
clustering and coarse-grained simulation of complex dynamical systems. It is
well established that many processes residing in an ostensibly D-dimensional
space actually lie on an intrinsic manifold of dimensionality d ≪ D. The
dimensionality and shape of this manifold are generally unknown a priori, and
the manifold may be a highly non-linear function of the original variables. For
example, transition path sampling has been used to demonstrate that the
transitions between the C7eq and Cax conformations of
alanine dipeptide in vacuum are well characterized by two backbone torsional
angles [1], indicating that the system lies close to a two-dimensional manifold
parameterized by these variables.
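Because backbone torsions such as Φ and Ψ are ordinary four-atom dihedral angles, a minimal sketch of computing one from Cartesian coordinates may help make the "two torsional angles" concrete. It assumes NumPy and the IUPAC sign convention; the function name `dihedral` is hypothetical, and which four atoms define Φ or Ψ is left to the caller.

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees, in [-180, 180], defined by four atoms.

    For a backbone angle such as phi, p0..p3 would be the positions of the
    four consecutive atoms that define it (an illustrative assumption here).
    """
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1                      # first bond, pointing back toward p0
    b1 = p2 - p1                      # central bond (rotation axis)
    b2 = p3 - p2                      # last bond
    b1 /= np.linalg.norm(b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


# A 90-degree torsion: outer atoms along +x and +y about a bond on the z-axis.
print(round(dihedral([1, 0, 0], [0, 0, 0], [0, 0, 1], [0, 1, 1]), 6))  # 90.0
```

Projecting the outer bonds onto the plane perpendicular to the central bond and using `arctan2` gives a signed angle without the ambiguity of an `arccos`-based formula.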

The diffusion map technique [2-4]
relies on the construction of a Markov matrix describing a random walk over a
data set, in which the probability of hopping from one data point to another is
specified by a pairwise similarity metric. In biophysical systems, a common
choice is the negative exponential of the root-mean-square deviation (RMSD)
between molecular conformations. The diffusive proximity of two data points is
defined as the probability of reaching one point from the other in a specified
number of applications of the Markov transition matrix. Points connected by
many short pathways therefore have a large diffusive proximity, whereas those
connected by few long routes have a small one. For data sets sampled uniformly
over the manifold, the eigenvectors of the Markov matrix are discrete
approximations to the corresponding eigenfunctions of the continuous
Laplace-Beltrami operator, which generalizes the familiar Laplacian to
arbitrary surfaces and generates a continuous diffusion process on that
surface. For non-uniform sampling, the eigenvectors instead approximate the
eigenfunctions of the Fokker-Planck operator describing continuous diffusion
in the presence of potential wells. Mapping the original data set onto the
leading eigenvectors of the Markov matrix (the so-called diffusion mapping)
yields an embedding in which Euclidean distances between points approximate
their diffusion distances in the original space, so that diffusively proximate
points lie close together. Subsequent analyses may be conducted to reconstruct
the intrinsic manifold, estimate its dimensionality and interpret the diffusion
map embeddings in terms of the original variables.
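The construction described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes a Gaussian kernel exp(-RMSD²/2ε) as the concrete "negative exponential" similarity, a user-chosen bandwidth `eps`, Kabsch superposition for the RMSD, and no density renormalization of the kernel. The function names are hypothetical. The eigenvectors are obtained through the symmetric conjugate of the Markov matrix, which has the same eigenvalues and is numerically better behaved.

```python
import numpy as np


def kabsch_rmsd(X, Y):
    """RMSD between two (n_atoms, 3) conformations after optimal rigid superposition."""
    Xc = X - X.mean(axis=0)               # remove translation
    Yc = Y - Y.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc.T @ Yc)   # Kabsch: SVD of the 3x3 covariance matrix
    d = np.sign(np.linalg.det(U) * np.linalg.det(Vt))  # exclude reflections
    trace = S[0] + S[1] + d * S[2]        # max of tr(R^T H) over proper rotations R
    msd = (np.sum(Xc**2) + np.sum(Yc**2) - 2.0 * trace) / len(X)
    return float(np.sqrt(max(msd, 0.0)))


def pairwise_rmsd(frames):
    """Symmetric matrix of RMSDs between all pairs of frames."""
    n = len(frames)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = kabsch_rmsd(frames[i], frames[j])
    return dist


def diffusion_map(dist, eps, n_coords=2, t=1):
    """Leading non-trivial diffusion coordinates from a pairwise distance matrix.

    Returns (eigenvalues, coords) with coords[:, k] = lambda_{k+1}^t * psi_{k+1}.
    """
    K = np.exp(-dist**2 / (2.0 * eps))    # Gaussian similarity kernel, bandwidth eps
    deg = K.sum(axis=1)                   # row sums; M = diag(deg)^-1 K is the Markov matrix
    # A = diag(deg)^-1/2 K diag(deg)^-1/2 is symmetric with the same eigenvalues as M.
    A = K / np.sqrt(np.outer(deg, deg))
    evals, evecs = np.linalg.eigh(A)
    order = np.argsort(evals)[::-1]       # sort eigenvalues in descending order
    evals, evecs = evals[order], evecs[:, order]
    psi = evecs / np.sqrt(deg)[:, None]   # right eigenvectors of M
    # psi[:, 0] is constant (eigenvalue 1) and carries no information; skip it.
    coords = evals[1:n_coords + 1] ** t * psi[:, 1:n_coords + 1]
    return evals, coords


# Illustrative usage on a synthetic 1D curve embedded in 3D (not MD data).
s = np.linspace(0.0, 1.0, 200)
pts = np.column_stack([s, np.sin(s), np.zeros_like(s)])
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
evals, coords = diffusion_map(dist, eps=0.005)
# coords[:, 0] should vary monotonically with the curve parameter s.
```

For a real trajectory, `pairwise_rmsd(frames)` would replace the Euclidean `dist` above. The choice of `eps` matters: too small and the random walk fragments into disconnected pieces, too large and all points look alike.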

In this work, we apply the
diffusion map technique to ideal-gas and solvated n-alkane molecular
dynamics trajectories to systematically identify the "right" variables with
which to describe the dynamic evolution of these systems, and to construct
low-dimensional projections of the free energy surface. Consistent with our
recent work [5], our findings suggest that the chain radius of gyration is the
primary order parameter for the system, with a variable correlated with a
hairpin-to-globular transition also of significant importance. We have also
conducted long atomistic molecular dynamics simulations of alanine
dipeptide in explicit solvent and found the top two non-trivial eigenvectors
to be correlated with the Φ and Ψ backbone dihedral angles known to
parameterize its free energy landscape. Finally, we introduce novel techniques
to incorporate solvent variables into the diffusion map analysis of these two
systems, in order to move away from a solute-centered perspective. Preliminary
results for the hydrocarbon systems suggest that the diffusion map may be able
to capture cavitation as a key variable in hydrophobic collapse, as has been
suggested in the literature [6].
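Interpreting a diffusion coordinate usually means testing it against candidate physical variables. A minimal sketch of one such check, assuming NumPy, unit atomic masses by default, and Pearson correlation as the similarity measure, computes the radius of gyration per frame and correlates it with a diffusion coordinate. The function names and the `frames`/`coords` variables are hypothetical.

```python
import numpy as np


def radius_of_gyration(X, masses=None):
    """Radius of gyration of one (n_atoms, 3) conformation; unit masses by default."""
    X = np.asarray(X, dtype=float)
    m = np.ones(len(X)) if masses is None else np.asarray(masses, dtype=float)
    com = (m[:, None] * X).sum(axis=0) / m.sum()   # center of mass
    sq = ((X - com) ** 2).sum(axis=1)              # squared distances from the center of mass
    return float(np.sqrt((m * sq).sum() / m.sum()))


def correlation(order_param, diffusion_coord):
    """Pearson correlation between a candidate order parameter and a diffusion coordinate."""
    return float(np.corrcoef(order_param, diffusion_coord)[0, 1])


# Hypothetical usage: frames is a list of (n_atoms, 3) arrays and coords comes
# from a diffusion map of the same frames.
# rg = np.array([radius_of_gyration(f) for f in frames])
# r = correlation(rg, coords[:, 0])
```

Pearson correlation only detects linear relationships, so a rank correlation or a scatter plot of the candidate variable against the diffusion coordinate may be more informative. Periodic variables such as Φ and Ψ are better compared through their sine and cosine than through the raw angle.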

  1. Bolhuis, P.G.; Dellago, C.; Chandler, D. Reaction Coordinates of Biomolecular Isomerization. Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 11, 5877-5882.
  2. Coifman, R.R.; Lafon, S.; Lee, A.B.; Maggioni, M.; Nadler, B.; Warner, F.; Zucker, S.W. Geometric Diffusions as a Tool for Harmonic Analysis and Structure Definition of Data: Diffusion Maps. Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 21, 7426-7431.
  3. Belkin, M.; Niyogi, P. Laplacian Eigenmaps for Dimensionality Reduction and Data Representation. Neural Comput. 2003, 15, 1373-1396.
  4. Nadler, B.; Lafon, S.; Coifman, R.R.; Kevrekidis, I.G. Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, 2005, 955-962.
  5. Ferguson, A.L.; Debenedetti, P.G.; Panagiotopoulos, A.Z. Solubility and Molecular Conformations of n-Alkane Chains in Water. J. Phys. Chem. B 2009, 113, 6405-6414.
  6. Miller, T.F.; Vanden-Eijnden, E.; Chandler, D. Solvent Coarse-Graining and the String Method Applied to the Hydrophobic Collapse of a Hydrated Chain. Proc. Natl. Acad. Sci. U.S.A. 2007, 104, 37, 14559-14564.