(502b) Learning Structure Identification in Molecular Simulations with a Pointnet | AIChE

Authors 

Sarupria, S. - Presenter, University of Minnesota, Twin Cities
DeFever, R., Clemson University
Targonski, C., Clemson University
Hall, S., Clemson University
Smith, M., Clemson University
Molecular simulations are applied to study increasingly diverse phenomena in physics, chemistry, biology, and engineering. The raw simulation output (i.e., the positions of each particle in the system) contains a wealth of data that can be used to relate microscopic structure to macroscopic observable properties and processes. Identifying local structure is therefore a key part of analyzing molecular simulations. The most common existing approach is to calculate a geometrical quantity referred to as an order parameter. In some simple cases, order parameters are physically intuitive and trivial to develop (e.g., ion-pair distance). In most cases, however, developing an order parameter is much more difficult (e.g., for collective solvent behavior around ions). This difficulty is evidenced by the observation that a new order parameter is often worthy of a publication.
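To make the "simple case" concrete, the ion-pair distance mentioned above can be sketched in a few lines. This is only an illustration, not code from the authors: the function name `ion_pair_distance`, the orthorhombic periodic box, and the use of the minimum-image convention are all assumptions made for the example.

```python
import numpy as np

def ion_pair_distance(pos_a, pos_b, box):
    """Distance between two particles in a periodic orthorhombic box.

    Hypothetical helper for illustration. `box` holds the three box
    edge lengths; the minimum-image convention is assumed.
    """
    box = np.asarray(box, dtype=float)
    delta = np.asarray(pos_a, dtype=float) - np.asarray(pos_b, dtype=float)
    # Wrap each component so the nearest periodic image is used.
    delta -= box * np.round(delta / box)
    return float(np.linalg.norm(delta))

# Two ions on opposite sides of a 10 x 10 x 10 box are close neighbors
# once periodicity is taken into account.
d = ion_pair_distance([0.5, 0.0, 0.0], [9.5, 0.0, 0.0], [10.0, 10.0, 10.0])
```

An order parameter this simple needs no design effort. The collective cases described above have no comparably obvious formula.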

Given the widespread success of machine learning in object identification in fields such as computer vision, it seems intuitive that similar approaches might hold promise for structure identification in simulations. In general, the challenge in applying machine learning techniques to simulations is selecting appropriate input features. Feature selection is itself a system-specific challenge that can require significant human input and intuition. Our goal is to develop a generic approach to local structure identification in molecular simulations that requires no system-specific feature engineering and operates on the raw output of the simulations, i.e., atomic positions.

Our approach is to apply a specific type of neural network called a PointNet to identify local structural environments in molecular simulations. The PointNet takes as input the relative positions (i.e., x, y, z coordinates) of atoms within some cutoff distance of a central atom and is trained to classify the structural environment of the central atom. We demonstrate the method on crystal structure identification in Lennard-Jones (four different phases), water (eight different phases), and mesophase (six different phases) systems. The method achieves >99.5% accuracy in crystal structure identification. We also demonstrate that the method is applicable to heterogeneous nucleation, as it can predict the crystal phases of atoms even near external interfaces. Next, we test whether the method can identify if a water molecule is in a hydrophobic or hydrophilic environment based solely upon the surrounding water molecules. The method is able to characterize heterogeneous surfaces (e.g., self-assembled monolayer surfaces with patterns of hydrophobic and hydrophilic regions) and the local surface hydrophobicity of biomolecules such as proteins.
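The input construction and classification step described above can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes an orthorhombic periodic box, uses randomly initialized (untrained) weights, and keeps only the core PointNet idea of a shared per-point network followed by an order-independent max pooling. The layer sizes and the names `local_environment`, `init_params`, and `pointnet_forward` are hypothetical, and the learned input and feature transforms of the original PointNet architecture are omitted.

```python
import numpy as np

def local_environment(positions, center_index, cutoff, box):
    """Relative coordinates of all atoms within `cutoff` of a central atom.

    Assumes an orthorhombic periodic box with edge lengths `box`.
    """
    positions = np.asarray(positions, dtype=float)
    box = np.asarray(box, dtype=float)
    rel = positions - positions[center_index]
    rel -= box * np.round(rel / box)              # minimum-image convention
    dist = np.linalg.norm(rel, axis=1)
    is_neighbor = (dist < cutoff) & (np.arange(len(positions)) != center_index)
    return rel[is_neighbor]

def init_params(rng, n_classes, hidden=32, features=64):
    """Random, untrained weights; the layer sizes are illustrative only."""
    return {
        "w1": rng.normal(0.0, 0.5, (3, hidden)), "b1": np.zeros(hidden),
        "w2": rng.normal(0.0, 0.5, (hidden, features)), "b2": np.zeros(features),
        "w3": rng.normal(0.0, 0.5, (features, n_classes)), "b3": np.zeros(n_classes),
    }

def pointnet_forward(points, params):
    """Class probabilities for one local environment (an N x 3 array)."""
    if len(points) == 0:
        raise ValueError("no neighbors within the cutoff")
    # Shared per-point network: the same weights are applied to every neighbor.
    h = np.maximum(points @ params["w1"] + params["b1"], 0.0)
    h = np.maximum(h @ params["w2"] + params["b2"], 0.0)
    # Max pooling over neighbors makes the result independent of atom order.
    pooled = h.max(axis=0)
    logits = pooled @ params["w3"] + params["b3"]
    exp = np.exp(logits - logits.max())           # numerically stable softmax
    return exp / exp.sum()

rng = np.random.default_rng(0)
positions = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
             [9.5, 0.0, 0.0], [5.0, 5.0, 5.0]]
env = local_environment(positions, center_index=0, cutoff=1.5, box=[10.0] * 3)
params = init_params(rng, n_classes=4)            # e.g., four candidate phases
probs = pointnet_forward(env, params)
probs_reordered = pointnet_forward(env[::-1], params)
```

Because the pooling step is symmetric, `probs` and `probs_reordered` agree. That property is what lets such a network operate directly on unordered atomic coordinates. With untrained weights the probabilities carry no physical meaning; in practice the parameters would be fit to labeled environments from simulations of known phases.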

The PointNet approach to local structure identification is a generic framework for identifying the different structural motifs that appear in molecular simulations. The approach requires only the raw output from molecular simulations and no system-specific feature engineering. We expect it to be broadly applicable to many other types of local structure in simulations, such as biomolecular conformations, active site arrangements, and the structure of solvation shells. The method will shorten time-to-discovery by enabling rapid structure identification in novel systems for which no order parameters exist.
