(502b) Learning Structure Identification in Molecular Simulations with a Pointnet
AIChE Annual Meeting
2019
Topical Conference: Applications of Data Science to Molecules and Materials
Applications of Data Science in Molecular Sciences II
Wednesday, November 13, 2019 - 12:45pm to 1:00pm
Given the widespread success of machine learning in object identification in fields such as computer vision, it seems natural that similar approaches might hold promise for structure identification in molecular simulations. In general, the challenge in applying machine learning techniques to simulation data is selecting appropriate input features. Feature selection is itself a system-specific task that can require significant human input and intuition. Our goal is to develop a generic approach to local structure identification in molecular simulations that requires no system-specific feature engineering and operates on the raw output of the simulations, i.e., atomic positions.
Our approach is to apply a specific type of neural network called a PointNet to identify local structural environments in molecular simulations. The PointNet takes as input the relative positions (i.e., x, y, z coordinates) of atoms within some cutoff distance of a central atom and is trained to classify the structural environment of the central atom. We demonstrate the method on crystal structure identification in Lennard-Jones (four different phases), water (eight different phases), and mesophase (six different phases) systems. The method achieves >99.5% accuracy in crystal structure identification. We also demonstrate that the method is applicable to heterogeneous nucleation, as it can predict the crystal phases even of atoms near external interfaces. Next, we test whether the method can identify if a water molecule is in a hydrophobic or hydrophilic environment based solely upon the surrounding water molecules. The method is able to characterize heterogeneous surfaces (e.g., self-assembled monolayer surfaces with patterns of hydrophobic and hydrophilic regions) and the local surface hydrophobicity of biomolecules such as proteins.
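To make the input representation concrete, the following is a minimal sketch in Python with NumPy, not the authors' implementation. It gathers the coordinates of atoms within a cutoff of a central atom, expressed relative to that atom, and passes them through a small PointNet-style classifier: a per-point network shared across all neighbors, followed by max pooling, which makes the output independent of neighbor ordering. The function names (`local_environment`, `init_params`, `pointnet_forward`), the layer sizes, the cutoff, the orthorhombic periodic box, and the simple cubic test lattice are all assumptions chosen for illustration. The weights are random and untrained, and the input-transform sub-networks of the original PointNet architecture are omitted.

```python
import itertools
import numpy as np

def local_environment(positions, center_index, cutoff, box=None):
    """Return neighbor coordinates relative to a central atom, within `cutoff`."""
    delta = positions - positions[center_index]
    if box is not None:
        # Assumption: orthorhombic periodic box, minimum-image convention.
        delta -= box * np.round(delta / box)
    dist = np.linalg.norm(delta, axis=1)
    mask = dist < cutoff
    mask[center_index] = False  # exclude the central atom itself
    return delta[mask]

def init_params(rng, n_classes, hidden=(32, 64), head=32):
    """Random, untrained weights for a toy PointNet-style classifier."""
    sizes = [3, *hidden]
    point_layers = [(rng.normal(0.0, 0.1, (a, b)), np.zeros(b))
                    for a, b in zip(sizes[:-1], sizes[1:])]
    head_layers = [(rng.normal(0.0, 0.1, (hidden[-1], head)), np.zeros(head)),
                   (rng.normal(0.0, 0.1, (head, n_classes)), np.zeros(n_classes))]
    return point_layers, head_layers

def pointnet_forward(points, params):
    """Map an (n_neighbors, 3) array to class probabilities.

    Requires at least one neighbor, since the pooling step takes a max over points.
    """
    point_layers, head_layers = params
    h = points
    for w, b in point_layers:
        h = np.maximum(h @ w + b, 0.0)  # same weights applied to every point (ReLU)
    g = h.max(axis=0)                   # symmetric pooling: invariant to point order
    (w1, b1), (w2, b2) = head_layers
    g = np.maximum(g @ w1 + b1, 0.0)
    logits = g @ w2 + b2
    e = np.exp(logits - logits.max())   # numerically stable softmax
    return e / e.sum()

# Hypothetical test system: a 3x3x3 simple cubic lattice with unit spacing.
positions = np.array(list(itertools.product(range(3), repeat=3)), dtype=float)
neighbors = local_environment(positions, center_index=13, cutoff=1.1)

rng = np.random.default_rng(0)
params = init_params(rng, n_classes=4)
probs = pointnet_forward(neighbors, params)

# Reordering the neighbors should not change the prediction.
probs_shuffled = pointnet_forward(rng.permutation(neighbors), params)
```

Because the pooling step is a maximum over points, the same network accepts neighborhoods with different numbers of atoms, which is convenient when the neighbor count within the cutoff varies between phases or near interfaces. In practice the weights would be fitted to labeled environments from reference simulations of each phase.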
The PointNet approach to local structure identification is a generic framework for identifying the different structural motifs that appear in molecular simulations. The approach requires only the raw output of molecular simulations and no system-specific feature engineering. The approach will be broadly applicable to many other types of local structure in simulations, such as biomolecular conformations, active site arrangements, the structure of solvation shells, and more. The method will shorten time-to-discovery by enabling rapid structure identification in novel systems for which no order parameters exist.