(139d) Molecular Crystal Structure Prediction with Gator and Genarris | AIChE

Marom, N. - Presenter, Carnegie Mellon University
Molecular crystals are bound by dispersion interactions, whose weak nature produces potential energy landscapes with many local minima. As a result, molecular crystals often exhibit polymorphism, whereby the same molecule crystallizes in more than one structure. Polymorphs may differ markedly in their physical and chemical properties. Crystal structure prediction is challenging because of the high accuracy required to resolve the small energy differences between polymorphs and the high dimensionality of the configuration space. We present the genetic algorithm (GA) code GAtor and its associated structure generation package, Genarris. Both rely on dispersion-inclusive density functional theory (DFT) for geometry relaxations and energy evaluations.

Genarris generates random structures subject to physical constraints and uses a Harris approximation to construct the electron density of a molecular crystal by superposition of single-molecule densities. The DFT energy is then evaluated for the Harris density without performing a self-consistent cycle, enabling fast screening of initial structures with an unbiased first-principles approach. Genarris creates a maximally diverse initial pool of structures by using machine learning to cluster structures by similarity with respect to a relative coordinate descriptor (RCD) designed for molecular crystals.
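The pool-selection step described above can be sketched roughly as follows. This is an illustrative toy, not the Genarris implementation: the function names (`farthest_point_init`, `kmeans_labels`, `select_diverse_pool`) are hypothetical, a plain k-means with farthest-point initialization stands in for whatever clustering method Genarris uses, and generic descriptor vectors stand in for the RCD. The idea shown is simply to cluster candidate structures by descriptor similarity and keep the lowest-energy member of each cluster.

```python
import numpy as np


def farthest_point_init(X, k):
    """Pick k initial centers deterministically: start from row 0, then
    repeatedly add the point farthest from all centers chosen so far."""
    chosen = [0]
    min_d = np.linalg.norm(X - X[0], axis=1)
    for _ in range(1, k):
        nxt = int(min_d.argmax())
        chosen.append(nxt)
        min_d = np.minimum(min_d, np.linalg.norm(X - X[nxt], axis=1))
    return X[chosen].copy()


def kmeans_labels(X, k, n_iter=50):
    """Plain Lloyd's k-means (a stand-in clustering method).
    Returns one cluster label per row of X."""
    X = np.asarray(X, dtype=float)
    centers = farthest_point_init(X, k)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Distance of every point to every center, shape (n_points, k)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels


def select_diverse_pool(descriptors, energies, k):
    """Cluster structures by descriptor and return the index of the
    lowest-energy structure in each non-empty cluster, sorted."""
    energies = np.asarray(energies, dtype=float)
    labels = kmeans_labels(descriptors, k)
    selected = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if len(members) > 0:
            selected.append(int(members[energies[members].argmin()]))
    return sorted(selected)


# Toy data: three well-separated groups of 2D "descriptors" (hypothetical),
# with made-up energies standing in for Harris-approximation energies.
descriptors = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
    [10.0, 0.0], [10.1, 0.0], [10.0, 0.1],
])
energies = [3.0, 1.0, 2.0, 0.5, 0.7, 0.2, 9.0, 8.0, 7.0]
pool = select_diverse_pool(descriptors, energies, k=3)
print(pool)
```

Taking one representative per cluster, rather than simply the lowest-energy structures overall, keeps the high-energy third group represented in the pool, which is the kind of diversity the clustering step is meant to preserve.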

GAs rely on the evolutionary principle of survival of the fittest to perform global optimization. GAtor offers a variety of crossover and mutation operators, designed for molecular crystals, that create offspring by combining or modifying the structural genes of parent structures. GAtor achieves massive parallelization by spawning several GA replicas that run in parallel and read from and write to a common pool of structures. GAtor performs evolutionary niching by using machine learning for dynamic clustering on the fly. A cluster-based fitness function is then used to steer the GA toward under-sampled low-energy regions of the potential energy landscape. This helps overcome initial pool biases and selection biases (genetic drift).
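A minimal sketch of how a cluster-based fitness can favor under-sampled regions is given below. It assumes a common fitness-sharing form, in which an energy-based fitness is divided by the number of structures in the same cluster; the exact functional form used in GAtor is not stated here, and the function names are hypothetical. Cluster labels are taken as given, for example from a clustering step run on the current pool.

```python
import numpy as np


def energy_fitness(energies):
    """Energy-based fitness scaled to [0, 1]: 1 for the lowest-energy
    structure in the pool, 0 for the highest."""
    e = np.asarray(energies, dtype=float)
    span = e.max() - e.min()
    if span == 0:
        return np.ones_like(e)
    return (e.max() - e) / span


def cluster_fitness(energies, labels):
    """Fitness sharing (assumed form): divide each structure's energy
    fitness by the size of the cluster it belongs to."""
    labels = np.asarray(labels)
    _, inverse, counts = np.unique(
        labels, return_inverse=True, return_counts=True
    )
    return energy_fitness(energies) / counts[inverse]


def selection_probabilities(fitness):
    """Roulette-wheel probabilities proportional to fitness."""
    fitness = np.asarray(fitness, dtype=float)
    total = fitness.sum()
    if total == 0:
        return np.full(len(fitness), 1.0 / len(fitness))
    return fitness / total


# Toy pool: three structures crowd cluster 0, while clusters 1 and 2
# hold a single structure each (energies and labels are made up).
energies = [0.0, 0.1, 0.2, 0.3, 1.0]
labels = [0, 0, 0, 1, 2]

f_energy = energy_fitness(energies)
f_cluster = cluster_fitness(energies, labels)
probs = selection_probabilities(f_cluster)
print(f_energy, f_cluster, probs)
```

In this toy pool the structure with the lowest energy sits in the crowded cluster, so after sharing, the lone low-energy structure in cluster 1 becomes the most likely parent. That shift of selection pressure toward sparsely sampled, low-energy niches is the effect the niching scheme aims for.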