(164z) A Stochastic Chemical Search Grammar for Macromolecules | AIChE

Authors 

Lin, T. S., Massachusetts Institute of Technology
Olsen, B., Massachusetts Institute of Technology
Pattern search in small-molecule graphs has been crucial to many scientific, engineering, and commercial applications, including database representation, virtual screening, drug design, structure-property relationships, reaction prediction, and property prediction. The popular SMILES grammar (Simplified Molecular-Input Line-Entry System) encodes the exact structural connectivity of a small-molecule graph as a single compact linear string. The SMARTS grammar (SMILES Arbitrary Target Specification) lets users query these graphs for patterns or substructures. Polymers, however, do not have an exact structural connectivity: they vary in composition, tacticity, monomer arrangement, and chain length. BigSMILES line notation was therefore created to extend SMILES so that a single linear string encodes a polymer as an ensemble of molecules. Building on BigSMILES and motivated by the importance of pattern search, we design a new grammar for querying macromolecules, called BigSMARTS.
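As a point of reference for the small-molecule case described above, the following is a minimal sketch of a SMARTS query against a SMILES-encoded molecule using RDKit. The helper name `find_pattern` and the example molecules are illustrative assumptions, not part of the authors' work.

```python
from rdkit import Chem


def find_pattern(smiles, smarts):
    """Return the atom-index tuples where the SMARTS pattern matches.

    Hypothetical helper for illustration only.
    """
    mol = Chem.MolFromSmiles(smiles)      # exact molecular graph
    query = Chem.MolFromSmarts(smarts)    # pattern to look for
    if mol is None or query is None:
        raise ValueError("could not parse SMILES or SMARTS input")
    return mol.GetSubstructMatches(query)


# Ethanol contains a hydroxyl oxygen (two connections, one hydrogen).
matches = find_pattern("CCO", "[OX2H]")
print(matches)

# Diethyl ether has an oxygen with no hydrogen, so the same query finds nothing.
no_matches = find_pattern("CCOCC", "[OX2H]")
print(no_matches)
```

Because a SMILES string fixes every atom and bond, the query has a single, well-defined answer. That is the property a polymer ensemble lacks.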

BigSMARTS allows the user to query stochastic graphs for deterministic SMARTS patterns or stochastic patterns, and to search the levels of a polymer’s structural hierarchy that make key contributions to its properties. Supported queries include logical searches of repeat units, topological searches that classify a polymer by its architecture, and stochastic reaction searches. BigSMARTS will change the way researchers and scientists in industry, academia, and government study stochastic structures. Users can chemically search stochastic structures in databases to access key experimental and theoretical characterization information, understand how microstructure and topology influence a material’s properties, and design and manipulate stochastic structures that have desirable patterns and feasible synthesis routes. To implement this syntax, we adapt and extend RDKit’s popular, user-friendly, and fast small-molecule substructure search routines to polymers. With this comprehensive search grammar and implementation scheme, BigSMARTS will advance polymer informatics and materials science in the same way that SMARTS has for small molecules.
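To illustrate why searching an ensemble differs from searching one molecule, the sketch below enumerates a few finite members of a poly(ethylene glycol)-like family and runs an ordinary RDKit SMARTS query on each. This is not the BigSMARTS algorithm, which the abstract does not detail. Explicit enumeration is a stand-in assumed here for illustration, and the names `peg_oligomer` and `count_ether_links` are hypothetical.

```python
from rdkit import Chem

# Ether linkage: an sp3 carbon, a two-connected oxygen, an sp3 carbon.
ETHER = Chem.MolFromSmarts("[CX4][OX2][CX4]")


def peg_oligomer(n):
    """SMILES for HO-(CH2CH2O)n-H, one finite member of the ensemble."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return "O" + "CCO" * n


def count_ether_links(n):
    """Count unique ether-linkage matches in the n-unit oligomer."""
    mol = Chem.MolFromSmiles(peg_oligomer(n))
    return len(mol.GetSubstructMatches(ETHER))


# The same query gives a different answer for each chain length sampled.
counts = {n: count_ether_links(n) for n in (1, 2, 3, 4)}
print(counts)
```

The result depends on which member of the ensemble is sampled, and the shortest member contains no ether linkage at all. A grammar that queries the stochastic representation directly avoids having to choose which members to enumerate.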
