A compact yet robust structure-based identifier or representation system is a key enabling factor for efficient sharing and dissemination of research results within the materials community, and such systems lay the essential foundations for future informatics and data-driven research. Although remarkable advances have been made for small molecules, the polymer community has struggled to devise an efficient representation system. This is because, unlike the molecules studied in other areas of chemistry, polymers do not have a single well-defined chemical structure. Rather, polymers are synthesized through inherently stochastic chemical processes, and the resulting materials are intrinsically stochastic: each is an ensemble with a distribution of chemical structures that vary in chain sequence, chain length, chain topology, and chain stereochemistry. This difficulty limits the applicability to polymers of any deterministic structure-based identifier developed for small molecules.
In this work, a new representation system capable of handling the stochastic nature of polymers is proposed. The new system is based on and fully compatible with the popular “simplified molecular-input line-entry system” (SMILES), and it aims to provide representations that can be used as indexing identifiers for entries in polymer databases. The ability of this system to represent a wide variety of polymers is demonstrated, including previously challenging features such as networks, branching, complex monomer sequences, blocky topologies, polymer stereochemistry, and even ladder polymers. It is hoped that the proposed system will provide a more effective language for communication and an enabling technology for advances at the interface of polymers and biomaterials with the data sciences.