(51i) CRIPT: Establishing and Harnessing a Polymer Database | AIChE


Authors 

Lin, T. S., Massachusetts Institute of Technology
Borg, C., Citrine Informatics
Hegde, V., Massachusetts Institute of Technology
Kroenlein, K., Citrine Informatics
Jensen, K. F., Massachusetts Institute of Technology
Olsen, B., Massachusetts Institute of Technology
Accessible, well-structured data is the foundation of cheminformatics. The complexity of polymer structures poses significant challenges for building databases because no single representation can capture the full molecular detail of a polymer material. More specifically, polymers are large stochastic molecules with distributions in chain length, composition, and topology. Additionally, data collection methods are highly variable, typically provide only relative structural information (e.g., molecular weight relative to a polystyrene standard), and/or rely on models that require expert knowledge to put into context. In some cases, obtaining structural information experimentally is impossible, and information from prior processing steps is needed. To complicate matters further, polymers can assemble into a wide range of morphologies through phenomena such as phase segregation and crystallization. The formation of these morphologies can be strongly influenced by the processing conditions under which the material was made. Ultimately, data sets that do not capture all of the relevant polymer, process, and characterization information pose challenges for advancing data-driven research in the polymer field.
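
To make the point about chain-length distributions concrete, the short Python sketch below computes the standard number-average molar mass (Mn), weight-average molar mass (Mw), and dispersity (Mw/Mn) for a small discrete population of chains. The function name and the sample values are illustrative assumptions and are not taken from the CRIPT project; the formulas themselves are the usual textbook definitions.

```python
def molar_mass_averages(counts, masses):
    """Return (Mn, Mw, dispersity) for a discrete chain population.

    counts[i] is the number of chains whose molar mass is masses[i] (g/mol).
    """
    total_n = sum(counts)
    total_nm = sum(n * m for n, m in zip(counts, masses))
    total_nm2 = sum(n * m * m for n, m in zip(counts, masses))
    mn = total_nm / total_n      # number average: sum(N*M) / sum(N)
    mw = total_nm2 / total_nm    # weight average: sum(N*M^2) / sum(N*M)
    return mn, mw, mw / mn


# Hypothetical sample: four chains spread over three molar masses.
counts = [1, 2, 1]
masses = [10_000.0, 20_000.0, 30_000.0]
mn, mw, dispersity = molar_mass_averages(counts, masses)
print(mn, mw, dispersity)
```

Even this small sample has Mw greater than Mn, so a database entry that stores a single "molecular weight" without saying which average it is, or how it was measured, loses information.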

Here, we highlight the efforts of the CRIPT initiative to develop a digital ecosystem for polymer data and the underlying innovation that is making it possible. The CRIPT ecosystem consists of a universal polymer data model, an open-source, community-driven database, machine learning tools, and supporting software. The universal data model provides a unique graph-based representation that captures the full scope of polymer data, including synthesis, processing, chemical characterization, physical characterization, and simulation data, in a single uniform structure. To support the data model, an open-source online app adhering to the FAIR data principles (findable, accessible, interoperable, and reusable) is under development. It will serve as a community resource for polymeric material design by allowing researchers both to add their own data sets and to retrieve any other data in the database. To facilitate the population of the database, natural language processing (NLP) techniques are being developed to extract data from prior literature, which will complement the addition of new polymer data by active users. The CRIPT database aims to give polymer researchers access to ‘big data’ and so increase the community’s ability to develop better materials faster.
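
As a rough illustration of what a graph-based representation of polymer data can look like, the Python sketch below links material, process, and data nodes with directed edges and then traces the history of a processed sample. It is a minimal sketch under stated assumptions: the class names, node kinds, and attributes are hypothetical and do not reproduce the actual CRIPT data model or its API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    kind: str  # hypothetical kinds: "material", "process", or "data"
    attrs: dict = field(default_factory=dict)


class PolymerGraph:
    """Directed graph: an edge src -> dst means src feeds into dst."""

    def __init__(self):
        self.nodes = {}
        self.edges = {}  # node_id -> list of downstream node ids

    def add_node(self, node):
        self.nodes[node.node_id] = node
        self.edges.setdefault(node.node_id, [])

    def add_edge(self, src, dst):
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both nodes must be added before linking them")
        self.edges[src].append(dst)

    def upstream(self, node_id):
        """Return the ids of every node that leads to node_id."""
        found = set()
        stack = [node_id]
        while stack:
            current = stack.pop()
            for src, dsts in self.edges.items():
                if current in dsts and src not in found:
                    found.add(src)
                    stack.append(src)
        return found


# Hypothetical example: monomer -> synthesis -> polymer -> processing -> film,
# with a characterization record attached to the polymer.
graph = PolymerGraph()
graph.add_node(Node("styrene", "material"))
graph.add_node(Node("polymerization", "process", {"temperature_C": 70}))
graph.add_node(Node("polystyrene", "material"))
graph.add_node(Node("sec_measurement", "data", {"standard": "polystyrene"}))
graph.add_node(Node("annealing", "process", {"temperature_C": 120}))
graph.add_node(Node("annealed_film", "material"))

graph.add_edge("styrene", "polymerization")
graph.add_edge("polymerization", "polystyrene")
graph.add_edge("polystyrene", "sec_measurement")
graph.add_edge("polystyrene", "annealing")
graph.add_edge("annealing", "annealed_film")

history = graph.upstream("annealed_film")
print(sorted(history))
```

Storing the processing steps as nodes in the same graph as the materials means that the history of a sample can be recovered by a simple traversal, which is useful in the cases noted above where structure cannot be measured directly and must be inferred from prior steps.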
