(63s) Comparative Analysis of Molecular Structure Identifiability Based on Signatures and Descriptors | AIChE

Chemoinformatics methodology and the use of large sets of molecular data are becoming integrated into computer-aided chemical engineering process design software. Computer-aided design and/or selection of molecules with target properties from QSAR models is perceived as a large-scale combinatorial computational problem. Information on molecular structure and inference of molecular properties are mostly based on two approaches: molecular structure coding based on graph theory (the extended valence sequences of Faulon et al. [1]) and molecular descriptors [2]. Software tools are available for the automatic calculation of chemoinformatic data, but the required inverse modelling from target properties to molecular structures is difficult and remains an open problem. Chemoinformatic mappings lack systematic formal mathematical properties: they are nonlinear, discontinuous, and highly synergetic, so linear and nonlinear continuous models generalise poorly and are mostly limited to specific cases. Here, models based on decision trees and random forests are applied, and their accuracy is evaluated for inverse classification from chemoinformatic data to molecular structures. The test molecules are alkanes, alkenes, acetones, aromatics, organic acids, and halogenated hydrocarbons in the range C1-C12, and a set of binary ionic liquids (cations: imidazolium, pyridinium, quinolinium, ammonium, phosphonium). The results indicate that molecular descriptors outperform the graph-based approach for the prediction of molecular properties, whereas the accuracy of the inverse mapping is favored by the graph-based extended valences.
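The difference between the two kinds of structure coding can be illustrated with a minimal sketch. It is not the authors' implementation and does not reproduce the canonical signature notation of Faulon et al.; it assumes hydrogen-suppressed molecular graphs, a simplified height-1 atom signature (an atom's element plus the sorted elements of its neighbours), and a deliberately crude descriptor (the carbon count). The function names `atom_signatures` and `carbon_count` are hypothetical and chosen for illustration only.

```python
from collections import Counter


def atom_signatures(atoms, bonds):
    """Count simplified height-1 signatures over all atoms of a molecule.

    atoms: list of element symbols, indexed by atom number.
    bonds: list of (i, j) index pairs (hydrogens suppressed, bond order ignored).
    """
    neighbours = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].append(atoms[b])
        neighbours[b].append(atoms[a])
    # Each atom is coded as element(sorted neighbour elements), e.g. "C(CC)".
    return Counter(
        f"{atoms[i]}({''.join(sorted(neighbours[i]))})" for i in range(len(atoms))
    )


def carbon_count(atoms):
    """A deliberately crude whole-molecule descriptor: number of carbon atoms."""
    return sum(1 for element in atoms if element == "C")


# Two C4H10 isomers as hydrogen-suppressed graphs.
n_butane = (["C", "C", "C", "C"], [(0, 1), (1, 2), (2, 3)])   # linear chain
isobutane = (["C", "C", "C", "C"], [(0, 1), (0, 2), (0, 3)])  # branched

sig_n = atom_signatures(*n_butane)
sig_iso = atom_signatures(*isobutane)

# The crude descriptor cannot tell the isomers apart; the graph coding can.
same_descriptor = carbon_count(n_butane[0]) == carbon_count(isobutane[0])
same_signature = sig_n == sig_iso
print(same_descriptor, same_signature)
```

In this toy case the carbon count is identical for both isomers while the signature counts differ, which is the sense in which a graph-based code can make the inverse mapping to structure easier. Height-1 signatures are themselves not unique for all isomers (2-methylpentane and 3-methylpentane give the same counts under this simplified scheme), so greater signature heights would be needed in general.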

References

1. Jean-Loup Faulon, Donald P. Visco, Ramdas S. Pophale, The Signature Molecular Descriptor. 1. Using Extended Valence Sequences in QSAR and QSPR Studies, J. Chem. Inf. Comput. Sci. 43 (2003) 707-720

2. Chun Wei Yap, PaDEL-Descriptor: An Open Source Software to Calculate Molecular Descriptors and Fingerprints, J. Comput. Chem. 32 (2010) 1466-1474

3. Ola Spjuth, Jonathan Alvarsson, Arvid Berg, Martin Eklund, Stefan Kuhn, Carl Mäsak, Gilleain Torrance, Johannes Wagener, Egon L. Willighagen, Christoph Steinbeck, Jarl E. S. Wikberg, Bioclipse 2: A Scriptable Integration Platform for the Life Sciences, BMC Bioinformatics 10 (2009) 397, doi:10.1186/1471-2105-10-397

4. R Core Team (2015). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.
