(286a) Learning How to Predict SWCNT-Recognition DNA Sequences | AIChE

Authors 

Yang, Y. - Presenter, Lehigh University
Zheng, M., National Institute of Standards and Technology
Jagota, A., Lehigh University
In recent years, machine learning in bioinformatics has attracted considerable interest for its ability to transform large amounts of raw sequence data into useful scientific knowledge without requiring explicit programming instructions. Many bioinformatics questions can be framed as classification problems. In this work, we demonstrate the applicability of machine learning methods to the discovery of special DNA sequences that recognize partner single-walled carbon nanotubes (SWCNTs). DNA/SWCNT hybrids have demonstrated significant potential in bio-applications because of their ability to disperse and sort SWCNTs by chirality and handedness. Much work has been done to discover recognition sequences that recognize a specific SWCNT chirality, and significant progress has been made in understanding the underlying structure and thermodynamics of these hybrids. Nevertheless, the success rate for de novo prediction of recognition sequences remains low. In this research, we investigate a new approach that predicts recognition sequences using machine learning techniques and tests the predictions experimentally using aqueous two-phase separation. Multiple sequence encoding methods (position-specific, term-frequency, combined or segmented term-frequency vector, and motif-based features) were used and compared. The encoded features were used to train several classifiers (logistic regression, support vector machine, and multilayer perceptron). The models were retrained each time a new set of predicted sequences had been tested experimentally and added to the training data. Our predictive model showed a significant improvement in its ability to predict recognition sequences.
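The abstract names its sequence encodings without defining them, so the following is only a minimal sketch of what three of them might look like, not the authors' implementation. The function names, the four-letter alphabet `ACGT`, the k-mer length `k`, and the equal-width segmentation rule are all assumptions made for illustration.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed alphabet; the abstract does not specify one


def position_specific(seq, alphabet=ALPHABET):
    """One-hot encode each position: len(alphabet) entries per base, in order."""
    vec = []
    for base in seq:
        vec.extend(1 if base == letter else 0 for letter in alphabet)
    return vec


def term_frequency(seq, k=2, alphabet=ALPHABET):
    """Count overlapping k-mers; the vector has len(alphabet)**k entries."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # silently skip k-mers with unknown characters
            counts[kmer] += 1
    return [counts[kmer] for kmer in kmers]


def segmented_term_frequency(seq, segments=2, k=1, alphabet=ALPHABET):
    """Concatenate k-mer counts of contiguous segments (hypothetical definition)."""
    size = -(-len(seq) // segments)  # ceiling division: width of each segment
    vec = []
    for s in range(segments):
        vec.extend(term_frequency(seq[s * size:(s + 1) * size], k, alphabet))
    return vec
```

Under these assumptions, the position-specific vector keeps the order of the bases but grows with sequence length, while the term-frequency vector has a fixed length and discards position. The segmented variant sits between the two by counting within coarse regions of the sequence. Any of these vectors could then be passed to a standard classifier such as logistic regression.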