(222b) Tunneling Spectroscopy for Sequence and Structural Label Determination in Single DNA and RNA Molecules

Korshoj, L., University of Colorado Boulder
Chatterjee, A., University of Colorado Boulder
Nagpal, P., University of Colorado Boulder
Abel, G. Jr., University of Colorado

In the push for precision medicine, nucleic acid sequencing is vital both for compiling the large data libraries needed for genome characterization and for point-of-care diagnostics in the clinic. Nanoelectronic nucleic acid sequencing can provide an important alternative to traditional sequencing-by-synthesis: as a high-throughput next-generation technique with accurate single-molecule identification, it reduces sample preparation time, cost, and complexity. However, sample noise and signature overlap due to varying nucleotide conformations continue to prevent high-resolution, accurate sequencing. We have developed a nanoelectronic method to address these issues that combines non-perturbative quantum tunneling spectroscopy with machine learning classification algorithms to distinguish the nucleotides adenine (A), guanine (G), cytosine (C), thymine (T), and uracil (U), as well as nucleotide chemical labels. Results from our quantum molecular sequencing (QMSeq) approach present not only a promising path forward for single-molecule DNA sequencing [1], but also a method for direct single-molecule sequencing and structural characterization of RNA [2].

Using tunneling spectroscopy measurements, we probed the molecular orbitals of chemically distinct nucleobases within DNA and RNA macromolecules immobilized with restricted conformational freedom on a chemically modified surface. We combined theoretical models for quantum tunneling with transition voltage spectroscopy to extract from these measurements twelve biophysical parameters unique to the nucleotides within the electronic tunneling junction. The twelve parameters serve as a comprehensive molecular fingerprint for the nucleotides, enabling both nucleotide discrimination and the identification of structure-dependent chemical labels through machine learning. We show high accuracy for both nucleotide discrimination (>99.8%) and chemical label identification (>98%) with a relatively modest molecular coverage (35 repeat measurements).
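
As an illustration of the transition voltage spectroscopy step mentioned above, the following minimal sketch locates the transition voltage as the minimum of a Fowler-Nordheim plot, ln(|I|/V^2) against 1/V, for a single current-voltage sweep. It is not the QMSeq analysis pipeline: the function name `transition_voltage`, the toy cubic current-voltage model, and its coefficients `a` and `b` are assumptions chosen only so that the expected minimum, sqrt(a/b), is known analytically.

```python
import numpy as np


def transition_voltage(voltage, current):
    """Estimate the transition voltage of one I-V sweep (positive bias only).

    The estimate is the bias at which ln(|I| / V^2) is smallest, which is
    the minimum of the Fowler-Nordheim plot of that quantity against 1/V.
    """
    v = np.asarray(voltage, dtype=float)
    i = np.asarray(current, dtype=float)
    # Keep positive-bias points with non-zero current so the log is defined.
    mask = (v > 0) & (np.abs(i) > 0)
    v, i = v[mask], i[mask]
    fn = np.log(np.abs(i) / v**2)
    return v[np.argmin(fn)]


# Toy I-V curve (an assumption, not measured data): I = a*V + b*V^3.
# For this model I/V^2 = a/V + b*V, which is minimal at V = sqrt(a/b).
a, b = 1.0, 4.0
bias = np.linspace(0.01, 1.5, 150)
toy_current = a * bias + b * bias**3

v_t = transition_voltage(bias, toy_current)
print(f"estimated transition voltage: {v_t:.2f} V")
```

In practice, a transition voltage obtained this way would be only one of several parameters extracted per measurement; the remaining fingerprint parameters and the classifier are not specified in this abstract and are not sketched here.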

These results have significant implications for the development of robust, accurate, high-throughput nanoelectronic DNA and RNA sequencing techniques. For RNA, we have additionally shown the potential for simultaneous sequencing and structural mapping of single unknown RNA molecules, paving the way for probing the sequence-structure-function relationship within the transcriptome at an unprecedented level of detail.

[1] Korshoj, Afsari, Khan, Chatterjee, Nagpal, Small 13 (11), 1603033 (2017).

[2] Abel, Jr., Korshoj, Otoupal, Chatterjee, Nagpal, Chemical Science 10 (4), 1052-1063 (2019).