(215d) A Hybrid Methodology For Peptide Identification Via Mixed-Integer Linear Optimization, Local Alignment Database Search And Tandem Mass Spectrometry | AIChE

Authors 

DiMaggio, P. A. Jr. - Presenter, Princeton University
Floudas, C. A. - Presenter, Princeton University


Peptide and protein identification is of fundamental importance in proteomics. Tandem mass spectrometry (MS/MS) coupled with high-performance liquid chromatography (HPLC) has emerged as a powerful protocol for high-throughput, high-sensitivity peptide and protein identification experiments. Because a single tandem mass spectrum carries extensive sequence information, tandem MS has motivated the recent development of numerous computational approaches for sequencing peptides robustly and efficiently, with particular emphasis on integrating these algorithms into a high-throughput computational framework for proteomics. The two most common computational approaches reported in the literature are (a) de novo and (b) database search methods, both of which can use deterministic, probabilistic and/or stochastic solution techniques. De novo methods have distinct advantages over database methods: they can identify peptides that are not present in a protein database, and they are more amenable to identifying post-translational modifications. Recent work has combined the strengths of de novo and database techniques to improve peptide identification accuracy [1-4]. A typical framework uses the de novo sequences to generate long "sequence tags", which are then used to query a protein database. Such a framework can validate database identifications of borderline statistical significance and/or resolve residue ambiguities in the de novo predictions.
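The tag-based database query described above can be illustrated with a minimal sketch. It assumes a toy in-memory database, a fixed tag length of four residues and exact substring matching; the function names `normalize`, `sequence_tags` and `query_database` are hypothetical and do not correspond to any of the cited tools [1-4]. Because leucine and isoleucine have identical masses, both the de novo sequence and the database entries are normalized (I to L) before matching.

```python
from collections import defaultdict


def normalize(sequence: str) -> str:
    """Map isoleucine to leucine: the two residues are isobaric, so MS/MS cannot tell them apart."""
    return sequence.replace("I", "L")


def sequence_tags(peptide: str, length: int) -> list[str]:
    """Return every contiguous tag of `length` residues in a de novo sequence."""
    return [peptide[i:i + length] for i in range(len(peptide) - length + 1)]


def query_database(peptide: str, database: dict[str, str], length: int = 4) -> dict[str, int]:
    """Count how many tags of `peptide` occur verbatim in each database protein.

    Proteins that share no tag with the peptide are omitted from the result.
    Hypothetical helper; tag length and exact matching are illustrative assumptions.
    """
    tags = sequence_tags(normalize(peptide), length)
    hits: dict[str, int] = defaultdict(int)
    for protein_id, protein_seq in database.items():
        target = normalize(protein_seq)
        for tag in tags:
            if tag in target:
                hits[protein_id] += 1
    return dict(hits)


# Toy database, invented for illustration only.
toy_database = {"P1": "MKTAYIAKQRQISFVKSHFSRQ", "P2": "GGGGSAMPLEPEPTIDE"}
print(query_database("AYLAKQR", toy_database))  # {'P1': 4}
```

A real search would index the database (for example by k-mer) instead of scanning every protein for every tag, and would tolerate mismatches; the linear scan is kept here only for readability.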

We have recently developed a novel mixed-integer linear optimization (MILP) approach that efficiently addresses the de novo peptide identification problem and forms the basis of a high-throughput computational framework for peptide identification [5, 6]. This framework is denoted PILOT (Peptide identification via Integer Linear Optimization and Tandem mass spectrometry). PILOT comprises: (1) a preprocessing algorithm that identifies certain peaks and validates boundary conditions, (2) a two-stage mixed-integer linear optimization framework that accounts for ion peaks missing because of residue-dependent fragmentation characteristics, and (3) a post-processing technique that selects the most probable sequence by cross-correlating the theoretical spectra of the candidate sequences with the experimental tandem mass spectrum.
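The cross-correlation step in (3) can be sketched as follows. The abstract does not specify PILOT's scoring function, so this sketch assumes a SEQUEST-style cross-correlation: both spectra are binned at 1 Da, the zero-offset dot product is computed, and the mean dot product over offsets of -75 to +75 bins is subtracted as a background correction. Only singly charged b- and y-ions with unit intensity are generated for each candidate, and the helper names are hypothetical.

```python
# Monoisotopic residue masses in Da (cysteine unmodified).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def theoretical_ions(peptide: str) -> list[float]:
    """m/z values of the singly charged b- and y-ions of `peptide`."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y_ions = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b_ions + y_ions


def binned(peaks, bin_width: float = 1.0) -> dict[int, float]:
    """Sum (m/z, intensity) peaks into integer bins of `bin_width` Da."""
    spectrum: dict[int, float] = {}
    for mz, intensity in peaks:
        key = int(round(mz / bin_width))
        spectrum[key] = spectrum.get(key, 0.0) + intensity
    return spectrum


def xcorr(theoretical: dict[int, float], experimental: dict[int, float],
          max_offset: int = 75) -> float:
    """Zero-offset dot product minus the mean dot product over offsets -max_offset..max_offset."""
    def dot(offset: int) -> float:
        return sum(v * experimental.get(k + offset, 0.0) for k, v in theoretical.items())

    background = sum(dot(t) for t in range(-max_offset, max_offset + 1)) / (2 * max_offset + 1)
    return dot(0) - background


def rank_candidates(candidates: list[str], experimental_peaks) -> list[tuple[float, str]]:
    """Score each candidate sequence against the experimental spectrum, best first."""
    experimental = binned(experimental_peaks)
    scored = []
    for peptide in candidates:
        theoretical = binned((mz, 1.0) for mz in theoretical_ions(peptide))
        scored.append((xcorr(theoretical, experimental), peptide))
    return sorted(scored, reverse=True)


# Synthetic "observed" spectrum built from PEPTIDE's own ions (not experimental data).
observed = [(mz, 1.0) for mz in theoretical_ions("PEPTIDE")]
for score, peptide in rank_candidates(["EPPTIDE", "PEPTIDE"], observed):
    print(f"{peptide}: {score:.2f}")
```

With this synthetic input, PEPTIDE ranks above EPPTIDE because it explains all twelve fragment peaks, whereas EPPTIDE shares only ten of them. Subtracting the offset background penalizes candidates whose agreement with the spectrum could arise from chance alignment of dense peak lists.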

In this work, we propose a hybrid methodology that uses the rank-ordered list of de novo predictions provided by PILOT to query the non-redundant protein database with FASTA [7]. High-confidence residues in the de novo predictions are identified from peak intensities, the presence of complementary ions, and the conservation of subsequences across all candidate sequences. A modified BLOSUM scoring matrix is constructed to favor exact residue matches in the alignment, with an additional reward for high-confidence residues in the query sequences. The computational burden of performing numerous sequence alignments is addressed with parallel programming, and the run time of each individual alignment is reduced by reformatting the non-redundant protein database. Results of the hybrid methodology for peptide identification will be presented for experimentally validated quadrupole time-of-flight (QTOF) tandem MS data [8], a test set of annotated Orbitrap tandem MS data, and a benchmark set of ion trap tandem MS data [9].
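The scoring and alignment idea can be sketched under several stated substitutions. A full Smith-Waterman local alignment stands in for FASTA's heuristic search; a flat placeholder score (+4 for an identity, -1 for a substitution) stands in for the BLOSUM matrix; gaps use a linear penalty; only the conservation criterion for high-confidence residues is implemented, comparing candidates position by position; and only the top-ranked candidate is aligned. All bonus values and function names are hypothetical.

```python
def conserved_positions(candidates: list[str]) -> set[int]:
    """Positions at which every candidate sequence carries the same residue.

    Assumption: candidates are compared position by position up to the shortest one.
    """
    length = min(len(c) for c in candidates)
    return {i for i in range(length) if len({c[i] for c in candidates}) == 1}


def pair_score(q: str, d: str, high_conf: bool, identity: int = 4, substitution: int = -1,
               exact_bonus: int = 1, conf_bonus: int = 2) -> int:
    """Placeholder substitution score (not BLOSUM) with the two bonuses described in the text."""
    if q != d:
        return substitution
    # Extra credit for an exact match, and more if the query residue is high-confidence.
    return identity + exact_bonus + (conf_bonus if high_conf else 0)


def local_alignment_score(query: str, target: str, high_conf: set[int], gap: int = -4) -> int:
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(target) + 1)
    best = 0
    for i in range(1, len(query) + 1):
        cur = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            diag = prev[j - 1] + pair_score(query[i - 1], target[j - 1], (i - 1) in high_conf)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def search(candidates: list[str], database: dict[str, str],
           top: int = 3) -> list[tuple[int, str]]:
    """Align the top-ranked candidate against every database entry; return the best hits."""
    high_conf = conserved_positions(candidates)
    query = candidates[0]
    # Each entry is scored independently, so this loop could be split across processes.
    scores = [(local_alignment_score(query, seq, high_conf), pid) for pid, seq in database.items()]
    return sorted(scores, reverse=True)[:top]


candidates = ["PEPTIDEK", "PEPTLDEK"]  # hypothetical rank-ordered de novo output
database = {"hit": "MAPEPTIDEKGG", "miss": "GGGGGGGGG"}  # toy entries
print(search(candidates, database))  # [(54, 'hit'), (0, 'miss')]
```

In the example, the two candidates disagree only at position 4 (I versus L), so the other seven residues receive the high-confidence bonus when they match exactly. Because each database entry is aligned independently, the work divides naturally across processors, which is the property the parallel implementation relies on.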

[1] J. Taylor and R. Johnson. Sequence database searches via de novo peptide sequencing by tandem mass spectrometry. Rapid Commun. Mass Spectrom., 11: 1067-1075, 1997.

[2] A. Shevchenko, S. Sunyaev, A. Loboda, A. Shevchenko, P. Bork, W. Ens, and K. Standing. Charting the proteomes of organisms with unsequenced genomes by MALDI-quadrupole time-of-flight mass spectrometry and BLAST homology searching. Anal. Chem., 73: 1917-1926, 2001.

[3] N. Wielsch, H. Thomas, V. Surendranath, P. Waridel, A. Frank, P. Pevzner, and A. Shevchenko. Rapid validation of protein identifications with borderline statistical confidence via de novo sequencing and MS BLAST searches. J. Proteome Res., 5: 2448-2456, 2006.

[4] A. Frank, M. Savitski, M. Nielsen, R. Zubarev, and P. Pevzner. De novo peptide sequencing and identification with precision mass spectrometry. J. Proteome Res., 6: 114-123, 2007.

[5] P.A. DiMaggio and C.A. Floudas. A mixed-integer optimization framework for de novo peptide identification. AIChE J., 53(1): 160-173, 2007.

[6] P.A. DiMaggio and C.A. Floudas. De novo peptide identification via tandem mass spectrometry and integer linear optimization. Anal. Chem., 79: 1433-1446, 2007.

[7] W. Pearson and D. Lipman. Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA, 85: 2444-2448, 1988.

[8] B. Ma, K.Z. Zhang, C. Hendrie, C. Liang, M. Li, A. Doherty-Kirby, and G. Lajoie. PEAKS: powerful software for peptide de novo sequencing by tandem mass spectrometry. Rapid Commun. Mass Spectrom., 17: 2337-2342, 2003.

[9] A. Frank and P. Pevzner. PepNovo: de novo peptide sequencing via probabilistic network modeling. Anal. Chem., 77(4): 964-973, 2005.