TY - JOUR
T1 - Proteogenomic analysis of Mycobacterium tuberculosis by high resolution mass spectrometry
AU - Kelkar, Dhanashree S.
AU - Kumar, Dhirendra
AU - Kumar, Praveen
AU - Balakrishnan, Lavanya
AU - Muthusamy, Babylakshmi
AU - Yadav, Amit Kumar
AU - Shrivastava, Priyanka
AU - Marimuthu, Arivusudar
AU - Anand, Sridhar
AU - Sundaram, Hema
AU - Kingsbury, Reena
AU - Harsha, H. C.
AU - Nair, Bipin
AU - Prasad, T. S. Keshava
AU - Chauhan, Devendra Singh
AU - Katoch, Kiran
AU - Katoch, Vishwa Mohan
AU - Kumar, Prahlad
AU - Chaerkady, Raghothama
AU - Ramachandran, Srinivasan
AU - Dash, Debasis
AU - Pandey, Akhilesh
PY - 2011/12
Y1 - 2011/12
AB - The genome sequencing of H37Rv strain of Mycobacterium tuberculosis was completed in 1998 followed by the whole genome sequencing of a clinical isolate, CDC1551 in 2002. Since then, the genomic sequences of a number of other strains have become available making it one of the better studied pathogenic bacterial species at the genomic level. However, annotation of its genome remains challenging because of high GC content and dissimilarity to other model prokaryotes. To this end, we carried out an in-depth proteogenomic analysis of the M. tuberculosis H37Rv strain using Fourier transform mass spectrometry with high resolution at both MS and tandem MS levels. In all, we identified 3176 proteins from Mycobacterium tuberculosis representing ∼80% of its total predicted gene count. In addition to protein database search, we carried out a genome database search, which led to identification of ∼250 novel peptides. Based on these novel genome search-specific peptides, we discovered 41 novel protein coding genes in the H37Rv genome. Using peptide evidence and alternative gene prediction tools, we also corrected 79 gene models. Finally, mass spectrometric data from N terminus-derived peptides confirmed 727 existing annotations for translational start sites while correcting those for 33 proteins. We report creation of a high confidence set of protein coding regions in Mycobacterium tuberculosis genome obtained by high resolution tandem mass-spectrometry at both precursor and fragment detection steps for the first time. This proteogenomic approach should be generally applicable to other organisms whose genomes have already been sequenced for obtaining a more accurate catalogue of protein-coding genes.
UR - http://www.scopus.com/inward/record.url?scp=83055173265&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=83055173265&partnerID=8YFLogxK
M3 - Article
C2 - 21969609
AN - SCOPUS:83055173265
SN - 1535-9476
VL - 10
JO - Molecular and Cellular Proteomics
JF - Molecular and Cellular Proteomics
IS - 12
ER -