Product(s) used in this publication: Absolutely Quantified Peptides SpikeTides™ TQL
Recent sequencing technologies have highlighted the translation of untranslated regions (UTRs), although it remains unknown whether the translated products persist in the cell. Here we propose a proteogenomic approach for identifying translated UTRs at the proteome level. This has been challenging because the corresponding protein sequences required for peptide-spectrum matching are lacking. We address this challenge by constructing a tUTR (translated UTR) database consisting of all hypothetical sequences that can be translated from UTRs, assuming non-AUG initiation at near-cognate start codons and stop codon readthrough. In an analysis of an MS/MS dataset from the H1299 cell line, the tUTR DB-based proteogenomic approach detected 52 5'-UTR peptides from 45 genes and 9 3'-UTR peptides from 9 genes. The identified UTR peptides were validated by their high spectral similarity to the corresponding synthetic peptides. The 5'-UTR peptides revealed alternative initiation sites at non-AUG start codons, which conformed exactly to the Kozak context of annotated initiation sites. Notably, our approach detects the translated amino acid sequences themselves in addition to evidence of UTR translation, whereas ribosome profiling provides only evidence of translation. For the previously reported stop codon readthrough in the MDH1 gene, we confirmed the amino acid inserted during readthrough.
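The idea behind a tUTR database, enumerating hypothetical products from near-cognate initiation in the 5'-UTR and from stop codon readthrough into the 3'-UTR, can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on stated assumptions: the standard genetic code, near-cognate codons defined as the nine codons differing from AUG at exactly one position, Met incorporated at a near-cognate start, and readthrough modeled by substituting each of the 20 amino acids at the annotated stop codon. The function names (`translate_from`, `five_prime_candidates`, `readthrough_candidates`), the identifier format, and the `min_len` cutoff are hypothetical. Sequences are written in the DNA alphabet (T rather than U).

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumption: near-cognate start codons are those differing from ATG at one position.
NEAR_COGNATE = {c for c in CODON_TABLE if sum(a != b for a, b in zip(c, "ATG")) == 1}

AA20 = "ACDEFGHIKLMNPQRSTVWY"


def translate_from(seq, start, first_aa=None):
    """Translate seq codon by codon from `start` until the first stop codon.

    If first_aa is given, the first residue is replaced by it (assumed Met for
    initiation at a near-cognate codon). Unrecognized codons become 'X'.
    """
    aas = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":
            break
        aas.append(aa)
    if first_aa and aas:
        aas[0] = first_aa
    return "".join(aas)


def five_prime_candidates(utr5, cds, utr3, min_len=6):
    """Hypothetical products initiated at near-cognate codons in the 5'-UTR.

    Each near-cognate codon starting in the 5'-UTR is treated as a start site
    in its own frame, so both in-frame N-terminal extensions and upstream ORFs
    are generated. min_len is an arbitrary illustrative cutoff.
    """
    transcript = (utr5 + cds + utr3).upper()
    candidates = {}
    for i in range(len(utr5)):
        codon = transcript[i:i + 3]
        if codon in NEAR_COGNATE:
            protein = translate_from(transcript, i, first_aa="M")
            if len(protein) >= min_len:
                candidates[f"5UTR_{codon}_{i}"] = protein
    return candidates


def readthrough_candidates(cds, utr3, inserted=AA20):
    """Hypothetical C-terminal extensions from readthrough of the annotated stop.

    The stop codon is replaced by each residue in `inserted`, and translation
    continues in frame into the 3'-UTR until the next stop codon.
    """
    cds, utr3 = cds.upper(), utr3.upper()
    if len(cds) % 3 or CODON_TABLE.get(cds[-3:]) != "*":
        raise ValueError("cds must be a whole number of codons ending in a stop")
    protein = translate_from(cds, 0)
    tail = translate_from(utr3, 0)
    return {f"RT_{aa}": protein + aa + tail for aa in inserted}
```

For example, `five_prime_candidates("GGCTGAAACCC", "ATGAAATTTTAA", "TGGCTGTAGCC")` returns `{'5UTR_CTG_2': 'MKPMKF'}`, an in-frame N-terminal extension starting at a CUG codon, and `readthrough_candidates("ATGAAATTTTAA", "TGGCTGTAGCC", inserted="QW")` returns `{'RT_Q': 'MKFQWL', 'RT_W': 'MKFWWL'}`. Enumerating every possible inserted residue leaves the readthrough site unconstrained, which is one way a database search could indicate which amino acid was incorporated.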