Peptide sequences derived from MS/MS spectra, whether via database searching, spectral library matching, or de novo sequence analysis, need to be mapped to the reference proteome in order to determine the protein content of the sample being analyzed.
Proteotypic peptides are of special interest both for confidently identifying proteoforms and for generating assays for targeted experiments such as SRM. Naturally occurring variants in protein sequences exacerbate this mapping issue by increasing the likelihood that a given peptide sequence is shared among different protein forms.
While the nascent PEFF format allows for the representation of such variants, software is needed to efficiently map observed sequences to all possible variants.
ProteoMapper is a set of software tools to perform this mapping.
Technical Note in the Journal of Proteome Research: Mendoza L. et al., J. Proteome Res. DOI: 10.1021/acs.jproteome.8b00544
There are two components to ProteoMapper: an indexer and a mapper. A protein sequence database in either FASTA or PEFF format must first be indexed by the indexer. Once the index is built, the mapper can quickly and efficiently find all locations of the input peptide sequence(s) in the proteome. Multiple parallel indices are supported, and input can be in the form of a pepXML file, a simple text file with peptide sequences, or a single sequence given on the command line. There are also options to map using wildcards as well as fuzzy mapping (where one or more amino acids and their positions within the peptide sequence are unknown).
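The index-then-map idea can be illustrated with a small Python sketch. This is not ProteoMapper's actual code or index format: the function names build_index and map_peptide, the segment length of 4, and the toy protein sequences are all assumptions made for illustration. The sketch records where every fixed-length segment occurs in each protein, then uses the first segment of a peptide as a seed and verifies the full match at each candidate position.

```python
from collections import defaultdict


def build_index(proteins, k=4):
    """Map every length-k segment to the (protein_id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for pid, seq in proteins.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((pid, i))
    return index


def map_peptide(peptide, proteins, index, k=4):
    """Return sorted (protein_id, offset) pairs for exact occurrences of peptide."""
    if len(peptide) < k:
        return []  # peptides shorter than one segment are not handled in this sketch
    hits = []
    # Look up the first segment as a seed, then confirm the rest of the peptide.
    for pid, i in index.get(peptide[:k], []):
        if proteins[pid].startswith(peptide, i):
            hits.append((pid, i))
    return sorted(hits)


# Hypothetical toy proteome, invented for this example.
proteins = {
    "protA": "MKTPEPTIDERAAGK",
    "protB": "MSSPEPTIDERLLK",
    "protC": "MKTAYIAKQR",
}
index = build_index(proteins)
hits = map_peptide("PEPTIDER", proteins, index)
print(hits)  # [('protA', 3), ('protB', 3)]
```

Because PEPTIDER occurs in two of the toy proteins, it would not be proteotypic for either of them. Building the index costs one pass over the database, after which each lookup only touches the positions that share the seed segment.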
More information can be found in our published technical note (see above), in the poster presented at the 66th ASMS Conference in San Diego: Fast and Efficient Mapping of Peptide Sequences and their Variants to Proteome Databases Using Full Inverted Indices, as well as in this early presentation (pptx).
ProteoMapper can be downloaded and run locally (see below).
It will also become a standard component of the Trans-Proteomic Pipeline (TPP) in its next release, version 5.2.0.
An online version of this tool performs a mapping of an input peptide sequence or list of sequences to databases of various species used by PeptideAtlas.
A web services API is also available at PeptideAtlas.
The software is open-source and freely available. It is written in Perl and has no other requirements.
The following .zip file contains the indexer, clips (Create Lookup Index of Protein Segments), and the mapper, promast (PROtein MApping and Search Tool).
You can get a full usage statement and list of options by typing the name of the command on the command line and pressing [ return ].
Note: depending on your local system set-up, you may need to adjust the first line of each of these programs to point to the correct path to perl.
clips myproteinfile.fasta
To index using a segment size of 6, and excluding PEFF variants:
clips -s 6 -V anotherfile.peff
promast myproteinfile.fasta PEPTIDER
To get original sequence context in output:
promast -c myproteinfile.fasta PEPTIDER
Read input from a text file (one peptide per line):
promast myproteinfile.fasta mypeptidelist.txt
Fuzzy mapping with 3 unknown amino acids, leaving out unmappable sequences, and restricting results to those that have a mass +/- 0.1 Daltons from the input sequence:
promast -f 3 -U -m 0.1 myproteinfile.fasta PEPTIDER
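To make the combination of fuzzy mapping and a mass restriction more concrete, here is a rough Python sketch. It is only one plausible reading of the behavior described above, not promast's implementation: it assumes "unknown amino acids" means up to a given number of residues may differ at any position, it scans proteins directly instead of using an index, and the names fuzzy_map and peptide_mass and the toy sequences are hypothetical. Masses are standard monoisotopic residue masses plus one water.

```python
# Monoisotopic residue masses in Daltons (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_mass(seq):
    """Neutral monoisotopic mass; raises KeyError on non-standard residues."""
    return sum(MONO[a] for a in seq) + WATER


def fuzzy_map(peptide, proteins, max_unknown=1, mass_tol=None):
    """Find same-length windows differing from peptide in at most max_unknown residues.

    If mass_tol is given, keep only windows whose mass is within mass_tol Daltons
    of the input peptide's mass.
    """
    n = len(peptide)
    target = peptide_mass(peptide)
    hits = []
    for pid, seq in sorted(proteins.items()):
        for i in range(len(seq) - n + 1):
            window = seq[i:i + n]
            mismatches = sum(a != b for a, b in zip(peptide, window))
            if mismatches > max_unknown:
                continue
            if mass_tol is not None and abs(peptide_mass(window) - target) > mass_tol:
                continue
            hits.append((pid, i, window))
    return hits


# Hypothetical toy proteome, invented for this example.
proteins = {
    "protA": "MKTPEPTIDERAAGK",
    "protB": "MSSPEPTLDERLLK",
    "protC": "MKTPEPTADERGG",
}
loose = fuzzy_map("PEPTIDER", proteins, max_unknown=1)
strict = fuzzy_map("PEPTIDER", proteins, max_unknown=1, mass_tol=0.1)
```

In this toy case the loose search returns a window from all three proteins, while the mass-restricted search drops PEPTADER from protC, since replacing I with A shifts the mass by about 42 Daltons. PEPTLDER from protB is kept because I and L have the same mass.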