The SOTA de novo sequencing model.
PepNet is a state-of-the-art deep convolutional neural network for de novo sequencing of tandem mass spectra. It currently works on unmodified HCD spectra of charges 1+ to 4+.
Also visit https://www.predfull.com/ to see our previous project on full-spectrum prediction.
The model is based on the structure of residual convolutional networks. Current precision (bin size): 0.1 Th.
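
To make the bin size concrete, here is a minimal sketch of how a peak list could be turned into a fixed-length vector with 0.1 Th bins. It is an illustration only and is not taken from PepNet's code: the function name `bin_spectrum`, the `MAX_MZ` limit of 2000 and the scaling to a maximum of 1 are all assumptions.

```python
import numpy as np

BIN_SIZE = 0.1    # Th, the precision stated above
MAX_MZ = 2000.0   # assumed upper m/z limit, not taken from PepNet


def bin_spectrum(mzs, intensities, bin_size=BIN_SIZE, max_mz=MAX_MZ):
    """Sum peak intensities into fixed-width m/z bins and scale the maximum to 1."""
    n_bins = int(round(max_mz / bin_size))
    vector = np.zeros(n_bins, dtype=np.float32)
    mzs = np.asarray(mzs, dtype=np.float64)
    intensities = np.asarray(intensities, dtype=np.float32)

    # The small epsilon guards against floating-point rounding at bin edges.
    idx = np.floor(mzs / bin_size + 1e-9).astype(np.int64)

    # Peaks outside [0, max_mz) are dropped.
    keep = (idx >= 0) & (idx < n_bins)

    # np.add.at accumulates correctly when several peaks share one bin.
    np.add.at(vector, idx[keep], intensities[keep])

    peak = vector.max()
    return vector / peak if peak > 0 else vector


# Two peaks fall into the same bin (index 1751); the last peak is out of range.
vec = bin_spectrum([175.119, 175.15, 300.26, 2500.0], [10.0, 30.0, 20.0, 5.0])
```

With a 0.1 Th bin size, peaks closer together than the bin width can end up in the same bin, which is why the sketch sums intensities instead of overwriting them.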
After cloning this project, download the pre-trained model (`model.h5`) from zenodo.org and place it in PepNet's folder.
We recommend installing the dependencies via Anaconda.
Sample output looks like:
TITLE | DENOVO | Score | PPM Difference | Positional Score |
---|---|---|---|---|
spectra 1 | LALYCHQLNLCSK | 1.0000 | -3.8919184 | [1.0, 0.9999956, 1.0, 1.0, 1.0, 1.0, 0.99999976, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0] |
spectra 2 | HEELMLGDPCLK | 1.0000 | 4.207922 | [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.99999976, 1.0] |
spectra 3 | AGLVGPEFHEK | 1.0000 | 0.54602236 | [1.0, 1.0, 1.0, 1.0, 1.0, 0.99999917, 1.0, 1.0, 1.0, 1.0, 1.0] |
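
The positional scores in the sample above appear to hold one value per residue, so they can be used to flag uncertain positions in a predicted peptide. The sketch below is a hedged example using only the Python standard library. It assumes the prediction file is tab-separated with a header row, and that the positional score is stored as a bracketed list as shown in the sample. The helper names `read_predictions` and `low_confidence_positions` are hypothetical and not part of PepNet.

```python
import ast
import csv


def read_predictions(path):
    """Read a prediction file into a list of dicts.

    Assumption: the file is tab-separated and starts with a header row.
    """
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def low_confidence_positions(peptide, positional_score, threshold=0.9):
    """Return (1-based position, residue, score) for residues scoring below threshold."""
    if isinstance(positional_score, str):
        # The score column is text such as "[1.0, 0.9999956, ...]".
        scores = ast.literal_eval(positional_score)
    else:
        scores = list(positional_score)
    if len(scores) != len(peptide):
        raise ValueError("expected one score per residue")
    return [
        (i + 1, residue, score)
        for i, (residue, score) in enumerate(zip(peptide, scores))
        if score < threshold
    ]


# First row of the sample output, with a deliberately strict threshold.
flagged = low_confidence_positions(
    "LALYCHQLNLCSK",
    "[1.0, 0.9999956, 1.0, 1.0, 1.0, 1.0, 0.99999976, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]",
    threshold=0.999999,
)
```

The default threshold of 0.9 is arbitrary; choose a cut-off that suits your data.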
Simply run:
python denovo.py --input example.mgf --model model.h5 --output example_prediction.tsv
The input file must be in MGF format; the predictions are written to the file given by `--output` (here a `.tsv` file).
We provide sample data on Zenodo so that you can evaluate the sequencing performance. The `example.mgf` file on Google Drive contains ground truth spectra (randomly sampled from the NIST Human Synthetic Peptide Spectral Library), while the `example.tsv` file contains pre-run predictions.
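
One simple way to evaluate the predictions is the fraction of spectra whose predicted peptide matches the ground truth exactly. The sketch below is an assumption-laden illustration, not PepNet's own evaluation script. It expects two dictionaries that map spectrum titles to sequences, and how you extract the true sequences from `example.mgf` is left open. It treats I and L as identical because the two residues have the same mass. The function names are hypothetical.

```python
def normalize(peptide):
    """Treat isoleucine and leucine as identical, since they have the same mass."""
    return peptide.upper().replace("I", "L")


def peptide_accuracy(truth, predicted):
    """Fraction of ground-truth spectra whose predicted peptide matches exactly.

    truth, predicted: dicts mapping spectrum title -> peptide sequence.
    Spectra without a prediction count as wrong.
    """
    if not truth:
        raise ValueError("no ground-truth peptides given")
    correct = sum(
        1
        for title, sequence in truth.items()
        if title in predicted and normalize(predicted[title]) == normalize(sequence)
    )
    return correct / len(truth)


# Toy input: the predictions here are invented to exercise the function.
truth = {
    "spectra 1": "LALYCHQLNLCSK",
    "spectra 2": "HEELMLGDPCLK",
    "spectra 3": "AGLVGPEFHEK",
}
predicted = {
    "spectra 1": "IALYCHQLNLCSK",  # differs only by I/L, counted as correct
    "spectra 2": "HEELMLGDPCKL",   # last two residues swapped, counted as wrong
}
accuracy = peptide_accuracy(truth, predicted)
```

Exact-match accuracy is strict: one wrong residue makes the whole peptide count as incorrect, so a per-residue metric may be a useful complement.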