Tandem mass spectrometry has emerged as one of the most effective techniques for protein sequence identification. Both database searching and de novo sequencing are limited by the low accuracy of theoretical spectrum prediction. In principle, spectral library searching avoids this difficulty; however, because a spectral library contains only a limited number of known spectra, searches for a query spectrum often fail. This study aims to circumvent this limitation with a "spectra dictionary" technique. Specifically, we first investigate the conservation of fragmentation patterns for peptide segments, and gather the segments with conserved fragmentation patterns into a spectra dictionary. Some segments may still be absent from the dictionary; for these, we propose a statistical model, based on the "mobile proton" hypothesis, that predicts their fragmentation patterns. Finally, the query spectrum is annotated with candidate peptide segments by searching the dictionary, and the full-length peptide sequence is assembled from these segment candidates. Preliminary experimental results suggest that: 1) peptide segments usually exhibit conserved fragmentation patterns; 2) nearly all spectra can be explained even with a small spectra dictionary; 3) the statistical model predicts theoretical spectra more accurately for peptide segments than for full-length peptides. This study helps improve the accuracy of spectral library methods and further extends their applications.
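The dictionary lookup described above can be illustrated with a small sketch. It assumes, hypothetically, that each dictionary entry stores the relative intensities of a segment's fragment ladder (one value per ladder position), that a segment is placed by anchoring its ladder at an observed peak, and that each placement is scored by cosine similarity between stored and observed intensities. The names annotate, ladder and DICTIONARY, the 0.02 m/z tolerance, the 0.9 score cutoff and the toy peaks are all illustrative assumptions, not the project's implementation.

```python
import math
from typing import Dict, List, Tuple

# Monoisotopic residue masses in Da; unmodified residues are an assumption
# (e.g. no carbamidomethyl cysteine).
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

Peak = Tuple[float, float]  # (m/z, intensity)


def ladder(start_mz: float, segment: str) -> List[float]:
    """Expected m/z values of a singly charged fragment ladder that starts at
    start_mz and grows by one residue of the segment at each step."""
    mzs = [start_mz]
    for aa in segment:
        mzs.append(mzs[-1] + RESIDUE_MASS[aa])
    return mzs


def observed_intensity(peaks: List[Peak], mz: float, tol: float) -> float:
    """Intensity of the strongest peak within +/- tol of mz, or 0.0 if none."""
    return max((inten for pmz, inten in peaks if abs(pmz - mz) <= tol), default=0.0)


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length intensity vectors."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def annotate(peaks: List[Peak], dictionary: Dict[str, List[float]],
             tol: float = 0.02, min_score: float = 0.9) -> List[Tuple[float, str, float]]:
    """Anchor every dictionary segment at every peak and keep placements whose
    observed ladder intensities match the stored pattern (len(segment) + 1 values)."""
    hits = []
    for start_mz, _ in peaks:
        for segment, pattern in dictionary.items():
            observed = [observed_intensity(peaks, mz, tol) for mz in ladder(start_mz, segment)]
            score = cosine(pattern, observed)
            if score >= min_score:
                hits.append((start_mz, segment, round(score, 3)))
    return hits


# Toy inputs (hypothetical): a two-entry dictionary and a four-peak spectrum.
DICTIONARY = {"GA": [0.5, 1.0, 0.4], "AG": [1.0, 1.0, 1.0]}
PEAKS = [(100.0, 50.0), (157.0215, 100.0), (228.0586, 40.0), (300.0, 10.0)]

if __name__ == "__main__":
    print(annotate(PEAKS, DICTIONARY))
```

With these toy inputs only the segment GA anchored at m/z 100.0 passes the cutoff; AG at the same anchor misses its middle ladder peak, so its score drops to about 0.81.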
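Sequence identification based on mass spectrometry is an important tool in proteomics. Existing sequence database searching and de novo techniques are limited by the accuracy of theoretical spectrum prediction; spectral library techniques can in principle avoid the difficulty of theoretical spectrum prediction, but the limited number of spectra a library contains often causes queries to fail. This project adopts a "spectra dictionary" strategy to overcome these difficulties. First, we study the conservation of fragmentation patterns of peptide segments and collect segments with conserved fragmentation patterns into a spectra dictionary. Second, for segments not in the dictionary, we build a statistical model based on the "mobile proton" hypothesis to predict their local theoretical spectra. Finally, a query spectrum is first annotated with its possible peptide segments according to the fragmentation patterns in the dictionary, and the annotations are then combined into a complete peptide. The advantage of this strategy is that even when the query spectrum as a whole is not in the spectral library, its local spectra may still be present in the peptide segment dictionary. Preliminary results show that short peptide segments have strongly conserved fragmentation patterns, that a small peptide segment dictionary can annotate the vast majority of spectra, and that the statistical model can predict theoretical spectra of peptide segments with high accuracy. This research helps improve the accuracy of spectral library methods and extend their range of application.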
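The statistical model is described only as following the mobile proton hypothesis. One common way to turn that hypothesis into a model input, shown here as an assumption, is a three-class proton mobility label that compares the precursor charge with the number of Arg residues and with the total number of Arg, Lys and His residues. The cleavage weighting that follows is a hypothetical toy: it boosts cleavage C-terminal to Asp more strongly as protons become less mobile and boosts cleavage N-terminal to Pro, using made-up boost values (ASP_BOOST, PRO_BOOST) rather than fitted parameters.

```python
from typing import Dict, List


def proton_mobility(peptide: str, charge: int) -> str:
    """Three-class proton mobility: compare the precursor charge with the
    number of basic residues (Arg alone, then Arg + Lys + His)."""
    n_arg = peptide.count("R")
    n_basic = n_arg + peptide.count("K") + peptide.count("H")
    if charge > n_basic:
        return "mobile"
    if charge > n_arg:
        return "partially mobile"
    return "non-mobile"


# Hypothetical multiplicative boosts; illustrative values, not fitted parameters.
ASP_BOOST: Dict[str, float] = {"mobile": 1.0, "partially mobile": 2.0, "non-mobile": 4.0}
PRO_BOOST = 3.0


def cleavage_weights(peptide: str, charge: int) -> List[float]:
    """Relative probability of backbone cleavage at each of the len(peptide) - 1
    amide bonds, normalised to sum to 1."""
    if len(peptide) < 2:
        return []
    mobility = proton_mobility(peptide, charge)
    weights = []
    for i in range(len(peptide) - 1):
        w = 1.0
        if peptide[i] == "D":  # C-terminal to Asp; stronger when protons are sequestered
            w *= ASP_BOOST[mobility]
        if peptide[i + 1] == "P":  # N-terminal to Pro
            w *= PRO_BOOST
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    print(proton_mobility("PEPTIDER", 1), proton_mobility("PEPTIDER", 2))
    print(cleavage_weights("ADPK", 1))
```

For ADPK at charge 1 (partially mobile, one Lys and no Arg), the D|P bond receives both boosts, so cleavage_weights returns [0.125, 0.75, 0.125].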
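Sequence identification based on mass spectrometry is an important tool in proteomics. Existing sequence database searching and de novo techniques are limited by the accuracy of theoretical spectrum prediction; spectral library techniques can in principle avoid this difficulty, but the limited number of spectra in a library often causes queries to fail. This project adopts a "spectra dictionary" strategy to overcome these difficulties. By analyzing large volumes of existing experimental mass spectra, we established that the fragmentation patterns of peptide segments are conserved, and collected segments with conserved fragmentation patterns into a spectra dictionary that now holds 5.63 million distinct peptide segments. Because of their chemical properties, some segments do not appear in the existing spectral data; for these segments, we construct theoretical spectra with our theoretical spectrum prediction software msSimulator. Finally, a query spectrum is first annotated with its possible peptide segments according to the fragmentation patterns in the dictionary, and the annotations are then combined into a complete peptide. The advantage of this strategy is that even when the query spectrum as a whole is not in the spectral library, its local spectra may still be present in the peptide segment dictionary. Results: short peptide segments have strongly conserved fragmentation patterns; a small peptide segment dictionary can annotate the vast majority of spectra; and the statistical model can predict theoretical spectra of peptide segments with high accuracy. The project produced 10 publications, 7 of them indexed in SCI, and released the software OpenMS-Simulator. This research improved the accuracy of protein identification and advanced the development of proteomics.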
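The final step, combining segment annotations into a complete peptide, can be pictured as chaining candidates end to end by mass. The sketch below is hypothetical: it assumes each annotation carries the summed residue mass preceding the segment plus a match score, requires the chosen candidates to tile the peptide without gaps or overlaps, and keeps the chain with the highest summed score. The function names, the tolerance and the toy candidates are illustrative; the project's actual combination procedure may differ.

```python
from typing import Dict, List, Optional, Tuple

# Monoisotopic residue masses in Da (unmodified residues; an assumption).
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

Candidate = Tuple[float, str, float]  # (prefix residue mass, segment, score)


def segment_mass(segment: str) -> float:
    """Summed monoisotopic residue mass of a segment."""
    return sum(RESIDUE_MASS[aa] for aa in segment)


def assemble(candidates: List[Candidate], total_mass: float,
             tol: float = 0.02) -> Optional[Tuple[str, float]]:
    """Chain candidates from prefix mass 0 to total_mass (the peptide's summed
    residue mass) and return (sequence, summed score) of the best chain."""
    # Each state is (end mass, sequence so far, summed score); start empty.
    states: List[Tuple[float, str, float]] = [(0.0, "", 0.0)]
    # Every segment adds a positive mass, so sorting by prefix mass ensures
    # all possible predecessors are processed before a candidate.
    for prefix, segment, score in sorted(candidates):
        matches = [s for s in states if abs(s[0] - prefix) <= tol]
        if not matches:
            continue  # nothing ends where this candidate starts
        end, seq, acc = max(matches, key=lambda s: s[2])
        states.append((end + segment_mass(segment), seq + segment, acc + score))
    complete = [(seq, acc) for end, seq, acc in states
                if seq and abs(end - total_mass) <= tol]
    return max(complete, key=lambda c: c[1], default=None)


# Toy candidates (hypothetical).
CANDIDATES: List[Candidate] = [
    (0.0, "GA", 0.9),
    (0.0, "AG", 0.4),       # same mass as GA, lower score
    (128.0586, "SPK", 0.8),
    (300.0, "W", 0.9),      # cannot be chained
]

if __name__ == "__main__":
    print(assemble(CANDIDATES, segment_mass("GASPK")))
```

Summing scores favors chains of many short segments, so a practical scorer would probably normalize by segment count or length; the sum is kept here only for brevity.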
Data last updated: 2023-05-31