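Data-Driven Algorithms for Protein Tertiary Structure Prediction

Basic Information

Grant number: 11871290

Project type: General Program

Funding amount: 52.00

Principal investigator: 杨建益

Discipline classification:

Host institution: Nankai University

Year approved: 2018

Year concluded: 2022

Project period: 2019-01-01 - 2022-12-31

Project status: Concluded

Participants: 武琦, 张兆鹏, 董润泽, 潘硕, 孙赛赛, 宿鸿

Keywords: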

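Protein folding; Critical Assessment of protein Structure Prediction (CASP); protein structure prediction; Protein Data Bank; two-dimensional residue contact map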

Final Report Abstract

Protein tertiary structure prediction aims to model a protein’s tertiary structure from its amino acid sequence. The accuracy of existing algorithms is very low for proteins that lack homologous templates (called hard targets). With the continuous development of experimental techniques over the last decade, the numbers of known protein sequences and structures have been increasing rapidly. In this project, we propose to address structure prediction for hard targets by developing data-driven protein tertiary structure prediction algorithms that make full use of the rich existing resources of protein sequence and structure data. The project consists of three parts. (1) Build high-quality sequence profiles from multiple sequence databases, including metagenomes. From the sequence profiles, we will develop an accurate algorithm to predict the two-dimensional (2D) residue contact map using statistical methods and deep learning. (2) Build structure profiles for templates from the rich structural information in the Protein Data Bank. We will develop a new fold recognition algorithm that combines the structure profile, sequence profile, and 2D residue contact map through matrix alignment and dynamic programming. (3) Build accurate tertiary structure models with I-TASSER’s Monte Carlo simulations, guided by the predicted 2D residue contact map and the identified structure templates.
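To make the term "2D residue contact map" concrete, the following minimal sketch derives a binary contact map from known 3D coordinates. It only illustrates what the map represents; it is not the project's prediction algorithm, which infers the map from sequence profiles. The function name `contact_map`, the toy coordinates, and the choice of one representative point per residue are assumptions made for illustration. The 8 Å cutoff follows the convention commonly used in CASP contact evaluation.

```python
import math


def contact_map(coords, cutoff=8.0, min_separation=1):
    """Return an L x L binary matrix for L residues.

    Entry [i][j] is 1 when residues i and j are closer than `cutoff`
    angstroms and at least `min_separation` positions apart in sequence.
    `coords` holds one (x, y, z) point per residue (an assumption here;
    Cb or Ca atoms are typical choices).
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                # Contacts are symmetric, so fill both triangles.
                cmap[i][j] = cmap[j][i] = 1
    return cmap


# Hypothetical toy chain (not a real protein): four points spaced 3.8 A apart.
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
cmap = contact_map(toy)
for row in cmap:
    print(row)
```

In this toy chain only the two end points, 11.4 Å apart, are not in contact. A predicted map of this kind can then serve as a set of distance restraints for fold recognition and structure assembly, as parts (2) and (3) of the abstract describe.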

Project Abstract

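For proteins that lack homologous templates (called hard targets), the accuracy of existing prediction algorithms is very low. This project set out to use the large body of existing protein sequence and structure data to develop data-driven protein tertiary structure prediction algorithms and to improve prediction accuracy for hard targets. Over four years of research, the planned work was completed, important results were obtained, and the objectives stated in the proposal were met. The main results are: (1) Using about 7 billion protein sequences from genome sequencing, we developed MapPred, a residue contact map prediction algorithm based on deep residual networks. (2) Using predicted residue contact maps, we developed CATHER, a protein fold recognition algorithm. (3) Using deep learning and optimization methods, we developed several original protein structure prediction algorithms, including trRosetta, trRosettaX, and trRosettaX-Single. Yang-Server, which integrates the trRosetta family of algorithms, ranked first in the 15th Critical Assessment of protein Structure Prediction (CASP15). The project produced 19 papers in leading journals, including PNAS, Nature Protocols, Advanced Science, Nature Computational Science, and Bioinformatics (8 papers). With the support of this project, the principal investigator, 杨建益, was awarded the National Science Fund for Distinguished Young Scholars, and the related research won the First Prize of the Tianjin Natural Science Award (with 杨建益 as first contributor). Six master's students and four doctoral students have been trained.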

Data last updated: 2023-05-31

Other Grants Held by 杨建益

Grant number: 11501306

Year approved: 2015

Funding amount: 18.00

Project type: Young Scientists Fund

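Similar NSFC Grants

Protein Tertiary Structure Prediction Algorithms Based on Random Graph Models

Grant number: 30800168

Year approved: 2008

Principal investigator: 卜东波

Discipline classification: C0504

Funding amount: 20.00

Project type: Young Scientists Fund

Ab Initio Prediction of Protein Tertiary Structure

Grant number: 30240016

Year approved: 2002

Principal investigator: 王玉宏

Discipline classification: C0505

Funding amount: 20.00

Project type: Special Fund Project

Algorithms for Predicting Inter-Residue Interactions in Proteins and Their Application to Tertiary Structure Prediction

Grant number: 31770775

Year approved: 2017

Principal investigator: 卜东波

Discipline classification: C0504

Funding amount: 60.00

Project type: General Program

Protein Tertiary Structure Prediction Based on the Angular Conformation of Polypeptide Backbone Atoms

Grant number: 10347145

Year approved: 2003

Principal investigator: 刘鑫

Discipline classification: A25

Funding amount: 2.00

Project type: Special Fund Project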

Other Grants Held by 杨建益

Prediction of Protein-Ligand Binding Sites Based on Structure and Sequence Information