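Viral Strain Sequence Reconstruction and Identification Methods Based on Metagenomic Sequencing

Basic information

Grant number: 61503314

Project category: Young Scientists Fund

Funding amount: 22.00

Principal investigator: 曾丰

Discipline classification:

Host institution: Xiamen University

Approval year: 2015

Completion year: 2018

Project period: 2016-01-01 - 2018-12-31

Project status: Completed

Project participants: 王颖, 吴小惠, 李磊, 袁明顺, 关锦婷, 王云田, 龙之瀛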

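Keywords: assembly; sequencing errors; sequence alignment; modeling; next-generation sequencing data processing platform construction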

Final report abstract

Viruses are important participants in physiology and disease, and key elements of biogeochemical cycles. Metagenomic sequencing enables large-scale profiling of the virus-like particles that inhabit the human body and the environment, and thereby advances our knowledge of the role of viruses in human health and ecology. Strain-level inspection of the viruses in a community contributes substantially to understanding viral selection, adaptation and fitness in response to external stimuli such as immunity. However, a lack of bioinformatics tools hinders strain-level analysis. Viral metagenomic sequencing data mix fragments of highly similar strains and are inevitably contaminated by sequencing errors, so recovering strain sequences from this error-prone, fragmented mixture is a computational challenge. To address this issue, we propose a computational method that combines hierarchical assembly and spectral clustering to reconstruct strain sequences following a local-then-global paradigm. We plan to implement the method and offer the software freely to the academic community. Alongside the strain reconstruction method, we plan to develop a strain-based variant calling method for viral metagenomic sequencing data, which first reconstructs the strain sequences and then compares the strains to detect single nucleotide polymorphisms (SNPs), insertions and deletions. As a counterpart of haplotype-based variant calling in human genome sequencing, the strain-based method could use an error model and re-alignment to suppress the interference of sequencing errors and incorrect alignments with the detection of genetic variants in the virome, and thus improve the detection of rare variants. We also plan to implement this method and offer the software freely to the academic community. Finally, we will apply the strain reconstruction and strain-based variant calling software to marine viral metagenomic sequencing data to facilitate exploration of the virome-environment connection.

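Using metagenomic sequencing to sequence viruses in the human body and the environment at large scale, and to resolve the genome sequences of the viral strains in a population together with the community structure, is of great significance and practical value for research on viral evolutionary dynamics, pathogen detection, the pathology of complex diseases, and environmental ecology. Viral metagenomic sequencing data mix the genome sequences of different viral strains. These sequences are highly similar, are sheared into fragments by sonication, and are contaminated by sequencing errors, so reconstructing viral strain genome sequences from viral metagenomic data is a highly challenging computational task. In this study we will therefore investigate techniques such as cascaded assembly and spectral clustering for reconstructing viral strain sequences and community structure from data that are fragmented, highly mixed and have a low signal-to-noise ratio. On this basis, we will further study strain-sequence-based methods for detecting genetic variants, using techniques such as error models and sequence re-alignment to reduce the impact of sequencing errors and alignment errors on variant detection in viral genomes. The expected results of this project will be turned into application software and applied to studies of the association between deep-sea metagenomes and ocean change, and will provide a useful reference and technical support for strain-level metagenomic research.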
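The "reconstruct the strains first, then compare them" idea behind strain-based variant calling can be illustrated with a small sketch. This is not the project's software: it assumes two strain sequences have already been reconstructed, aligns them with a plain Needleman-Wunsch global alignment, and reports per-base SNPs, insertions and deletions. The error model and re-alignment mentioned in the abstract are not modeled here. The function names `align` and `call_variants` and the scoring values are hypothetical choices made for this example.

```python
def align(ref: str, alt: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global (Needleman-Wunsch) alignment of two sequences.

    Returns the two sequences padded with '-' so they have equal length.
    Scoring values are illustrative assumptions, not tuned parameters.
    """
    n, m = len(ref), len(alt)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if ref[i - 1] == alt[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    ref_aln, alt_aln = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and ref[i - 1] == alt[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            ref_aln.append(ref[i - 1]); alt_aln.append(alt[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            ref_aln.append(ref[i - 1]); alt_aln.append("-")
            i -= 1
        else:
            ref_aln.append("-"); alt_aln.append(alt[j - 1])
            j -= 1
    return "".join(reversed(ref_aln)), "".join(reversed(alt_aln))


def call_variants(ref: str, alt: str):
    """List (kind, ref_pos, ref_base, alt_base) differences between two strains.

    ref_pos is 0-based in the ungapped reference strain; indels are reported
    one base at a time to keep the sketch short.
    """
    ref_aln, alt_aln = align(ref, alt)
    variants = []
    pos = 0
    for r, a in zip(ref_aln, alt_aln):
        if r == "-":                      # base present only in alt
            variants.append(("INS", pos, "", a))
        elif a == "-":                    # base missing from alt
            variants.append(("DEL", pos, r, ""))
            pos += 1
        else:
            if r != a:
                variants.append(("SNP", pos, r, a))
            pos += 1
    return variants


print(call_variants("ACGTACGTAC", "ACGTTCGTAC"))  # [('SNP', 4, 'A', 'T')]
```

A full dynamic-programming alignment is used here rather than a greedy longest-match diff because greedy matching can misplace differences in repetitive sequence, which is common in viral genomes.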

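Project abstract

The metagenome has been called the human body's "second genome". It is closely linked to diseases such as diabetes and colon cancer, and also affects the efficacy of radiotherapy and immunotherapy in treating cancer. Studying effective methods for metagenomic analysis therefore helps to understand how metagenomic community structure changes, so that the metagenome can play an important role in precision medicine.

Metagenomic sequencing is a high-throughput technology for studying metagenomic community structure. Metagenomic sequencing data typically contain DNA sequences from hundreds to thousands of microbes and viruses. Data analysis faces three main difficulties: first, heterogeneity is high and the number of microbes and viruses mixed in the data is not known in advance; second, the DNA sequences of closely related species are highly similar; third, sequencing errors affect the data. As a result, general clustering methods struggle to estimate accurately the number and relative proportions of the microbes and viruses in a community.

The project team studied methods for reconstructing community structure from metagenomic sequencing data and obtained a number of results. First, the project proposed a computational method for reconstructing metagenomic community structure that identifies species and estimates their abundance by assembling full-length 16S rRNA sequences; its accuracy and specificity were ahead of related work by international peers. The method is innovative in both algorithm design and data structures. (1) It introduces a coarse clustering technique based on a phylogenetic tree: using the tree, data that are evolutionarily close are grouped together and data that are evolutionarily distant are separated, which greatly reduces the heterogeneity of the data. (2) It introduces the sequence alignment graph, a data structure for representing highly similar biological sequences. The backbone of the graph represents the consensus sequence, and the branches represent SNPs and InDels. The sequence alignment graph represents metagenomic data effectively. (3) It introduces a nonparametric Bayesian analysis method for sequence alignment graphs that accurately estimates the number and weights of the components of the mixture model. Second, the project proposed a new OTU estimation method for 16S amplicon sequencing data that uses a sequencing error model to improve the accuracy of OTU estimation and the reproducibility of the results. Third, the project used these methods to analyze the "dark matter" of soil microbes and the association between gut microbes and type 2 diabetes.

The computational methods and data structures proposed in the project will help in analyzing complex heterogeneous data, and the computational tools developed in the project will help in studying how metagenomic community structure is associated with environment, disease and other factors.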
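The sequence alignment graph is described only at a high level: a backbone carrying the consensus sequence, with branches for SNPs and InDels. The following is a minimal sketch of one way such a structure could be laid out, offered as an illustration rather than the project's actual implementation. The names `Branch`, `AlignmentGraph`, `add_branch` and `spell`, and the idea of storing a read-support count per branch, are assumptions introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Branch:
    """A detour from the backbone (hypothetical layout).

    The branch replaces backbone[start:end] with `bases`:
    a SNP replaces one base with one base, a deletion has empty `bases`,
    and an insertion has start == end.
    """
    start: int
    end: int
    bases: str

    @property
    def kind(self) -> str:
        span = self.end - self.start
        if span == 1 and len(self.bases) == 1:
            return "SNP"
        if span != len(self.bases):
            return "InDel"
        return "MNP"  # equal-length multi-base substitution


@dataclass
class AlignmentGraph:
    backbone: str                                  # consensus sequence
    support: dict = field(default_factory=dict)    # Branch -> read count

    def add_branch(self, branch: Branch, count: int = 1) -> None:
        """Record `count` reads supporting a branch."""
        if not 0 <= branch.start <= branch.end <= len(self.backbone):
            raise ValueError("branch lies outside the backbone")
        self.support[branch] = self.support.get(branch, 0) + count

    def spell(self, branches) -> str:
        """Spell the sequence obtained by following the chosen branches."""
        out, pos = [], 0
        for b in sorted(branches, key=lambda b: (b.start, b.end)):
            if b.start < pos:
                raise ValueError("chosen branches overlap")
            out.append(self.backbone[pos:b.start])  # stay on the backbone
            out.append(b.bases)                     # take the detour
            pos = b.end
        out.append(self.backbone[pos:])
        return "".join(out)


graph = AlignmentGraph("ACGTACGT")
snp = Branch(4, 5, "T")
graph.add_branch(snp, count=3)
print(graph.spell([snp]))  # ACGTTCGT
```

In this layout a candidate sequence corresponds to a set of non-overlapping branches, and the per-branch read counts are the kind of input a downstream mixture analysis could use to estimate how many components are present and in what proportions.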

Data last updated: 2023-05-31

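Similar NSFC grants

Reconstructing the classification system of Chinese Sciaridae based on mitochondrial genome sequences and morphology

Grant number: 31372244

Approval year: 2013

Principal investigator: 吴鸿

Discipline classification: C0402

Funding amount: 82.00

Project category: General Program

Methods for detecting genomic structural variation based on corrected third-generation sequencing reads

Grant number: 31701146

Approval year: 2017

Principal investigator: 陈颖

Discipline classification: C0608

Funding amount: 24.00

Project category: Young Scientists Fund

Methods for identifying genomic structural variation in multi-donor plants based on high-throughput sequencing data

Grant number: 61402132

Approval year: 2014

Principal investigator: 王春宇

Discipline classification: F0213

Funding amount: 24.00

Project category: Young Scientists Fund

Methods for microbial genome sequence identification and community comparison based on metagenomic sequencing data

Grant number: 11701546

Approval year: 2017

Principal investigator: 宋凯

Discipline classification: A0604

Funding amount: 24.00

Project category: Young Scientists Fund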
