A metagenomic sample comprises all the genomes of a microbial community in a specific environment (such as the human intestine). Studying such communities is crucial to understanding how environment and genes interact to affect human health. High-throughput sequencing technologies make it possible to study the organisms that occur in these systems and to infer the biochemical and regulatory pathways that are present. However, current approaches to analyzing the massive quantity of available data are limited in their ability to find patterns and relationships among the organisms and pathways. The objective of this research is to develop novel computational and statistical methods, together with software packages, for comparing microbial communities based on short sequence reads. Such sequence-based comparisons provide a powerful tool for understanding highly complex communities. The proposal involves the following studies: (1) to develop the probability distribution theory for the number of occurrences of word patterns in metagenomic communities consisting of a mixture of genomes, (2) to develop novel, efficient statistics for comparing metagenomic communities using word patterns, and (3) to integrate existing approaches for comparing metagenomic communities using both tag sequences and word patterns. In addition, we will develop publicly available software packages for researchers in the field.
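A metagenome comprises the genomes of all species in the microbial community of a specific environment (such as the human gut). In-depth study of the composition, interactions, and dynamics of these communities is important for analyzing how environment-gene interactions affect human health. High-throughput sequencing has made it possible to study the organisms in these communities and to infer their biochemical and regulatory pathways. The goal of this project is to develop statistical and computational methods based on sequence features for comparing microbial communities, and to provide corresponding software packages, giving researchers effective tools for better understanding highly complex microbial communities. The project will address the following key problems in metagenomics: 1) developing the probability distribution theory for the number of occurrences of word patterns in metagenomic communities composed of multiple genomes; 2) proposing new, effective alignment-free statistics based on word patterns for comparing metagenomic communities; 3) combining tag sequences and word patterns into an integrated method for metagenomic comparison. In addition, we will turn the resulting theory into analysis software that researchers can use free of charge.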
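To make the idea of comparing communities by word patterns concrete, the following is a minimal sketch, not the project's own statistics. It assumes each sample is simply a list of read strings, counts overlapping words of length k, and compares two samples with a plain cosine dissimilarity, which is swapped in here purely for illustration. The function names `count_words` and `cosine_dissimilarity` are hypothetical.

```python
from collections import Counter
from math import sqrt


def count_words(reads, k):
    """Count every overlapping length-k word across a collection of reads."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            # Skip words containing ambiguous bases such as N.
            if set(word) <= set("ACGT"):
                counts[word] += 1
    return counts


def cosine_dissimilarity(counts_a, counts_b):
    """Return 1 - cosine similarity between two word-count vectors.

    Illustrative stand-in only; it is not one of the statistics
    the project proposes.
    """
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = sqrt(sum(c * c for c in counts_a.values()))
    norm_b = sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a sample has no countable words")
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy reads, invented for illustration only.
    sample_a = ["ACGTACGT", "TTGACA"]
    sample_b = ["ACGTTGCA", "GGGCCC"]
    k = 3
    d = cosine_dissimilarity(count_words(sample_a, k), count_words(sample_b, k))
    print(f"dissimilarity for k={k}: {d:.3f}")
```

A raw count comparison like this ignores the background word frequencies expected from a mixture of genomes, which is the kind of gap the probability distribution theory in study (1) is meant to address.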
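The research was carried out broadly in line with the project plan, completing the basic research content and objectives set out in the proposal and achieving the expected results. We made valuable corrections to the alignment-free similarity measures in common use, made new progress in DNA sequence visualization, and obtained some results on identifying protein-coding regions in DNA sequences. To date, the group has published 5 papers in SCI-indexed journals, and 3 more have been submitted.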
Data last updated: 2023-05-31
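The DeoR-family transcription factor PsrB regulates prodigiosin synthesis in Serratia marcescens
Regulatory asymmetry, choice of earnings management mode, and the enforcement efficiency of the CSRC?
A study of benefit distribution in the farmer-supermarket direct-purchase model
Spatiotemporal evolution and driving factors of water resource utilization in the Yellow River Basin
Effects of vegetation restoration patterns on major soil enzyme activities, microbial diversity, and soil nutrients in the mountainous area of southern Ningxia
Analysis of the statistical power of sequence pattern recognition
Patterns and statistical regularities of research-interest shifts based on bibliographic data
Statistical modeling and inference based on genome-wide summary association statistics
On statistical inference for ordered data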