Identification and analysis of cis-regulatory motifs is an important problem in computational biology, and it provides a key piece of information for inferring the transcriptional regulatory networks encoded in a cell. Here we propose a comprehensive study on the accurate prediction and systematic analysis of cis-regulatory motifs in the human genome, based on large-scale ChIP-seq data sets in the public domain. First, we will design a novel motif prediction method that operates on the TF binding regions provided by a given ChIP-seq data set, aiming to improve on the state of the art in both accuracy and efficiency. This method integrates hash tables, graph theory, and combinatorial optimization. Specifically, it transforms the challenging problem of identifying motif length into a maximal weighted path problem on a de Bruijn graph model, and it supports co-factor and discriminative motif finding, leading to a new function for predicting co-factor motif modules. Second, the read depth information of binding activity will be used to reduce the bias caused by the random proportion of motif segments in existing motif representation models, with the goal of improving the performance of motif searching and comparison algorithms. Third, we will develop a new framework for transcriptional regulatory network construction and analysis, based on the knowledge and insights gained from the first two studies and on large-scale TF binding data in the public domain. Finally, an integrated software system covering all of the above studies will be developed and deployed on a web server, so that researchers with limited computational background can use it. We believe that the proposed studies will improve the performance of ChIP-seq-based motif finding and strengthen the analysis and application of cis-regulatory motifs. The new insights gained and the new computational technology developed in this project will enable a large community of biology researchers to conduct a broad range of data analysis studies that are currently not feasible.
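To make the de Bruijn graph idea concrete, the following is a minimal Python sketch, not the project's actual algorithm. It counts k-mers with a hash table (`collections.Counter`), treats each k-mer as a weighted edge between its two overlapping (k-1)-mers, and extends the heaviest edge in both directions while the neighbouring edges stay frequent. The greedy extension, the function names `count_kmers` and `greedy_heavy_path`, and the `min_ratio` cutoff are all illustrative assumptions; the maximal weighted path formulation named in the abstract may be solved quite differently.

```python
from collections import Counter

ALPHABET = "ACGT"


def count_kmers(sequences, k):
    """Count every k-mer in the input sequences with a hash table."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def _best_extension(counts, used, overlap, prepend):
    """Return the heaviest unused k-mer (edge) that shares `overlap` with the path."""
    best_kmer, best_weight = None, 0
    for base in ALPHABET:
        kmer = base + overlap if prepend else overlap + base
        weight = counts.get(kmer, 0)
        if kmer not in used and weight > best_weight:
            best_kmer, best_weight = kmer, weight
    return best_kmer, best_weight


def greedy_heavy_path(counts, k, min_ratio=0.5):
    """Greedy heuristic (an assumption, not the proposal's method):
    start from the most frequent k-mer and walk the de Bruijn graph
    right, then left, while edge weights stay above a cutoff."""
    seed, seed_weight = counts.most_common(1)[0]
    threshold = min_ratio * seed_weight
    used = {seed}
    path = seed
    for prepend in (False, True):
        while True:
            overlap = path[:k - 1] if prepend else path[-(k - 1):]
            kmer, weight = _best_extension(counts, used, overlap, prepend)
            if kmer is None or weight < threshold:
                break
            used.add(kmer)
            path = kmer[0] + path if prepend else path + kmer[-1]
    return path


# Toy binding regions that all contain the same 7-bp site with different flanks.
regions = ["AATGACTCAGG", "CCTGACTCATT", "GGTGACTCAAC", "TTTGACTCACG"]
kmer_counts = count_kmers(regions, 4)
candidate = greedy_heavy_path(kmer_counts, 4)
print(candidate, len(candidate))
```

In this toy input the walk stops where the flanking k-mers become rare, so the length of the returned string serves as an estimate of the motif length without fixing it in advance.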
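The prediction and analysis of cis-regulatory motifs is an important problem in computational biology and is key to studying regulatory mechanisms in living organisms. This project uses the transcription factor binding regions provided by ChIP-seq to accurately predict and systematically analyze cis-regulatory motifs in the human genome. To address the computational difficulty posed by the large scale of human ChIP-seq data, the project designs a motif prediction algorithm that combines hash tables, graph theory, and combinatorial optimization, improving both efficiency and accuracy. It also introduces de Bruijn graph techniques to solve the problem of determining motif length, and it integrates important application functions, such as co-factor motif prediction and discriminative motif prediction, into the algorithm. The project uses the coverage depth of short sequencing reads to improve the motif representation model, and thereby to raise the accuracy of motif analysis algorithms such as motif search and comparison. Based on large-scale transcription factor data, and through motif prediction and analysis, the project explores methods for constructing transcriptional regulatory networks and uses graph models to analyze regulatory network modules. The end result is a software system for efficient prediction and systematic analysis of cis-regulatory motifs, offered as an online web service. Completion of this project will substantially improve regulatory motif prediction, enable in-depth analysis of regulatory motifs, and advance research on transcriptional regulatory mechanisms.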
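As an illustration of how read coverage depth could enter a motif representation model, here is a small Python sketch that builds a position frequency matrix in which each binding site is weighted by its read depth instead of counting once. The weighting scheme, the function name `depth_weighted_pfm`, and the `pseudocount` parameter are assumptions made for this example; the abstract does not specify how the project incorporates depth.

```python
ALPHABET = "ACGT"


def depth_weighted_pfm(sites, depths, pseudocount=0.0):
    """Build a position frequency matrix where each aligned site
    contributes its read depth (hypothetical weighting) rather than 1."""
    length = len(sites[0])
    matrix = [{base: pseudocount for base in ALPHABET} for _ in range(length)]
    for site, depth in zip(sites, depths):
        for pos, base in enumerate(site):
            matrix[pos][base] += depth
    # Normalize each column so that its frequencies sum to 1.
    for column in matrix:
        total = sum(column.values())
        for base in column:
            column[base] /= total
    return matrix


# Three aligned sites; the first is supported by far more reads.
sites = ["ACG", "ACG", "TCG"]
depths = [30, 10, 10]
pfm = depth_weighted_pfm(sites, depths)
print(pfm[0])
```

With uniform weights the first column would give A a frequency of 2/3; with the depths above it becomes 0.8, so strongly supported sites dominate the model.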
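The prediction and analysis of cis-regulatory motifs is an important problem in computational biology and is key to studying regulatory mechanisms in living organisms. The project made full use of next-generation sequencing data to study new algorithms for motif prediction in the human genome and new methods for motif analysis and application. By introducing new theory, techniques, and computational models, it addressed computational bottlenecks and existing problems, improved the accuracy and efficiency of motif prediction, and developed the corresponding software with an online web service. On this basis, it integrated large-scale ChIP-seq and expression data for regulatory analysis, which increased the interpretability of these data and advanced research in transcriptional regulation. The project also studied, alongside the human genome, the microbiome, which has complex associations with complex human diseases, covering aspects such as microbial genome structure and transcriptional regulatory units. The main results include a series of motif prediction and transcription unit prediction algorithms such as DESSO, CEMIG, WTSA, and seqATU, together with an lncRNA-gene regulatory network prediction algorithm; a web server for DESSO, the ChIP-seq-based motif prediction algorithm; 8 SCI-indexed papers, including papers in top and mainstream bioinformatics journals such as Nucleic Acids Research (IF 16.971), Briefings in Bioinformatics (IF 11.622), and Bioinformatics (IF 6.937); and 4 graduated doctoral and master's students. Completion of this project will substantially improve regulatory motif prediction, enable in-depth analysis of regulatory motifs, and advance research on transcriptional regulatory mechanisms.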
Data last updated: 2023-05-31
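On the impact of the big data environment on the development of information science
The DeoR-family transcription factor PsrB regulates prodigiosin synthesis in Serratia marcescens
A survey of user alignment techniques across social networks
Calculation method for the shear capacity of steel plate-concrete composite coupling beams with small span-to-depth ratios
Joint transcriptomic and metabolic analysis of the mechanism of anthocyanin variation in red maple (Acer rubrum) leaves
Genome-wide computational prediction of fungal cis-regulatory motifs and modules
Methods for dynamic prediction of tumor purity and ploidy based on next-generation sequencing data
Regulatory motif prediction based on ChIP-seq data and phylogenetic information
Driver pathway discovery and integrative analysis methods based on next-generation tumor sequencing data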