Complex diseases such as cancer are closely associated with alternative splicing during transcription, so identifying the types, specificity, and expression levels of transcripts is especially important for research into disease mechanisms and for clinical diagnosis. High-throughput RNA-seq technology has provided an unprecedented opportunity to reveal the highly complex structure of a vertebrate's transcriptomic landscape. However, accurately and efficiently assembling the huge number of short RNA-seq reads into a transcriptome with alternative splicing, and estimating the expression abundance of each transcript, remains a highly challenging task. To address this problem, this project will develop new reference-genome-based transcriptome assembly algorithms. Two novel graph models (the pairing graph and the generalized junction graph) will be introduced to describe alternative splicing junctions accurately and to make full use of paired-end and sequencing-depth information, overcoming several deficiencies of current assembly methods. Then, using techniques from graph theory, the assembly problem will be modeled as a combinatorial optimization problem and solved by searching for the optimal path cover of the generalized junction graph with a newly designed graph combing strategy. Finally, the new algorithm will be applied to cancer RNA-seq data to search for specific genes closely related to cancer, with the aim of better revealing the mechanisms by which cancers arise and evolve.
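Complex diseases such as cancer are closely related to alternative splicing during transcription, so knowing the types, specificity, and expression levels of transcription products is of great significance for research into the mechanisms of complex diseases and for clinical diagnosis. High-throughput RNA-seq sequencing technology offers an unprecedented opportunity to reveal and study the complex structure of eukaryotic transcriptomes, but how to reconstruct the full-length transcriptome accurately and efficiently from a massive number of sequencing reads, and how to estimate its expression levels, has become a major challenge. To solve this key computational problem, this project will develop a new reference-genome-based transcriptome assembly algorithm. To overcome several shortcomings of existing assembly algorithms, we introduce two new graph models (the pairing graph and the generalized junction graph) to characterize alternative splicing precisely and to mine and fully exploit paired-end sequencing information and sequencing-depth information. Then, using graph-theoretic techniques, we model the problem as a combinatorial optimization problem and use a newly developed graph combing algorithm to find the optimal path cover of the generalized junction graph, completing the assembly of the whole transcriptome. Finally, the algorithm will be applied to cancer RNA-seq data to screen for specific genes closely related to cancer, in the hope of helping to reveal how cancer arises and progresses and how it evolves.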
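Studies have shown that complex diseases such as cancer are closely related to alternative splicing during transcription, so knowing the types, specificity, and expression levels of transcription products is of great significance for research into the mechanisms of complex diseases and for clinical diagnosis. High-throughput RNA-seq sequencing technology offers an unprecedented opportunity to reveal and study the complex structure of eukaryotic transcriptomes, but how to reconstruct the full-length transcriptome accurately and efficiently from a massive number of sequencing reads, and how to estimate its expression levels, has become a major challenge. In recent years, Nature-family journals have published several papers and software tools on the computational prediction of alternatively spliced transcriptomes from RNA-Seq data, making this one of the most challenging research topics in bioinformatics internationally. To solve this key computational problem, this project developed a new assembly algorithm and introduced two new graph models to characterize alternative splicing precisely and to mine and fully exploit paired-end sequencing information and sequencing-depth information, thereby overcoming several shortcomings of existing assembly algorithms. Then, using graph-theoretic techniques, the problem was modeled as a combinatorial optimization problem, and the newly developed graph combing algorithm was used to find the optimal path cover of the generalized junction graph, completing the assembly of the whole transcriptome.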
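To make the path cover idea concrete, here is a minimal sketch in Python. It is not the project's graph combing algorithm or its generalized junction graph, neither of which is specified in this abstract. It uses the textbook reduction of a minimum vertex-disjoint path cover of a directed acyclic graph to bipartite matching, applied to a hypothetical four-exon splice graph. The node names, the edges, and the function name `min_path_cover` are all assumptions made for illustration. Real isoforms usually share exons, so a practical assembler would need a formulation that allows overlapping paths and weighs read depth and paired-end evidence.

```python
from collections import defaultdict


def min_path_cover(nodes, edges):
    """Return a minimum vertex-disjoint path cover of a DAG.

    Textbook reduction: each matched (u, v) pair in a maximum bipartite
    matching glues u to v on one path, so the number of paths equals
    len(nodes) minus the matching size.
    """
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)

    match_next = {}  # u -> successor chosen for u
    match_prev = {}  # v -> predecessor chosen for v

    def try_augment(u, seen):
        # Kuhn's augmenting-path search starting from left-side node u.
        for v in succ[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in match_prev or try_augment(match_prev[v], seen):
                match_prev[v] = u
                match_next[u] = v
                return True
        return False

    for u in nodes:
        try_augment(u, set())

    # Every node without a chosen predecessor starts one path.
    paths = []
    for start in nodes:
        if start in match_prev:
            continue
        path = [start]
        while path[-1] in match_next:
            path.append(match_next[path[-1]])
        paths.append(path)
    return paths


# Hypothetical splice graph: exon e1 splices to either e2 or e3,
# and both of those splice to e4.
nodes = ["e1", "e2", "e3", "e4"]
edges = [("e1", "e2"), ("e1", "e3"), ("e2", "e4"), ("e3", "e4")]
paths = min_path_cover(nodes, edges)
for p in paths:
    print(" -> ".join(p))
```

On this toy graph the cover needs two paths, because the two alternative middle exons cannot lie on the same path. The disjointness constraint is exactly what makes this formulation too strict for real transcripts, which is one reason assembly methods work with richer graph models.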
Data last updated: 2023-05-31