Complex diseases such as cancer are closely associated with alternative splicing during transcription, so identifying the types, specificity, and expression levels of transcripts is especially important for research on cancer mechanisms and for clinical diagnosis. High-throughput RNA-seq technology provides an unprecedented opportunity to reveal the complex structure of the eukaryotic transcriptome. However, accurately and efficiently assembling the huge number of short RNA-seq reads into a transcriptome with alternative transcripts remains highly challenging. To address these computational problems, this project will develop a new assembly approach that uses an interval graph model to convert the assembly problem into a combinatorial optimization problem. Focusing on the characteristics of high-throughput sequencing data and the algorithmic obstacles caused by alternative splicing, we will develop a new algorithm that increases accuracy and reduces time and space complexity, thereby overcoming the deficiencies of current assembly algorithms. The new algorithm will be applied to predict transcriptomes from RNA-seq data of specific cancer tissues. Differential analysis of these transcriptomes, combined with traditional microarray studies, will aid the prediction of oncogenes. In addition, existing cancer-related signaling and metabolic pathways will be used to analyze differential expression at the transcript level, with the aim of exploring the underlying mechanisms of cancer development.
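Studies have shown that complex diseases such as cancer are closely related to alternative splicing during transcription, so characterizing the types, specificity, and expression levels of transcripts is of great significance for research on cancer mechanisms and for clinical diagnosis. High-throughput RNA-seq sequencing technology offers an unprecedented opportunity to reveal and study the complex structure of eukaryotic transcriptomes. However, how to accurately and efficiently assemble massive numbers of sequenced fragments into a complete transcriptome has become a major challenge. This project addresses the RNA-seq-based transcriptome assembly problem: it models the problem with graph-theoretic techniques and thereby reduces transcriptome assembly to a classical combinatorial optimization problem. By systematically studying the related theoretical questions, and in view of the characteristics of massive data and the obstacles posed by alternative splicing, it designs efficient and accurate algorithms to resolve the computational bottlenecks of assembly. Building on accurate transcriptome prediction, the algorithms are applied to cancer-related RNA-seq data and, combined with traditional gene expression microarray studies, used to screen for genes closely associated with specific cancers. Signaling pathway and metabolic pathway information is then used for an integrated analysis that seeks the causes of differential expression of disease-causing genes at the transcript level, in order to reveal more deeply the patterns of cancer occurrence and the mechanisms of its evolution.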
The project overcame the computational bottlenecks of several classical transcriptome assembly algorithms, published more than 70 high-quality academic papers, and developed the corresponding algorithm software and web service platform.
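The abstract does not say which combinatorial optimization formulation the project uses. As a purely illustrative sketch of what "reducing assembly to combinatorial optimization" can look like, the Python below builds a toy splice graph (exons as nodes, observed splice junctions as directed edges) and searches for the smallest set of source-to-sink paths that together cover every junction. The graph, the exon names, and the functions `enumerate_paths`, `junctions`, and `smallest_isoform_set` are all hypothetical, and the exhaustive search is only feasible for tiny graphs; it is not the project's algorithm.

```python
from itertools import combinations

# Toy splice graph (an assumption for illustration): E2 is a cassette exon
# that is included in one isoform and skipped in another.
toy_graph = {
    "E1": ["E2", "E3"],
    "E2": ["E3"],
    "E3": ["E4"],
}


def enumerate_paths(graph, source, sink):
    """Return every source-to-sink path of a DAG as a list of node names."""
    if source == sink:
        return [[sink]]
    paths = []
    for nxt in graph.get(source, []):
        for tail in enumerate_paths(graph, nxt, sink):
            paths.append([source] + tail)
    return paths


def junctions(path):
    """Return the set of directed edges (splice junctions) used by a path."""
    return set(zip(path, path[1:]))


def smallest_isoform_set(graph, source, sink):
    """Brute-force the fewest candidate paths whose junctions cover all edges.

    Returns an empty list if some edge lies on no source-to-sink path.
    """
    candidates = enumerate_paths(graph, source, sink)
    required = {(u, v) for u, targets in graph.items() for v in targets}
    # Try subsets in order of increasing size, so the first hit is minimal.
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            covered = set().union(*(junctions(p) for p in subset))
            if required <= covered:
                return list(subset)
    return []


isoforms = smallest_isoform_set(toy_graph, "E1", "E4")
for path in isoforms:
    print("-".join(path))
```

On this toy graph no single path uses all four junctions, so the search returns two paths, one including E2 and one skipping it. Practical assemblers would also need to account for read coverage and expression levels, which this sketch ignores.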
Data last updated: 2023-05-31
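DeoR-family transcription factor PsrB regulates prodigiosin synthesis in Serratia marcescens
Calculation method for the shear capacity of steel plate-concrete composite coupling beams with small span-to-depth ratios
Combined transcriptomic and metabolic analysis of the mechanism of anthocyanin changes in red maple (Acer rubrum) leaves
Health system resilience research: overview and prospects
An efficient parallel radiation diffusion algorithm based on multi-block structured grids for inertial confinement fusion implosion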
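Reference-genome-based transcriptome assembly algorithms and their application to cancer
Computational methods for assembling alternatively spliced transcriptomes from RNA-Seq data alone
Whole-genome assembly algorithms based on next-generation sequencing data
High-throughput omics data integration methods based on biclustering algorithms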