The rapid development of next-generation sequencing (NGS) technologies has significantly accelerated research on tumorigenesis and has produced vast amounts of genomic and transcriptomic data. Pan-cancer analyses based on individual regulators and genes, as well as on miRNA-gene regulatory interactions, have been widely conducted. However, pan-cancer analysis from the perspective of miRNA regulatory modules is still lacking. In this project, we will first use publicly available expression data from the TCGA database to construct the corresponding miRNA regulatory networks, and then systematically develop algorithms to detect pan-cancer miRNA regulatory modules. The detection algorithms follow two strategies: 1) by introducing the concept of uncertain graphs, we will establish a probabilistic model of the networks across multiple cancers, and then propose an edge-based frequent subgraph mining algorithm to detect pan-cancer miRNA regulatory modules from this model; 2) by exploiting the information shared among multiple cancers, we will construct an integrated cancer network with an empirical Bayes model, and then propose a multi-objective genetic algorithm to identify large, densely connected miRNA regulatory modules in it. The detected modules will be validated against a variety of biological data, such as Gene Ontology annotations and clinical data, with the aim of revealing regulatory pathways related to multiple cancers. The completed project will provide new insights into pan-cancer analysis of next-generation sequencing data and new evidence for identifying potential drug targets for cancer treatment.
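The first strategy can be illustrated with a minimal sketch. It assumes that each cancer's network is an uncertain graph stored as a dict mapping an undirected miRNA-gene edge to its existence probability, that edges are independent, and that a module's expected support is the average, over cancers, of the probability that all of its edges exist. The function names, the toy probabilities and the `min_sup`/`max_edges` parameters are hypothetical; this is not the project's actual algorithm.

```python
from itertools import chain
from math import prod


def edge(u, v):
    """Undirected edge key with sorted endpoints, so (u, v) and (v, u) coincide."""
    return (u, v) if u <= v else (v, u)


def expected_support(pattern, networks):
    """Mean over cancers of P(every edge in `pattern` exists), assuming independent edges."""
    return sum(prod(net.get(e, 0.0) for e in pattern) for net in networks) / len(networks)


def mine_frequent_modules(networks, min_sup, max_edges=4):
    """Grow connected edge sets one edge at a time (hypothetical sketch).

    Nodes are uniquely labelled (miRNA/gene names), so a pattern is just an edge
    set and no subgraph-isomorphism test is needed. Expected support is
    anti-monotone: adding an edge multiplies each per-cancer term by a
    probability <= 1, so only frequent patterns are extended, and only with
    edges that are frequent on their own.
    """
    result, level = {}, []
    for e in sorted(set(chain.from_iterable(networks))):  # iterating a dict yields its edges
        s = expected_support((e,), networks)
        if s >= min_sup:
            p = frozenset([e])
            result[p] = s
            level.append(p)
    frequent_edges = [next(iter(p)) for p in level]

    for _ in range(max_edges - 1):
        next_level = []
        for pat in level:
            nodes = {n for e in pat for n in e}
            for e in frequent_edges:
                if e in pat or not nodes.intersection(e):
                    continue  # only grow by an edge that touches the current module
                cand = pat | {e}
                if cand in result:
                    continue  # already reached from another parent pattern
                s = expected_support(cand, networks)
                if s >= min_sup:
                    result[cand] = s
                    next_level.append(cand)
        level = next_level
        if not level:
            break
    return result


# Toy example with three hypothetical cancer networks.
nets = [
    {edge("miR-A", "g1"): 0.9, edge("miR-A", "g2"): 0.8, edge("miR-B", "g3"): 0.3},
    {edge("miR-A", "g1"): 0.8, edge("miR-A", "g2"): 0.7},
    {edge("miR-A", "g1"): 0.7, edge("miR-B", "g3"): 0.9},
]
modules = mine_frequent_modules(nets, min_sup=0.35)
for pattern, support in sorted(modules.items(), key=lambda kv: -kv[1]):
    print(sorted(pattern), round(support, 3))
```

Because miRNA and gene names are unique, patterns reduce to plain edge sets; mining graphs with repeated node labels would need a subgraph-isomorphism check at the point where candidates are compared.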
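The module search in the second strategy can be sketched as a simple elitist Pareto genetic algorithm with two objectives to maximize: module size and edge density. Everything here is an illustrative assumption. The integrated network is taken as given (the empirical Bayes step is not shown), modules are node sets, the operators are uniform crossover and single-node flip mutation, and survival keeps non-dominated modules first. This is not NSGA-II or the project's algorithm, and all names and parameters are hypothetical.

```python
import random
from itertools import combinations


def objectives(module, adj):
    """Return (size, density) for a node set; both objectives are maximized."""
    n = len(module)
    if n < 2:
        return (n, 0.0)
    links = sum(1 for u, v in combinations(sorted(module), 2) if v in adj[u])
    return (n, links / (n * (n - 1) / 2))


def dominates(a, b):
    """a Pareto-dominates b: no worse in every objective and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(pop, adj):
    scored = [(m, objectives(m, adj)) for m in pop]
    return [m for m, f in scored if not any(dominates(g, f) for _, g in scored)]


def mutate(module, nodes, rng):
    """Flip the membership of one random node; never return an empty module."""
    child = set(module) ^ {rng.choice(nodes)}
    return frozenset(child) if child else module


def crossover(a, b, rng):
    """Uniform crossover: keep shared nodes, keep each other node with probability 0.5."""
    child = frozenset(v for v in sorted(a | b) if (v in a and v in b) or rng.random() < 0.5)
    return child or a


def evolve_modules(adj, pop_size=20, generations=60, seed=0):
    rng = random.Random(seed)  # seeded for reproducible runs
    nodes = sorted(adj)
    pop = [frozenset(rng.sample(nodes, rng.randint(2, len(nodes)))) for _ in range(pop_size)]
    for _ in range(generations):
        children = [mutate(crossover(*rng.sample(pop, 2), rng), nodes, rng)
                    for _ in range(pop_size)]
        merged = sorted(set(pop) | set(children), key=sorted)  # deterministic order
        front = pareto_front(merged, adj)
        rest = [m for m in merged if m not in front]
        rng.shuffle(rest)
        pop = (front + rest)[:pop_size]  # elitist: non-dominated modules survive first
    return pareto_front(pop, adj)


# Hypothetical integrated network: a 4-clique {a, b, c, d} with a tail d-e-f.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c", "d"}, "c": {"a", "b", "d"},
       "d": {"a", "b", "c", "e"}, "e": {"d", "f"}, "f": {"e"}}
for m in evolve_modules(adj):
    print(sorted(m), objectives(m, adj))
```

The two objectives pull in opposite directions (the whole graph is the largest module, a clique is the densest), so the result is a set of trade-off modules rather than a single answer.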
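As sequencing data for a wide range of tumor samples have accumulated and matured in the TCGA database, a new research paradigm in cancer bioinformatics, driven by computational modeling, has become widely adopted. Building on the mRNA, miRNA and lncRNA expression profiles provided by TCGA, this project constructed the corresponding biomolecular regulatory networks and proposed several algorithms to mine important regulatory modules in these networks and to identify biomarkers with potential prognostic value. We first obtained expression profiles of ten cancer types from TCGA and compared the strengths and weaknesses of several network construction methods. Next, we designed an information-measure-based feature selection method to extract important biomolecules from the expression data. To mine important regulatory modules, we proposed an identification algorithm based on orthogonal non-negative matrix factorization that adds experimentally validated regulatory interactions to the objective function as prior information. In addition, we proposed a clustering algorithm based on a structured optimal graph, which denoises the raw data with sparse coding and imposes a rank constraint to obtain a well-structured similarity matrix. To identify biomarkers important for disease onset and progression, we proposed a semi-supervised model based on matrix completion and label propagation to predict disease-related miRNAs: matrix completion learns an improved similarity matrix, on which label propagation then makes the predictions. Finally, we proposed an adaptive multi-view multi-label learning algorithm that integrates multi-source data with a graph-regularized multi-label learning model to predict associations, and we validated the potential prognostic value of the identified biomarkers using tumor grade information and survival analysis. The completed project provides new ideas and methods for exploring cancer-related regulatory modules and biomarkers.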
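The label propagation step of the semi-supervised miRNA-disease model can be sketched in a few lines. The sketch assumes the miRNA similarity matrix `W` is already available (the matrix completion step that would learn it is not shown), normalizes it symmetrically as S = D^(-1/2) W D^(-1/2), and iterates f <- alpha * S f + (1 - alpha) * y from known disease labels `y` until the scores stop changing. The function names, `alpha = 0.8` and the toy similarities are hypothetical, not values from the project.

```python
from math import sqrt


def normalize(W):
    """Symmetric normalization S = D^-1/2 W D^-1/2; zero-degree rows/cols stay 0."""
    d = [sum(row) for row in W]
    n = len(W)
    return [[W[i][j] / sqrt(d[i] * d[j]) if d[i] > 0 and d[j] > 0 else 0.0
             for j in range(n)] for i in range(n)]


def label_propagation(W, y, alpha=0.8, max_iter=500, tol=1e-10):
    """Spread known labels y over the similarity graph W; returns one score per node."""
    S = normalize(W)
    n = len(y)
    f = [float(v) for v in y]
    for _ in range(max_iter):
        # Each node mixes its neighbours' scores (weight alpha) with its own label.
        new = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
               for i in range(n)]
        done = max(abs(a - b) for a, b in zip(new, f)) < tol
        f = new
        if done:
            break
    return f


# Hypothetical similarities among four miRNAs; only miRNA 0 is a known disease miRNA.
W = [[0.0, 0.9, 0.1, 0.0],
     [0.9, 0.0, 0.2, 0.0],
     [0.1, 0.2, 0.0, 0.8],
     [0.0, 0.0, 0.8, 0.0]]
scores = label_propagation(W, [1, 0, 0, 0])
print([round(s, 3) for s in scores])  # unlabelled miRNAs ranked by score
```

Since alpha < 1 and the eigenvalues of the normalized matrix lie in [-1, 1], the iteration converges; the unlabelled miRNA most similar to the known one (index 1) receives the highest score among the candidates.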
Data last updated: 2023-05-31
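Genome-wide association analysis of leaf orientation value in maize
On the impact of the big data environment on the development of information science
CFRP strengthening of rib-to-deck fatigue cracking in orthotropic steel bridge decks
Hardware Trojans: research progress and new trends on key issues
Direct brain control of robot direction and speed based on SSVEP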
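Driver pathway discovery and integrative analysis methods based on next-generation tumor sequencing data
Whole-genome assembly algorithms based on next-generation sequencing data
Identification of miRNA polymorphisms affecting drug response in myasthenia gravis based on next-generation sequencing data, and their mechanisms
Prediction and analysis of cis-regulatory motifs based on next-generation sequencing data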