Cancer poses a serious threat to human health. The Pan-Cancer project launched by The Cancer Genome Atlas (TCGA) Research Network has assembled coherent, consistent TCGA data sets across tumor types and across platforms, which provides new opportunities for identifying biomarkers across tumor types. Because TCGA data are complex, traditional data mining methods cannot analyze them effectively or efficiently. In this study, we will analyze in depth the characteristics of heterogeneous pan-cancer data and design differential analysis methods that identify biomarkers by integrating various types of omics data. The main contents are as follows: (1) to account for the hierarchical relationship between node and edge biomarkers, we will develop a multi-task differential analysis model that identifies biomarkers at different levels simultaneously; (2) to account for the heterogeneity of omics data, we will design a multi-view differential analysis model that detects biomarkers from different types of omics data; (3) based on the above two models and group lasso, we will perform differential analysis across tumor types and identify biomarkers that are common to multiple tumor types. Unlike traditional methods that detect biomarkers from a single level, a single view, or a single tumor type, this project will detect biomarkers from multiple levels, multiple types of omics data, and multiple tumor types. This study is important for revealing the pathogenesis of cancer.
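Cancer is a major disease that seriously threatens human life and health. The Pan-Cancer project, initiated by scientists of The Cancer Genome Atlas (TCGA) project, has integrated multiple types of omics data across different cancers, making it possible to systematically identify biomarkers of different cancers. Because pan-cancer omics data are multi-source and heterogeneous, traditional data analysis methods face great challenges. Building on an analysis of the specific characteristics of pan-cancer omics data, this project aims to develop differential analysis methods that fully integrate multiple data types for identifying cancer-related biomarkers. The project will focus on the following: (1) to address the hierarchical associations among biomarkers, a multi-task differential analysis model that identifies biomarkers at multiple levels simultaneously; (2) to address the heterogeneity of omics data, a multi-view differential analysis model that identifies biomarkers from multiple omics views; (3) based on the above two models and group regularization, joint analysis of multiple cancers to identify biomarkers that are shared by and specific to different cancers. Unlike traditional single-level, single-view, single-cancer biomarker detection methods, this project will explore cancer-related biomarkers along three dimensions: multiple levels, multiple types of omics data, and multiple cancer types. This research is of great significance for revealing the pathogenesis of cancer.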
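Using pan-cancer omics data to identify biomarkers of different cancers from the perspective of biomolecular network perturbation helps to systematically understand the pathogenesis of those cancers. To address the multi-source heterogeneity and the small-sample, high-dimensional nature of pan-cancer omics data, this project studied three topics: construction of graphical-model-based multi-task differential analysis models, construction of multi-view differential analysis models based on multi-omics data fusion, and biomarker identification across cancer types. The project produced a series of results, including: (1) a node-based multi-task differential network analysis model that identifies network biomarkers while also uncovering the driver nodes responsible for network differences, thereby finding node biomarkers and edge biomarkers associated with cancer onset and progression; (2) a multi-view differential network analysis model based on heterogeneous data fusion, which integrates multiple types of omics data for joint differential network inference and identifies biomarkers from multiple omics views; (3) for biomarker identification across cancer types, a joint differential network inference model based on graphical models and structured sparsity, which identifies biomarkers that are shared by and specific to different cancers. The project proceeded on schedule. During the project period, team members published 16 papers in major journals and international conferences, including IEEE Transactions on Cybernetics and Bioinformatics; of these, 15 are SCI-indexed, 6 are in Chinese Academy of Sciences (CAS) Zone 1 journals, 1 is EI-indexed, and 12 list the principal investigator as first or corresponding author, which exceeds the expected targets. The results of the project help improve understanding of the pathogenesis of cancer and can serve as a reference for cancer diagnosis and treatment.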
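The idea of combining differential networks with group regularization across cancer types can be illustrated with a minimal sketch. It is not the project's model. It assumes, for simplicity, that the differential network of each cancer type is the difference between tumor and normal sample correlation matrices (the project uses graphical models instead), and it applies a single group-lasso proximal step that shrinks each edge jointly across cancer types. The function names and the parameter `lam` are hypothetical.

```python
import numpy as np


def correlation_difference(tumor, normal):
    """Crude differential network: tumor minus normal sample correlations.

    Both inputs are (samples x genes) arrays. This stands in for the
    graphical-model estimate and is an assumption of this sketch.
    """
    return np.corrcoef(tumor, rowvar=False) - np.corrcoef(normal, rowvar=False)


def group_soft_threshold(deltas, lam):
    """Group-lasso proximal step over the cancer-type axis.

    deltas has shape (K, p, p): one differential network per cancer type.
    Each edge (i, j) forms a group of K values that is scaled by
    max(0, 1 - lam / norm), so an edge is zeroed in all types or in none.
    """
    norms = np.sqrt((deltas ** 2).sum(axis=0))
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return deltas * scale


def surviving_edges(shrunk):
    """Return (i, j) pairs, i < j, whose group survives the shrinkage."""
    mask = np.sqrt((shrunk ** 2).sum(axis=0)) > 0
    rows, cols = np.nonzero(np.triu(mask, k=1))
    return [(int(i), int(j)) for i, j in zip(rows, cols)]


if __name__ == "__main__":
    # Toy input: 2 cancer types, 3 genes, hand-written differential edges.
    deltas = np.zeros((2, 3, 3))
    deltas[:, 0, 1] = deltas[:, 1, 0] = [0.6, 0.8]   # strong in both types
    deltas[:, 1, 2] = deltas[:, 2, 1] = [0.1, 0.1]   # weak in both types
    shrunk = group_soft_threshold(deltas, lam=0.5)
    print(surviving_edges(shrunk))
```

In this toy input the strong edge between genes 0 and 1 is kept and the weak edge between genes 1 and 2 is removed in both cancer types at once, which is the behavior that makes group penalties suited to finding edges common to several tumor types. Separating shared from type-specific edges would need an additional per-type penalty, which this sketch omits.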
Data last updated: 2023-05-31
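Identification of cancer driver mutations based on multi-omics data integration
Research on methods for identifying and mining tumor biomarkers based on multi-dimensional omics data
Identification of aberrant DNA methylation biomarkers in pan-cancer and study of their regulatory mechanisms
Identifying driver long non-coding RNAs in multiple cancers by integrating multi-dimensional omics data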