Aberrant DNA methylation plays an important role in cancer development and progression, and is a promising class of cancer biomarkers. The large body of existing DNA methylation data from the Infinium HumanMethylation450 BeadChip (hereinafter the 450K array) provides a valuable resource for biomarker identification. However, the 450K array covers fewer than 2% of all CpG sites in the human genome and very few miRNA and lncRNA genes. Previously, we and others developed models to expand these data, but those models focus on overall expansion accuracy and therefore cannot evaluate the prediction accuracy at a single CpG locus. In addition, studies of differential methylation in cancer have focused only on the methylation patterns of coding genes. Based on the 450K array data of 13 cancers from TCGA, we will develop precision expansion models aimed at specific CpG loci, and investigate the methylation patterns of not only coding genes but also miRNA and lncRNA genes in both cancer and normal samples. Furthermore, by combining the confidence level of the expanded data with the context of each CpG locus, we will develop an algorithm to identify differentially methylated genes in cancer samples. Finally, potential cancer biomarkers will be identified through integrative analysis of multi-omics data, robust gene selection, and experimental validation. Our work would offer a paradigm for developing precision expansion methods based on lower-coverage omics data, and provide guidance for early cancer diagnosis.
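The distinction drawn above between overall expansion accuracy and accuracy at a single CpG locus can be illustrated with a minimal sketch. It is not the project's model or evaluation protocol: the function names, the choice of mean absolute error as the metric, and the toy beta values are all assumptions made for illustration.

```python
from statistics import mean

def overall_mae(truth, pred):
    """Mean absolute error pooled over all samples and all CpG loci."""
    return mean(abs(t - p)
                for row_t, row_p in zip(truth, pred)
                for t, p in zip(row_t, row_p))

def per_locus_mae(truth, pred):
    """Mean absolute error computed separately for each CpG locus (column)."""
    n_loci = len(truth[0])
    return [mean(abs(row_t[j] - row_p[j]) for row_t, row_p in zip(truth, pred))
            for j in range(n_loci)]

# Toy methylation beta values (rows = samples, columns = CpG loci).
# These numbers are invented for illustration, not taken from TCGA.
truth = [[0.10, 0.80, 0.50],
         [0.20, 0.90, 0.10],
         [0.15, 0.85, 0.90]]
pred = [[0.10, 0.80, 0.10],
        [0.20, 0.90, 0.50],
        [0.15, 0.85, 0.50]]

pooled = overall_mae(truth, pred)      # one number for the whole matrix
by_locus = per_locus_mae(truth, pred)  # one number per CpG locus

print("pooled MAE:", round(pooled, 3))
print("per-locus MAE:", [round(e, 3) for e in by_locus])
```

In this toy case the first two loci are predicted exactly while the third is off by 0.4 in every sample, so the pooled error looks moderate even though one locus is unreliable. A per-locus measure of this kind is what would be needed to attach a confidence level to each expanded CpG site.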
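Aberrant DNA methylation plays an important role in the onset and progression of cancer and is a highly promising cancer biomarker. The large volume of existing 450K methylation array data provides an important basis for mining cancer markers, but the 450K array covers fewer than 2% of the CpG sites in the human genome and only a very small number of miRNA and lncRNA genes. In earlier work, the array expansion models proposed by the applicant and other researchers emphasized overall model accuracy and could not assess the prediction accuracy of individual CpG sites, and earlier studies concentrated on the aberrant methylation patterns of coding genes in cancer. Based on 450K array data for 13 cancers from TCGA, this project will study precision methylation expansion models aimed at specific CpG sites; explore the methylation patterns of coding genes and of miRNA and lncRNA non-coding genes in cancer and control samples; build an algorithm for identifying aberrantly methylated genes in cancer that integrates multiple factors such as data confidence and local CpG site information; and mine cancer markers by combining multi-omics data integration, robust feature selection, and biological experimental validation. The project provides a paradigm for research on precision expansion methods for other low-coverage omics data, and a basis for early cancer diagnosis.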
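Aberrant DNA methylation plays an important role in the onset and progression of cancer and is a highly promising cancer biomarker. The large volume of existing 450K methylation array data provides an important basis for mining cancer markers, but the 450K array covers fewer than 2% of the CpG sites in the human genome and only a very small number of miRNA and lncRNA genes. Based on 450K array data for 13 cancers from TCGA, this project studied precision methylation expansion models aimed at specific CpG sites; explored the methylation patterns of coding and non-coding genes in cancer and control samples; built an algorithm for identifying aberrantly methylated genes in cancer that integrates multiple factors such as data confidence and local CpG site information; and mined potential cancer markers by combining multi-omics data integration with robust feature selection. In addition, for single-cell DNA methylation data, the project developed a prediction model that combines within-cell and between-cell associations, as well as a more robust clustering algorithm. The project provides a paradigm for research on precision expansion methods for other low-coverage omics data, and a basis for early cancer diagnosis and precision medicine.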
Data last updated: 2023-05-31
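Research on network-model-based algorithms for mining aberrant DNA methylation modules in cancer
Mining cancer hallmarks based on DNA methylation interaction networks and its application to screening biomarkers of cancer metastasis
Identification of pan-cancer aberrant DNA methylation markers and study of their regulatory mechanisms
Research on bioinformatics data mining based on extended fuzzy integrals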