RNA methylation is emerging as a pervasive epigenetic mark that plays a critical role in gene regulation. The recently developed technology of methylated RNA immunoprecipitation sequencing (MeRIP-seq) allows transcriptome-wide profiling of RNA methylation. Mining global mRNA methylation patterns from MeRIP-seq data can help reveal the potential functional roles of mRNA methylation in regulating gene expression, splicing, RNA editing, and RNA stability, and can provide leads for more effective therapeutic intervention in cancer. However, MeRIP-seq is still at an early stage, and many computational issues must be resolved before its power can be fully realized. Computational methods for detecting DNA methylation are not suitable for identifying RNA methylation. This project addresses important computational challenges in MeRIP-seq data analysis. Our goal is to develop, for the first time, graphical models that enable 1) accurate detection of global mRNA methylation and 2) accurate identification of differential methylation at both the gene and isoform levels. The main thrust of the project is to leverage our combined expertise in computational modeling, bioinformatics, and high-throughput sequencing analysis to develop these models. The proposal is highly innovative because such models are virtually unavailable. We will capitalize fully on the power of graphical models to address the many important and challenging issues that arise from the unique requirements of peak calling and differential methylation analysis on transcripts in MeRIP-seq.
The new models will integrate several sophisticated components, including multi-layer hidden Markov models, negative binomial regression, Dirichlet process mixture models, hierarchical Bayesian models, and sparse regression. Gibbs sampling and other Bayesian methods will be used to estimate the model parameters, and efficient inference and prediction algorithms will also be developed. These models will contribute to advances in computational modeling and learning. Particular effort is also planned for developing software and user-friendly tools to facilitate mRNA methylation research by biologists and computational scientists.
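MeRIP-seq enables transcriptome-wide profiling of RNA methylation. Mining global RNA methylation patterns from its high-throughput data helps reveal the potential functions of mRNA methylation in regulating gene expression, splicing, and related processes, and can effectively guide interventional treatment of cancer. However, MeRIP-seq data analysis faces many computational challenges: existing methods for analyzing DNA methylation data cannot be applied directly to RNA methylation data, and effective computational methods are urgently needed. This project will study several major computational problems in MeRIP-seq data analysis. It will integrate the theory and methods of multi-layer hidden Markov models, Dirichlet process mixture models, negative binomial regression, hierarchical Bayesian models, sparse regression, and other complex models to build a series of probabilistic graphical models for mRNA methylation detection, estimate model parameters with Gibbs sampling and other Bayesian methods, and develop effective inference and prediction algorithms. At both the gene and isoform levels, the aims are accurate prediction of mRNA methylation sites and methylation status, and accurate detection of mRNA differential methylation status. A visual analysis platform and software tools for mRNA methylation will also be developed for convenient use by biologists.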
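MeRIP-seq enables transcriptome-wide profiling of RNA methylation. Mining global RNA methylation patterns from its high-throughput data helps reveal the potential functions of mRNA methylation in regulating gene expression, splicing, and related processes, and can effectively guide interventional treatment of cancer. Starting from MeRIP-seq high-throughput data and other omics data, and in strict accordance with the project plan, this project developed a series of mRNA methylation detection algorithms, methylation function detection algorithms, methylation data quality assessment software, and visualization tools. Research was also carried out on identifying cancer driver genes and disease risk genes, predicting drug-target interactions, reconstructing gene regulatory networks, and analyzing microbial genomic data. The main results are as follows:
1. Existing mRNA methylation peak detection algorithms oversimplify the data and ignore both read variance across samples and dependency among reads. To address this, the graphical-model-based MeTPeak algorithm was proposed to detect m6A peaks.
2. Existing methylation site detection algorithms can only detect methylation within regions, with low resolution and high false-positive rates. To address this, the deep-learning-based Deep-m6A algorithm was proposed to detect mRNA methylation sites at single-nucleotide resolution.
3. To address the small sample sizes and high dispersion of MeRIP-seq data, the FET-HMM, DRME, and QNB algorithms for methylation and differential methylation detection were proposed, enabling high-accuracy analysis of mRNA differential methylation status at small scales and with small samples.
4. To study the dynamic mRNA methylation profiles across cell types, methylation patterns were mined from MeRIP-seq data. The MeTCluster algorithm and a strategy for assessing the consistency of clustering results were proposed, revealing methylation and co-methylation patterns.
5. Existing MeRIP-seq analyses predict each mRNA methylation site in isolation, ignoring functional interactions among genes and the function of mRNA methylation. To address this, the m6A-Driver algorithm was proposed to identify m6A methylation driver genes effectively and to construct an m6A methylation driver gene network.
6. The MeT-DB V2.0 database, the trumpet R package, and the Guitar visualization tool were developed to help and support mRNA methylation researchers.
7. Based on network control theory, an optimized target control strategy (TCOA) and a single-sample control strategy (CSC) were proposed to identify sets of key cancer driver genes.
8. The AFMFSC and NMCOM disease risk gene prediction algorithms were proposed and successfully predicted potential causal genes for lung cancer and prostate cancer.
9. The LPMIHN, MKLC-BiRW, and DT-all drug-target interaction prediction algorithms were proposed to predict drug-target protein interactions with high accuracy.
10. The LBN and OCMIPN gene regulatory network construction algorithms were proposed to build large-scale gene regulatory networks quickly and with high accuracy.
11. The DBH and DMclust algorithms for clustering microbial operational taxonomic units were proposed, effectively revealing microbial species diversity in the environment.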
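As a rough illustration of what peak calling on MeRIP-seq data involves, the sketch below tests each bin along a transcript for enrichment of immunoprecipitated (IP) reads over input reads with a one-sided Fisher's exact test. It is a simple baseline under stated assumptions, not MeTPeak, FET-HMM, or any other algorithm named in this summary: it treats bins as independent and ignores variance across replicates, which are the limitations the graphical models are meant to address. The function names, bin counts, library sizes, and the 0.01 threshold are all hypothetical.

```python
from math import comb


def enrichment_pvalue(ip_bin, input_bin, ip_total, input_total):
    """One-sided Fisher's exact test for IP enrichment in a single bin.

    Returns P(X >= ip_bin), where X is hypergeometric: the number of IP
    reads among the (ip_bin + input_bin) reads that fall in the bin,
    given the two library sizes.
    """
    n_bin = ip_bin + input_bin        # reads in this bin, both samples
    n_all = ip_total + input_total    # all reads, both samples
    upper = min(n_bin, ip_total)
    # Sum the upper tail of the hypergeometric distribution exactly.
    tail = sum(
        comb(ip_total, k) * comb(input_total, n_bin - k)
        for k in range(ip_bin, upper + 1)
    )
    return tail / comb(n_all, n_bin)


def call_peaks(ip_counts, input_counts, ip_total, input_total, alpha=0.01):
    """Return indices of bins whose enrichment p-value is below alpha.

    Assumption: bins are tested independently, with no multiple-testing
    correction and no smoothing across neighbouring bins.
    """
    return [
        i
        for i, (ip, inp) in enumerate(zip(ip_counts, input_counts))
        if enrichment_pvalue(ip, inp, ip_total, input_total) < alpha
    ]


if __name__ == "__main__":
    # Hypothetical toy data: three bins on one transcript.
    ip_counts = [2, 30, 3]
    input_counts = [3, 4, 2]
    print(call_peaks(ip_counts, input_counts, ip_total=1000, input_total=1000))
```

With these toy counts only the middle bin, where IP reads far outnumber input reads, is reported. Exact integer arithmetic keeps the example dependency-free, but it is only practical for small library sizes; a real analysis would use a statistics library and correct for multiple testing.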
Data last updated: 2023-05-31