As an essential regulatory mechanism in cells, epigenetic regulation of gene expression has been studied for decades. Recently, the emergence of the ChIP-seq technique has greatly accelerated this research. Accurate prediction of DNA regulatory elements from high-throughput epigenetic ChIP-seq data has become an urgent need in the field of epigenetic regulation. Existing methods such as CSI-ANN and ChromaGenSVM focus only on peak density and ignore shape parameters, which are also very important for element recognition. Starting from basic statistical assumptions, we will map the statistics of probability distributions of random variables onto features of the sequencing read distribution around ChIP-seq peaks, aiming to construct a comprehensive system for describing peak characteristics. Using the statistics in this system as training features, we plan to build an accurate DNA regulatory element prediction method based on several machine learning algorithms, and then to assess the performance of our method against state-of-the-art methods on new prediction data sets. Furthermore, we will develop a novel method that links DNA elements to their target genes using ChIP-seq and RNA-seq data from multiple cell lines. In conclusion, we anticipate that this project will not only help to predict DNA elements in large-sample data, such as the data sets of the ENCODE project, but will also provide useful information for studying the regulatory mechanisms of individual genes.
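As an important mechanism of gene regulation in cells, epigenetic regulation has long attracted researchers' attention, and the maturation of ChIP-seq technology in recent years has accelerated research in this area. How to use ChIP-seq data to accurately predict DNA regulatory elements and their target genes has become a pressing problem in the field. Existing methods such as CSI-ANN and ChromaGenSVM consider only the signal intensity of epigenetic modifications and ignore the shape of the signal distribution, which limits their prediction accuracy; new methods with higher accuracy are urgently needed. Starting from basic statistical assumptions, this project will map the statistics of random-variable probability distributions onto the shape features of signal peaks to build a quantitative system for describing peak shape, and will then use machine learning methods based on this system to predict DNA regulatory elements and evaluate prediction accuracy. Furthermore, we will use data from multiple cell lines to establish associations between regulatory elements and genes and to clarify their biological functions. This project will help to accurately predict genome-wide DNA regulatory elements in large data sets such as ENCODE, and will also provide an important reference for studying the expression regulation mechanisms and functions of individual genes.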
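The idea of mapping distribution statistics onto peak shape can be illustrated with a minimal sketch. It assumes that the "statistics" are the usual moments (mean, variance, skewness, excess kurtosis) and that a peak is given as an array of per-position read coverage; the abstract specifies neither, and the function name `peak_shape_features` is hypothetical. The sketch treats the coverage profile as an unnormalized probability distribution over positions and returns moment-based features alongside a simple density value.

```python
import numpy as np


def peak_shape_features(coverage):
    """Summarize a read-coverage profile around a peak with moment statistics.

    Assumption (not from the abstract): the profile is a 1-D array of
    non-negative read counts, one value per position in the peak window.
    """
    coverage = np.asarray(coverage, dtype=float)
    total = coverage.sum()
    if total <= 0:
        raise ValueError("coverage must contain at least one positive value")

    # Normalize coverage so it acts as a probability mass over positions.
    positions = np.arange(coverage.size)
    weights = coverage / total

    mean = np.sum(weights * positions)
    centered = positions - mean
    variance = np.sum(weights * centered ** 2)
    std = np.sqrt(variance)

    if std == 0:
        # All reads at a single position: higher moments are undefined,
        # so report 0.0 by convention.
        skewness, excess_kurtosis = 0.0, 0.0
    else:
        skewness = np.sum(weights * centered ** 3) / std ** 3
        excess_kurtosis = np.sum(weights * centered ** 4) / std ** 4 - 3.0

    return {
        "density": total / coverage.size,  # mean coverage per position
        "mean": mean,
        "variance": variance,
        "skewness": skewness,
        "excess_kurtosis": excess_kurtosis,
    }


if __name__ == "__main__":
    # Two toy profiles with the same total coverage but different shapes.
    print(peak_shape_features([1, 2, 3, 2, 1]))
    print(peak_shape_features([5, 3, 1, 0, 0]))
```

The two toy profiles have identical density but different skewness, which is the kind of distinction a density-only feature cannot capture. Feature vectors like these could then be passed to any standard classifier.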
Data last updated: 2023-05-31
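Modeling of rice root systems based on fractal L-systems
SSVEP-based direct brain control of robot direction and speed
Theme park evaluation based on public sentiment: a case study of Volga Manor in Harbin
Application of collaborative-representation-based graph embedding discriminant analysis to face recognition
An improved extraction method reveals varied DNA content in different parts of the shells of Pacific oysters
Multi-factor joint probability prediction and risk analysis of major disasters based on compound extreme value distribution theory
Application of fractal and sequence complexity methods to the prediction of DNA regulatory elements
Rational design of fine-tuned prokaryotic regulatory elements based on efficient predictive models
Copula-based deformation prediction model and probabilistic instability criterion for reservoir bank slopes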