Although the 3D architecture of the genome is important for many cellular functions, the field of 3D genomics is still in its early stages, and much of the gene regulatory role of chromatin and 3D genome organization remains elusive. High-throughput DNA sequencing techniques have recently made it possible to map the landscape of genomes genome-wide, including the corresponding epigenomic marks and functions. However, these descriptive experimental techniques provide only indirect measurements of individual factors, and their resolution is low, ranging from 5 to 40 kb.
To obtain a full view of the epigenomic control of gene expression, we need to develop efficient bioinformatics tools to analyze high-throughput sequencing data and to construct accurate prediction models that infer the missing connections between the sequence, structure, and function of DNA. This proposal aims to achieve this goal by developing effective computational models to interpret currently available genomic and epigenomic data involving 3D genome structure, to understand the mechanisms of 3D genome folding, and to identify the functional effects of disrupted chromatin structure in the pathogenesis of human diseases.
We will first develop an accurate deep learning framework that integrates both DNA sequence and shape information to predict the binding specificities of the chromatin architectural protein CCCTC-binding factor (CTCF). Based on the predicted binding preferences of CTCF, we will study its regulatory role in maintaining proper 3D genome structure. Next, we will develop a Bayesian inference framework that integrates Hi-C data with prior knowledge of chromatin polymer physics and previously available super-resolution imaging data to accurately model 3D genome structure. The modeled 3D architecture of the genome will provide useful mechanistic insights into the functional roles of chromatin structure in gene regulation. In addition, it will make it possible to recover missing long-range genomic interactions that cannot be captured by the original experimental data. We will also develop a high-resolution deep learning approach to predict distal interactions between genomic loci from both DNA sequence and epigenomic profiles, and to decipher the regulatory code of 3D genome structure. This will shed light on the interplay between DNA sequence, epigenetic features, 3D chromatin structure, and function. Together, these proposed studies will significantly contribute to our understanding of the epigenomic regulation of 3D genome structure and its control of gene activity.
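As a rough illustration of the Bayesian idea described above, the sketch below scores a candidate 3D structure by combining a Hi-C likelihood with a polymer-style prior. Every modeling choice here is an assumption made for illustration and is not taken from the proposal: contact counts are treated as Poisson with rate `beta * d ** -alpha`, the prior is a Gaussian penalty on the distance between consecutive beads, and the names `log_posterior`, `alpha`, `beta`, `bond_length`, and `bond_sigma` are hypothetical.

```python
import math


def pairwise_distance(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def log_posterior(coords, counts, alpha=3.0, beta=1.0,
                  bond_length=1.0, bond_sigma=0.2):
    """Unnormalized log posterior of bead coordinates given contact counts.

    ASSUMPTIONS (illustrative only, not the proposal's model):
    - counts[i][j] ~ Poisson(beta * d_ij ** -alpha)
    - consecutive beads sit about bond_length apart (Gaussian penalty)
    Beads must occupy distinct positions so that every distance is > 0.
    """
    n = len(coords)

    # Likelihood term: how well 3D distances explain the Hi-C counts.
    log_lik = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = pairwise_distance(coords[i], coords[j])
            rate = beta * d ** (-alpha)
            c = counts[i][j]
            # Poisson log-pmf: c*log(rate) - rate - log(c!)
            log_lik += c * math.log(rate) - rate - math.lgamma(c + 1)

    # Prior term: a crude stand-in for chromatin polymer physics.
    log_prior = 0.0
    for i in range(n - 1):
        d = pairwise_distance(coords[i], coords[i + 1])
        log_prior += -0.5 * ((d - bond_length) / bond_sigma) ** 2

    return log_lik + log_prior


# Toy data: three beads whose ends contact each other as often as neighbors.
counts = [[0, 5, 5], [5, 0, 5], [5, 5, 0]]
folded = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.5, 0.866, 0.0)]
stretched = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
```

With these toy counts, the folded structure receives a higher score than the stretched one, which is the kind of signal a sampler or optimizer over coordinates would follow.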
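The three-dimensional structure of the genome has a very important influence on the life activities of the cell and is closely related, in particular, to gene expression and regulation. By combining machine learning with high-throughput sequencing technologies, this project builds effective modeling and prediction methods to study 3D genome structure and its epigenetic data, in order to understand the folding mechanisms at the level of 3D genome structure and to explore the role of abnormal chromatin structure in the development of human diseases. First, to study the role of the chromatin architectural protein CTCF in maintaining 3D genome structure, we propose a deep learning framework that integrates DNA sequence and geometric structure information to identify the specific DNA binding sites of CTCF. Second, to understand the folding mechanism of 3D chromatin structure more intuitively, we propose a Bayesian-inference-based method for modeling 3D genome structure that integrates Hi-C data with other auxiliary information. In addition, to study the relationships among DNA sequence, epigenetic features, and 3D genome structure, we propose advanced deep learning models to predict long-range interactions and genomic insulators in the 3D genome. The modeling and prediction methods proposed in this project will help deepen the understanding of the 3D folding mechanism of the genome and its underlying epigenetic regulation.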
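To make "integrating DNA sequence and shape information" concrete, the following minimal sketch stacks a one-hot sequence encoding with per-position shape features and slides a single convolution filter over the result, which is the basic operation of a first convolutional layer. It is a sketch under stated assumptions, not the project's framework: the shape values are made-up numbers, and `one_hot`, `combine_features`, and `conv1d_scan` are hypothetical helper names.

```python
def one_hot(seq):
    """One-hot encode a DNA string in A, C, G, T order (unknown bases -> zeros)."""
    table = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0),
             "G": (0, 0, 1, 0), "T": (0, 0, 0, 1)}
    return [list(table.get(base, (0, 0, 0, 0))) for base in seq.upper()]


def combine_features(seq, shape):
    """Append per-position shape features to the one-hot channels."""
    if len(seq) != len(shape):
        raise ValueError("sequence and shape must have the same length")
    return [oh + list(s) for oh, s in zip(one_hot(seq), shape)]


def conv1d_scan(features, kernel):
    """Slide one filter (width x channels) along the sequence, no padding."""
    width = len(kernel)
    scores = []
    for start in range(len(features) - width + 1):
        total = 0.0
        for k in range(width):
            total += sum(f * w for f, w in zip(features[start + k], kernel[k]))
        scores.append(total)
    return scores


# ASSUMPTION: one hypothetical shape value per base, invented for this example.
seq = "ACGT"
shape = [[0.1], [0.2], [0.3], [0.4]]
features = combine_features(seq, shape)

# A width-2 filter that responds to the dinucleotide "CG" and ignores shape.
kernel = [[0, 1, 0, 0, 0],
          [0, 0, 1, 0, 0]]
scores = conv1d_scan(features, kernel)
```

In a trained network the filter weights, including those on the shape channel, would be learned from binding data rather than set by hand as they are here.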
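With the development of high-throughput sequencing technologies and a deepening understanding of the mechanisms by which the genome interacts across multiple scales, 3D genomics has gradually become an important branch of biology for understanding how living systems work and for inspiring drug development. Research in 3D genomics usually involves modeling and analyzing large-scale data, and this project has proposed innovative methods for several active problems in the field. First, to model the mechanism by which DNA sequence acts on the 3D genome, we took transcription rate, an important factor influencing 3D genome structure, as the entry point and proposed a prediction model based on attention mechanisms and convolutional neural networks. Second, to systematically integrate the high-throughput sequencing data of 3D genomics and gain an understanding of biological mechanisms, we proposed a graph neural network model for sequencing data. In addition, to effectively predict long-range interactions in the 3D genome, we proposed a gene regulatory network model based on a beta-variational autoencoder and a synthetic lethality prediction model based on a semi-supervised neural network. The project's research on 3D genomics is of great significance for understanding genome mechanisms and for guiding drug repositioning and drug development.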
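For readers unfamiliar with the beta-variational autoencoder mentioned above, the sketch below computes its standard training objective for a single example: a reconstruction error plus `beta` times the KL divergence between a diagonal Gaussian posterior and a standard normal prior. The squared-error reconstruction term, the default `beta=4.0`, and the function names are assumptions for illustration; the text does not specify how the project's model was configured.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one latent vector."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))


def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """Reconstruction error plus beta-weighted KL term.

    ASSUMPTION: squared error is used as the reconstruction term; a
    different likelihood (e.g. for count data) would change this line.
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + beta * kl_to_standard_normal(mu, log_var)


# Toy example: one mismatched output value and a latent mean shifted by 1.
loss = beta_vae_loss(x=[1.0, 2.0], x_recon=[1.0, 1.0],
                     mu=[1.0, 0.0], log_var=[0.0, 0.0], beta=4.0)
```

Setting `beta` to 1 recovers the ordinary VAE objective; larger values weight the KL term more heavily relative to reconstruction.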
Data last updated: 2023-05-31