Interpreting genetic variants, including single-nucleotide polymorphisms (SNPs) and structural variants, is key to precision health. When the genomes of many people are sequenced, millions of sites are found to differ among them. Some of these variants are common, such as those underlying the ABO blood types (A, B, AB, or O), while many are rare and seen in only a few people. Many of these variants may affect disease risk, response to drugs, or other traits such as height in a tissue- or condition-specific way. How can we determine which variants affect the function and regulation of genes, and under which conditions? We propose to use gene regulatory networks to integrate omics data and interpret genetic variants. We study gene regulatory networks in which genetic variants affect chromatin state and transcription factor (TF) binding affinity. For a gene to be expressed, the chromatin first opens and forms a DNA loop that brings enhancers and promoters into contact. The Mediator and cohesin complexes, sequence-specific TFs, and RNA polymerase II (Pol II) are then recruited and work together to fine-tune the expression level. There is a pressing need to understand how genetic variant information is embedded at the level of chromatin and gene regulatory elements. Here, we will develop new computational methods, interpretive frameworks, and integrative models that enable accurate interpretation of regulatory variants. In particular, we will discuss models and algorithms to organize, analyze, model, and integrate genetic variants, DNA accessibility data, transcriptional data, and functional genomic regions. We believe that this integrative paradigm across the chromatin and expression levels will eventually help us understand the information flow in the cell and will influence research directions across many fields.
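The abstract above refers to genetic variants that affect TF binding affinity. One common way to approximate that effect is to score the reference and alternate alleles against a position weight matrix and compare the results. The following is a minimal sketch under stated assumptions, not the project's own model: the motif matrix `PFM`, the uniform `BACKGROUND`, the example sequence, and all function names are hypothetical, and only the forward strand is scanned.

```python
import math

# Hypothetical position frequency matrix for a 4-bp motif whose consensus is ACGT.
# Each row gives base probabilities at one motif position (illustrative values only).
PFM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def log_odds(window):
    """Log2-odds score of one motif-length window against the background."""
    return sum(math.log2(PFM[i][base] / BACKGROUND) for i, base in enumerate(window))


def best_score(seq):
    """Highest log-odds score over all motif-length windows of seq (forward strand)."""
    width = len(PFM)
    return max(log_odds(seq[i:i + width]) for i in range(len(seq) - width + 1))


def variant_effect(ref_seq, pos, alt_base):
    """Change in best motif score when ref_seq[pos] is replaced by alt_base."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return best_score(alt_seq) - best_score(ref_seq)


# Hypothetical SNP: C>T at 0-based position 3, inside the motif match ACGT.
delta = variant_effect("GGACGTGG", 3, "T")
print(f"change in log2-odds score: {delta:.3f}")
```

A negative value indicates that the alternate allele weakens the best motif match in this window. A fuller treatment would also scan the reverse complement and use a measured background distribution.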
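This proposal addresses the major strategic needs of the Healthy China initiative and personalized precision medicine. We plan to study mathematical models and algorithms that integrate multi-level omics data, construct gene regulatory networks, and elucidate the mechanisms linking genetic variation to complex traits and diseases. Combining perspectives from optimization, dynamical systems, and statistics, we put forward the novel idea that genetic variation in DNA, driven by multi-level dynamic molecular networks, gives rise to complex phenotypes through the evolution of a dynamical system. Mathematically, we model multi-level omics data such as the genome, transcriptome, and epigenome with gene regulatory networks. We focus on genetic variants (single-nucleotide polymorphisms, SNPs, and structural variants) located in the 98% of the genome that consists of noncoding regulatory regions. We study the association networks among these variants and how they alter regulatory element states and transcription factor binding strength, which in turn affects gene expression and leads to specific phenotypes. In particular, the models must capture the tissue and condition specificity of gene regulation and quantitatively describe the switching on or off, and the enhancement or repression, of gene expression. The core is to develop a methodology for network models and datasets: characterize system mechanisms, extract data features, construct within a Bayesian framework a likelihood function that quantifies how well the model fits the data, use it as the objective function, and design optimization algorithms that solve it iteratively.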
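The last step described above, a likelihood built in a Bayesian framework, used as the objective function, and solved iteratively, can be illustrated with a deliberately small example. This is a sketch under assumptions that are not in the proposal: a linear model of target-gene expression, a Gaussian likelihood with variance `sigma2`, a zero-mean Gaussian prior with variance `tau2` on the regulatory weights, and plain gradient descent as the optimizer. The feature matrix `X` (for example, TF expression multiplied by element accessibility) and the data values are hypothetical.

```python
def predict(w, x):
    """Linear prediction of target-gene expression from regulator features."""
    return sum(wj * xj for wj, xj in zip(w, x))


def neg_log_posterior(w, X, y, sigma2=1.0, tau2=10.0):
    """Objective: Gaussian negative log-likelihood plus Gaussian prior on w
    (additive constants dropped)."""
    nll = sum((yi - predict(w, xi)) ** 2 for xi, yi in zip(X, y)) / (2 * sigma2)
    prior = sum(wj * wj for wj in w) / (2 * tau2)
    return nll + prior


def fit(X, y, lr=0.01, steps=2000, sigma2=1.0, tau2=10.0):
    """Minimize the objective by gradient descent, starting from w = 0."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [wj / tau2 for wj in w]  # gradient of the prior term
        for xi, yi in zip(X, y):
            resid = yi - predict(w, xi)
            for j, xj in enumerate(xi):
                grad[j] -= resid * xj / sigma2  # gradient of the likelihood term
        w = [wj - lr * g for wj, g in zip(w, grad)]
    return w


# Hypothetical data: 4 samples, 2 regulator features, y generated as 2*x0 - 1*x1.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, -1.0, 1.0, 3.0]

w_hat = fit(X, y)
print("estimated weights:", [round(v, 3) for v in w_hat])
```

The prior shrinks the estimates slightly toward zero relative to the generating weights; positive and negative weights would correspond to activation and repression in this toy setting.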
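Genome-wide association studies (GWAS) have identified a large number of single-nucleotide polymorphism (SNP) loci that are statistically associated with human complex phenotypes. Systematic interpretation of these SNPs is key to discovering biological mechanisms and realizing personalized medicine. This project integrates multi-omics data from multiple cell types, tissues, and organs, models gene regulatory networks, and reveals the causal regulatory roles of SNPs, the structure of the genome, and the evolutionary mechanisms of biological systems.

The research was completed as originally planned. Major results include the following. (1) We developed vPECA, a new methodological framework that interprets genetic variants by constructing gene regulatory networks. The core idea is to reconstruct the regulatory network in the specific context in which a genetic variant carries information and exerts its function, and to use regulatory elements with regulatory activity to reveal the causal relationships between genetic variants, downstream genes, and upstream transcription factors. The procedure is to first identify the cell type or the specific temporal and spatial context in which the genetic variants are enriched; measure multi-omics data such as the epigenome and transcriptome in that context; build a mathematical model that integrates the multi-omics data and infers a gene regulatory network centered on regulatory elements; and, based on the network structure, systematically interpret the genetic variants and analyze their regulatory mechanisms. (2) We developed SpecVar, a new method for inferring associations between human complex phenotypes and cell types. It mathematically models regulatory networks from multi-omics data, extracts cell-type-specific regulatory elements, and on that basis performs genetic enrichment analysis for complex phenotypes, revealing the relevant cell types of a complex phenotype and the core regulatory networks of its genetic variants. (3) For the molecular mechanism of adaptation to the high-altitude hypoxic environment, a core scientific question in evolution and genetics, we used the vPECA model to integrate multi-omics data and systematically dissect the molecular regulatory mechanisms by which Tibetan populations adapt to high-altitude hypoxia. We showed that EPAS1 expression is jointly regulated by two classes of regulatory elements, those under selection and those not under selection; functional sites located in an enhancer region reduce the chromatin accessibility of that region and thereby down-regulate EPAS1 expression, preventing excessive red blood cell proliferation in Tibetan populations in the high-altitude hypoxic environment. (4) For the genetic variants and regulatory mechanisms underlying human facial features, a core scientific question in genetics, we built a consensus optimization model, integrated multi-omics data from multiple samples, and reconstructed the human regulatory network hReg-CNCC. This provides a valuable resource for studying complex phenotypes such as craniofacial features, and it also emphasizes that genetic variants for craniofacial features need to be interpreted in the context of early embryonic development.

In total, 21 papers were published in journals including Nature Communications, eLife, Genome Research, and Communications; 10 data analysis software packages were developed; 2 doctoral students were trained; and 2 follow-up grants were obtained, including 1 National Science Fund for Distinguished Young Scholars.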
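Result (2) above describes a genetic enrichment analysis over cell-type-specific regulatory elements. The summary does not state which statistic SpecVar uses, so the sketch below swaps in a plain hypergeometric overlap test purely to illustrate the idea of ranking cell types by enrichment of trait-associated SNPs. All counts, cell type names, and the function name `hypergeom_sf` are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n items are drawn without replacement from N items,
    K of which are marked (upper tail of the hypergeometric distribution)."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


# Hypothetical counts: N_SNPS background SNPs, N_TRAIT of them trait-associated.
# For each cell type, K SNPs fall in its specific regulatory elements and
# k of those are also trait-associated.
N_SNPS, N_TRAIT = 10000, 200
cell_types = {
    "cell_type_A": {"K": 500, "k": 30},
    "cell_type_B": {"K": 500, "k": 11},
}

p_values = {
    name: hypergeom_sf(c["k"], N_SNPS, c["K"], N_TRAIT)
    for name, c in cell_types.items()
}

# Rank cell types from most to least enriched.
for name, p in sorted(p_values.items(), key=lambda item: item[1]):
    print(name, f"p = {p:.3g}")
```

With these made-up counts about 10 overlapping SNPs are expected by chance per cell type, so the first cell type ranks as strongly enriched and the second does not. A real analysis would need to account for linkage disequilibrium among SNPs, which this overlap test ignores.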
Data last updated: 2023-05-31
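Gene regulatory network modeling integrating chromatin state and expression data
Data-based dynamic fuzzy modeling of gene regulatory networks and study of their dynamic characteristics
Research on gene regulatory network modeling methods based on heterogeneous multi-omics data integration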