Statistical Methods for Multi-locus Association Analysis Incorporating Gene Structural Features

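Basic Information
Grant number: 11571082
Project category: General Program
Funding amount: 45.00
Principal investigator: 胡跃清
Discipline classification:
Host institution: Fudan University
Year approved: 2015
Year completed: 2019
Duration: 2016-01-01 to 2019-12-31
Project status: Completed
Participants: 林诗丽, 孙雷鸣, 王婵, 李黎明, 马欣宇
Keywords:
family-based association analysis; rare-variant association analysis; gene mapping; linkage disequilibrium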
Final Report Abstract

Benefiting from the Human Genome Project, the International HapMap Project, and the 1000 Genomes Project, genome-wide association studies have identified hundreds of genetic variants associated with complex human diseases. However, these variants explain only 5-10% of the disease burden in the population. Most existing gene-mapping methods do not use the structural features of genes, such as linkage disequilibrium, recombination fraction, clustering of SNPs, whether a SNP lies in a coding or non-coding region, and whether it lies in a regulatory or non-regulatory region. These features inevitably affect the power to detect causal SNPs; for example, strong linkage disequilibrium can inflate the type I error rate of some test statistics. This project characterizes these structural features and investigates their effects on testing methods. We then incorporate the features into test statistics to improve the power to detect associations between variants and diseases. For case-control studies, we describe the distribution of sites harboring variants in a region separately for cases and for controls, measure the difference between the two distributions by relative entropy, and compute significance by permutation. This allows us to detect SNPs associated with complex diseases. For each case-parents trio, we construct a pseudo offspring from the two parental haplotypes that were not transmitted to the affected child. As in the case-control setting, we compare the distribution of sites for affected children with that for pseudo offspring and thereby detect associated SNPs. In addition, we evaluate the differences in genetic variants at multiple sites simultaneously, between cases and controls or between affected children and pseudo offspring, by forming a score vector and deriving its covariance matrix. The structural features of genes are quantified and enter the distributions and the score vector through weights.
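The case-control procedure in the paragraph above (a distribution of variant-carrying sites per group, compared by relative entropy, with a permutation p-value) can be sketched in Python. This is a minimal illustration under assumptions the abstract does not specify: each group's distribution is taken to be the share of minor-allele copies falling on each site, a pseudocount of 0.5 keeps all probabilities positive, the divergence is D(cases || controls) rather than a symmetrized form, and the optional per-site `weights` are a hypothetical stand-in for the structural-feature weights. All function names are illustrative.

```python
import math
import random


def site_distribution(genotypes, weights=None, pseudocount=0.5):
    """Distribution of variant occurrences over the m sites of a region.

    genotypes: one list per individual of minor-allele counts (0/1/2) at m sites.
    weights: optional positive per-site weights (hypothetical stand-in for
    structural features); equal weights by default.
    The pseudocount keeps every probability positive so the divergence is finite.
    """
    m = len(genotypes[0])
    w = weights if weights is not None else [1.0] * m
    counts = [pseudocount] * m
    for g in genotypes:
        for j, x in enumerate(g):
            counts[j] += x
    weighted = [wj * c for wj, c in zip(w, counts)]
    total = sum(weighted)
    return [c / total for c in weighted]


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kl_statistic(cases, controls, weights=None):
    """Relative entropy of the case site distribution from the control one."""
    return relative_entropy(site_distribution(cases, weights),
                            site_distribution(controls, weights))


def permutation_test(cases, controls, weights=None, n_perm=999, seed=1):
    """Observed statistic and permutation p-value from shuffled case/control labels."""
    rng = random.Random(seed)
    observed = kl_statistic(cases, controls, weights)
    pooled = cases + controls  # new list; the inputs are left untouched
    n_cases = len(cases)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if kl_statistic(pooled[:n_cases], pooled[n_cases:], weights) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Toy data: 3 cases and 3 controls genotyped at 4 sites.
cases = [[1, 0, 0, 2], [1, 1, 0, 0], [2, 0, 0, 1]]
controls = [[0, 0, 1, 0], [0, 1, 1, 0], [0, 0, 2, 0]]
stat, p_value = permutation_test(cases, controls, n_perm=199)
print(f"relative entropy = {stat:.4f}, permutation p = {p_value:.3f}")
```

Adding 1 to both the exceedance count and the number of permutations keeps the p-value strictly positive. With toy samples this small, the permutation p-value is coarse; the example only shows the mechanics.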
In the score statistic, linkage disequilibrium and recombination fraction enter through the covariance matrix. For general categorical or even ordinal traits, mutual information is used to detect associated SNPs. Because many variants are non-causal, especially among rare variants, we further identify the causal variants by variable selection. We build a classification function based on the likelihood ratio and determine its threshold by cross-validation, which allows us to predict complex diseases and assess prognosis from genetic information. Finally, the developed methods are applied to real data on seven common diseases to detect the corresponding associated SNPs, deepening our understanding of genetic mechanisms and providing statistical support for biological experiments. We will also release free software packages to help the scientific community apply these methods.
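For categorical traits, the abstract measures association by the mutual information between genotype and trait. A minimal plug-in (empirical-frequency) estimator for a single SNP is sketched below; the function name and the choice of the plug-in estimator are assumptions, since the abstract does not say how mutual information is estimated.

```python
import math
from collections import Counter


def mutual_information(genotypes, traits):
    """Plug-in estimate of I(G; Y) in nats.

    genotypes: minor-allele counts (0/1/2) at one SNP, one per individual.
    traits: categorical trait values (any hashable labels) for the same individuals.
    """
    n = len(genotypes)
    if n != len(traits) or n == 0:
        raise ValueError("need equally long, non-empty inputs")
    joint = Counter(zip(genotypes, traits))
    g_counts = Counter(genotypes)
    y_counts = Counter(traits)
    mi = 0.0
    for (g, y), c in joint.items():
        # p(g, y) * log[p(g, y) / (p(g) p(y))], written with raw counts
        mi += (c / n) * math.log(c * n / (g_counts[g] * y_counts[y]))
    return mi


# Genotype fully determines the trait category here, so I(G; Y) = H(Y) = log 3.
g = [0, 1, 2, 0, 1, 2]
y = ["mild", "moderate", "severe", "mild", "moderate", "severe"]
print(mutual_information(g, y))
```

This estimator uses only the category labels, so for an ordinal trait it ignores the ordering. Significance could be calibrated, as elsewhere in the project, by recomputing the statistic after permuting the trait values.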

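Association studies have identified genetic variants for many complex diseases, but these variants cannot fully explain disease variability. Existing methods rarely exploit structural features of genes such as linkage disequilibrium, recombination, and the region to which each site belongs, and may therefore miss important genetic signals. This project characterizes these structural features and studies their effects on test statistics. For nuclear-family data, a pseudo-control for each affected child is constructed from the parental alleles that were not transmitted, and relative entropy is used to compare the distributions of variant sites, or the distributions of the number of variants across multiple loci; alternatively, as with case-control data, a score statistic is built from the multi-locus genotype vector. Mutual information is used to measure the dependence between genetic factors and categorical trait values. Site attributes are reflected in the distributions or score statistics through weights, linkage disequilibrium and recombination are captured in the covariance matrix, and significance is assessed by permutation, enabling efficient gene mapping. Further, variable selection is used to screen out neutral variants, a classification function is built from the likelihood ratio, and its threshold is chosen by cross-validation, enabling disease prediction and prognosis. Extensive simulations will verify the validity and efficiency of the test statistics; free software packages will be released; and datasets on seven common diseases will be analyzed to search for susceptibility genes, providing statistical evidence for wet-lab validation.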
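The pseudo-control built from non-transmitted parental alleles can be illustrated at the level of genotype counts. In this minimal sketch (function names hypothetical), genotypes are coded as minor-allele counts 0/1/2 at biallelic SNPs. The project works with parental haplotypes, which additionally require phase; the per-SNP counts computed here do not.

```python
def transmissible(genotype):
    """Minor-allele counts that a parent with this genotype (0/1/2) can transmit."""
    return {0: (0,), 1: (0, 1), 2: (1,)}[genotype]


def pseudo_control(father, mother, child):
    """Per-SNP minor-allele counts of the non-transmitted parental alleles.

    Each argument lists minor-allele counts (0/1/2) at the same SNPs. At every
    SNP the four parental alleles split into two transmitted (the child's
    genotype) and two non-transmitted, so the pseudo-control count is
    father + mother - child, whatever the haplotype phase.
    """
    pseudo = []
    for j, (f, m, c) in enumerate(zip(father, mother, child)):
        if not any(a + b == c for a in transmissible(f) for b in transmissible(m)):
            raise ValueError(f"Mendelian inconsistency at SNP {j}")
        pseudo.append(f + m - c)
    return pseudo


# One hypothetical trio genotyped at three SNPs.
father = [1, 2, 0]
mother = [1, 1, 0]
child = [2, 1, 0]
print(pseudo_control(father, mother, child))  # [0, 2, 0]
```

The resulting vectors can stand in for the control group when comparing site distributions or forming a score statistic. A trio that fails the Mendelian check is rejected rather than silently used.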

Project Abstract

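For genotype-phenotype association studies, this project constructed statistics for testing associated variants in case-control data, affected child-parents trio data, and categorical traits. These statistics cover a range of settings: a mixture of common and rare variants, coexisting signal and noise, variants with opposite directions of effect, and block-structured linkage disequilibrium among variants. When both DNA variant and DNA methylation data are available, the indirect effect of a variant was separated from its total effect, yielding an estimator with smaller bias. These methods performed well in simulations and produced a number of meaningful findings in analyses of several real datasets. Using Mendelian randomization with genetic instrumental variables, the project inferred the causal relationship between homocysteine and the risk of obesity or hypertension. The project also studied the association of six blood-pressure-related single-nucleotide polymorphisms (SNPs), and their interactions, with obesity in Chinese children. The corresponding statistical methods were implemented in software packages that practitioners can download and use free of charge.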
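The Mendelian randomization analysis above relies on genetic instrumental variables, but the abstract does not say which estimator was used. As a hedged illustration, the simplest single-instrument estimator, the Wald ratio cov(G, Y) / cov(G, X), is applied below to simulated data with an unmeasured confounder. Every parameter (allele frequency 0.3, effect sizes, sample size) and every function name is hypothetical and unrelated to the project's homocysteine, obesity, or hypertension data.

```python
import random


def covariance(a, b):
    """Sample covariance of two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def wald_ratio(instrument, exposure, outcome):
    """Single-instrument IV estimate: cov(G, Y) / cov(G, X)."""
    return covariance(instrument, outcome) / covariance(instrument, exposure)


def simulate(n=20000, true_effect=0.3, seed=7):
    """Hypothetical data: SNP G -> exposure X -> outcome Y, with confounder U."""
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        gi = sum(rng.random() < 0.3 for _ in range(2))  # minor allele frequency 0.3
        ui = rng.gauss(0, 1)                            # unmeasured confounder
        xi = 1.0 * gi + ui + rng.gauss(0, 1)
        yi = true_effect * xi + ui + rng.gauss(0, 1)
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return g, x, y


g, x, y = simulate()
naive = covariance(x, y) / covariance(x, x)  # confounded OLS slope of Y on X
print(f"naive OLS slope: {naive:.3f}, Wald ratio: {wald_ratio(g, x, y):.3f}")
```

On such data the naive regression slope of Y on X is pulled upward by the confounder, while the Wald ratio stays close to the simulated effect of 0.3. With several SNPs as instruments, inverse-variance weighted or two-stage least squares estimators are the usual extensions.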

Project Outputs

No outputs available yet.

Data last updated: 2023-05-31

Other Related Literature

1

Genome-wide association analysis of leaf orientation value in maize
2

On the impact of the big-data environment on the development of information science

Published: 2017
3

DeoR-family transcription factor PsrB regulates prodigiosin synthesis in Serratia marcescens

DOI: 10.3969/j.issn.1673-1689.2021.10.004
Published: 2021
4

CFRP strengthening against rib-to-deck fatigue cracking in orthotropic steel bridge decks

DOI: 10.19713/j.cnki.43-1423/u.t20201185
Published: 2021
5

Hardware Trojans: research progress and new trends on key issues

Published: 2018


Similar National Natural Science Foundation of China (NSFC) Grants

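1

Research on new fast and efficient multi-locus genome-wide association analysis methods for dynamic traits

Grant number: 31701309
Year approved: 2017
Principal investigator: 吕海燕
Discipline classification: C1301
Funding amount: 23.00
Project category: Young Scientists Fund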
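2

Research on new multi-locus association analysis methods for massive SNP data with high precision and fast detection, and their applications

Grant number: 31571268
Year approved: 2015
Principal investigator: 章元明
Discipline classification: C0606
Funding amount: 63.00
Project category: General Program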
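3

Methodological research on high-throughput, fast multi-locus GWAS for association populations containing heterozygous genotypes, and development of software packages

Grant number: 31871242
Year approved: 2018
Principal investigator: 章元明
Discipline classification: C0606
Funding amount: 59.00
Project category: General Program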
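4

Microarray analysis of multi-locus gene sequences from different Bradyrhizobium species

Grant number: 31000002
Year approved: 2010
Principal investigator: 谷峻
Discipline classification: C0101
Funding amount: 22.00
Project category: Young Scientists Fund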