With the advance and application of next-generation sequencing technologies, there is increasing evidence that rare variants also play important roles in many complex diseases and disorders. Developing statistical association methods tailored to rare variants has therefore become an active research topic. The burden test and the sequence kernel association test (SKAT) are currently the two most commonly used classes of methods for detecting rare-variant associations. However, the burden test makes the strong assumption that all rare variants have effects in the same direction, and it is consequently underpowered when both protective and deleterious effects are present. SKAT lacks a metric for the contribution of rare variants to disease, which limits its use in practice. Motivated by these shortcomings, we propose a kernel machine learning method based on the likelihood ratio test, formulated from the perspective of mixed effects models. By assuming that the effects of rare variants are random, and by using the representer theorem together with a Bayesian viewpoint, the association analysis of rare variants is transformed into a variance component test for random effects. Because such a test does not depend on the signs of the individual effects, it avoids the effect-directionality problem encountered by the burden test. The likelihood ratio test and the restricted likelihood ratio test will be constructed within the kernel machine learning framework. Suitable kernel functions will be used to capture complicated nonlinear relationships between rare variants and disease, and the estimated variance component will be used to measure the effects of rare variants. In addition, we will integrate rare and common variants, as well as their interactions, to further improve the statistical power of association analysis. The proposed methods will be evaluated through extensive simulations and real sequencing data, and compared with the burden test and SKAT.
This study will develop novel and powerful statistical methods for the association analysis of rare variants, and will provide analytical tools and insights for better understanding the genetic basis of diseases and the causes of missing heritability.
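The transformation into a variance component test can be written out for the simplest case. The block below is a sketch under assumptions the abstract does not state: a quantitative trait, covariates X, an n-by-n kernel matrix K computed from the rare-variant genotypes, and the symbols h, tau, sigma^2 and ell, which are introduced here for illustration only.

```latex
% Kernel machine model viewed as a linear mixed model (assumed notation)
\begin{align*}
  y &= X\beta + h + \varepsilon, &
  h &\sim N(0,\ \tau K), &
  \varepsilon &\sim N(0,\ \sigma^{2} I_n), \\
  H_0 &: \tau = 0 & &\text{versus} & H_1 &: \tau > 0, \\
  \mathrm{LRT} &= 2\left\{ \ell(\hat\beta, \hat\tau, \hat\sigma^{2})
                 - \ell(\hat\beta_0, 0, \hat\sigma_0^{2}) \right\}.
\end{align*}
```

Under this formulation the null hypothesis concerns only the variance tau of the random effect h, so variants with opposite signs do not cancel each other, and the estimate of tau serves as the measure of the rare variants' contribution. The restricted likelihood ratio test replaces the log-likelihood ell with the restricted (REML) log-likelihood.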
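With the development and application of next-generation sequencing technologies, growing evidence shows that rare variants are closely related to complex diseases, and developing suitable association methods for rare variants has become a current research focus. The burden test and SKAT are the two main existing methods; however, the former relies on overly strong assumptions and has low statistical power, while the latter cannot quantify the effect of rare variants on disease. Drawing on mixed effects models, this study proposes to carry out rare-variant association analysis within a kernel machine learning framework. Using the representer theorem and a Bayesian viewpoint, the association analysis is transformed into a hypothesis test on the variance component of random effects, from which likelihood ratio and restricted likelihood ratio kernel machine learning methods are derived and the corresponding statistical theory is completed. Kernel functions are used to analyze the complex relationship between rare variants and disease, and the variance component is used to quantify the effect of rare variants. On this basis, common and rare variants and their interactions will also be integrated to improve the statistical power of genetic association analysis for complex diseases. The project will evaluate the proposed methods with real data and numerical simulations, and compare them with the burden test and SKAT. This study will develop efficient and novel rare-variant association methods and provide statistical tools for understanding the genetic basis of disease and explaining missing heritability.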
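Genome-wide association studies (GWAS) have focused mainly on common variants (genetic loci with a minor allele frequency above 1%) and have produced a number of important findings. For many complex diseases, however, the common variants identified so far explain only a small fraction of heritability, and the expected breakthroughs have not materialized. Growing evidence shows that, besides common variants, rare variants also play an important role in the onset and progression of complex diseases and are considered an important source of missing heritability. Identifying rare associated loci is therefore important for understanding the genetic risk of complex diseases, explaining missing heritability, and developing new diagnostic techniques and treatments. With the development and application of next-generation sequencing technologies, researchers can now accurately detect low-frequency genetic loci at the whole-genome or whole-exome level, which makes rare-variant association analysis feasible. Developing efficient and sensitive rare-variant association methods not only helps in designing more effective GWAS, but is also a necessary requirement of next-generation sequencing work and one of the pressing tasks facing statistical genetics. To address the statistical problems that rare-variant association studies currently face, this project draws on mixed effects models, proposes carrying out rare-variant association analysis within a kernel machine learning framework, and derives likelihood ratio and restricted likelihood ratio tests. The likelihood ratio and restricted likelihood ratio tests are an alternative to score-based testing. Compared with the burden test and SKAT, the likelihood ratio test has two advantages: it fits the model under both H0 and H1, makes fuller use of the information in the data, and can further improve statistical power; and it quantifies the effect of rare variants on disease, providing an objective measure of their relative importance and making the method more useful in practice. To speed up computation of the likelihood ratio test, we developed a fast approximate algorithm. In addition, based on the mixed effects model, we studied association analysis and prediction of gene expression using a set of cis-SNPs. The project systematically explored the proposed methods through theoretical study, numerical simulation, method comparison, and real-data analysis. It addressed the overly strong assumptions, low statistical power, and limited practical applicability of existing rare-variant association methods, improved the statistical inference framework for likelihood ratio kernel machine learning, and provides practical, efficient, and sensitive association analysis tools for GWAS and sequencing data. The project enriches and extends the theory of association analysis and contributes to a deeper understanding of the genetic basis of disease and to explaining missing heritability.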
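The idea of fitting the model under both H0 and H1 and reading off a variance component can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration, not the project's implementation: a quantitative trait, a linear kernel, maximum likelihood rather than restricted likelihood, a toy allele frequency of 0.05 so that a small sample contains carriers, and the hypothetical helper names `profile_loglik` and `kernel_lrt`. The p-value uses the textbook 50:50 mixture of a point mass at zero and a chi-square with one degree of freedom, which is only a rough approximation for this kind of test, and the sketch does not reproduce the fast approximate algorithm mentioned above.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import chi2


def profile_loglik(lam, y, X, K):
    """ML log-likelihood of y ~ N(X beta, sigma2 * (I + lam * K)),
    with beta and sigma2 profiled out; lam = tau / sigma2."""
    n = len(y)
    V = np.eye(n) + lam * K
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    beta = np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)  # GLS estimate
    r = y - X @ beta
    sigma2 = r @ np.linalg.solve(V, r) / n
    _, logdet = np.linalg.slogdet(V)
    return -0.5 * (n * np.log(sigma2) + logdet + n * (1.0 + np.log(2.0 * np.pi)))


def kernel_lrt(y, X, G, lam_max=100.0):
    """Likelihood ratio test of H0: lam = 0 with a linear kernel on G.
    Returns (LRT statistic, estimated variance ratio, approximate p-value)."""
    K = G @ G.T / G.shape[1]              # linear kernel (an assumption)
    ll0 = profile_loglik(0.0, y, X, K)    # fit under H0
    res = minimize_scalar(lambda l: -profile_loglik(l, y, X, K),
                          bounds=(0.0, lam_max), method="bounded")
    ll1 = -res.fun                        # fit under H1
    if ll1 <= ll0:                        # estimate sits on the boundary
        return 0.0, 0.0, 1.0
    stat = 2.0 * (ll1 - ll0)
    p = 0.5 * chi2.sf(stat, df=1)         # crude 50:50 mixture approximation
    return stat, float(res.x), p


# Toy data: 10 variants whose effects alternate in sign, which is the
# setting where a burden-style sum of variants would largely cancel out.
rng = np.random.default_rng(0)
n, m = 300, 10
G = rng.binomial(2, 0.05, size=(n, m)).astype(float)
X = np.ones((n, 1))                       # intercept only
effects = np.array([2.0, -2.0] * (m // 2))
y = 1.0 + G @ effects + rng.normal(size=n)

stat, lam_hat, p = kernel_lrt(y, X, G)
print(f"LRT = {stat:.2f}, variance ratio = {lam_hat:.2f}, p = {p:.3g}")
```

The returned variance ratio is the quantity that plays the role of an effect measure here: it is estimated only because the alternative model is actually fitted, which a score-type test does not require.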
Data last updated: 2023-05-31
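Statistical methods for lung cancer risk prediction integrating common and rare variants
Whole-exome study of low-frequency and rare genetic variants in IgA nephropathy
Statistical analysis of big data in next-generation-sequencing-based association studies of quantitative traits and rare variants
Effects of rare genetic variants in the MKK7 coding region on lung cancer incidence and prognosis in the population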