Gene-environment interactions play key roles in many complex diseases, and many studies have demonstrated both linear and nonlinear gene-environment interaction effects on complex disease risk. In this project, we propose semiparametric models, including the partially linear varying-coefficient model and the partially linear varying multi-index coefficient model, which allow us to assess how multiple environmental factors act simultaneously to modify, linearly and nonlinearly, individual genetic risk for complex disease. These models overcome the interpretability issues of the linear model and the limitation that the varying-coefficient model excludes discrete environmental factors (such as gender and smoking status). Because the parametric and nonparametric parts converge at different rates, assessing linear and nonlinear gene-environment interaction effects simultaneously in the partially linear model setup is highly challenging. We consider two hypothesis testing procedures to detect linear and nonlinear effects simultaneously: one based on the generalized log-likelihood ratio test, and the other based on a kernel in a reproducing kernel Hilbert space. Moreover, we will study whether the proposed tests can attain the optimal detectable rates for the parametric and nonparametric components simultaneously. We will also study variable selection for high-dimensional complex disease data. Because of the complexity of genes, the study of gene-environment interactions needs to consider several gene levels or their combinations. Our studies will enrich statistical theory and methodology and speed up the genetic mapping of human complex diseases.
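Gene-environment interactions are a major factor in complex diseases. Studies have shown that complex diseases are affected by both linear and nonlinear gene-environment interactions. This project uses semiparametric models, including the partially linear varying-coefficient model and the partially linear varying multi-index coefficient model, to study the linear and nonlinear interactions of genes with multiple environmental factors simultaneously, and to overcome the shortcomings that linear models are hard to interpret and that varying-coefficient models ignore discrete environmental variables (such as sex and smoking status). Because parametric and nonparametric estimators converge at different rates, jointly testing the linear and nonlinear parts is very challenging. This project builds on two joint tests, one based on the generalized log-likelihood ratio and one based on kernels, that can test linear and nonlinear gene-environment interactions simultaneously. We will study whether the joint test statistics can simultaneously attain the respective optimal convergence rates of the parametric and nonparametric parts, and we will study variable selection for high-dimensional data. Because of the complexity of genes, gene-environment interactions need to be analyzed at multiple gene levels; this project considers the interactions of multiple gene levels with environmental factors simultaneously. These results will enrich statistical theory and methods and effectively speed up the genetic mapping of human complex diseases.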
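To make the model class concrete, here is a minimal sketch of a partially linear varying-coefficient model with one genotype G, one continuous exposure U, and one discrete exposure Z: Y = beta0(U) + beta1(U)*G + alpha0*Z + alpha1*Z*G + error. Everything specific in it is an assumption made for illustration and is not taken from the project: the simulated effect sizes, the function names `simulate`, `fit_plvcm` and `beta1_hat`, and above all the estimator, which is a plain polynomial sieve fitted by ordinary least squares, not the generalized likelihood ratio or kernel-based procedures the project studies.

```python
import numpy as np

def simulate(n=2000, seed=0):
    """Simulate data from an illustrative model (all values are assumptions)."""
    rng = np.random.default_rng(seed)
    g = rng.binomial(2, 0.3, size=n).astype(float)  # genotype coded 0/1/2
    u = rng.uniform(0.0, 1.0, size=n)               # continuous exposure
    z = rng.binomial(1, 0.5, size=n).astype(float)  # discrete exposure (0/1)
    beta0 = np.sin(2 * np.pi * u)                   # nonlinear main effect of U
    beta1 = u ** 2                                  # nonlinear G x U interaction
    alpha = np.array([0.5, -0.8])                   # Z main effect, linear G x Z
    noise = rng.normal(0.0, 0.5, size=n)
    y = beta0 + beta1 * g + alpha[0] * z + alpha[1] * z * g + noise
    return y, g, u, z, alpha

def poly_basis(u, degree=5):
    """Polynomial basis 1, u, ..., u^degree (a simple stand-in for splines)."""
    return np.vander(u, degree + 1, increasing=True)

def fit_plvcm(y, g, u, z, degree=5):
    """Least-squares fit with basis-expanded beta0(U), beta1(U) and linear Z terms."""
    b = poly_basis(u, degree)
    design = np.column_stack([b, b * g[:, None], z, z * g])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    k = b.shape[1]
    return {"beta0": coef[:k], "beta1": coef[k:2 * k], "alpha": coef[2 * k:]}

def beta1_hat(fit, u_grid):
    """Evaluate the estimated varying coefficient beta1 on a grid of U values."""
    degree = len(fit["beta1"]) - 1
    return poly_basis(u_grid, degree) @ fit["beta1"]

if __name__ == "__main__":
    y, g, u, z, alpha_true = simulate()
    fit = fit_plvcm(y, g, u, z)
    print("true alpha:     ", alpha_true)
    print("estimated alpha:", fit["alpha"])
    print("beta1 on grid:  ", beta1_hat(fit, np.linspace(0.1, 0.9, 5)))
```

The linear block (`alpha`) is where a discrete factor such as smoking status enters, while the basis-expanded block lets the genetic effect vary smoothly with the continuous exposure. The difference in convergence rates mentioned above shows up here as the contrast between the two fixed linear coefficients and a coefficient function whose basis must grow with the sample size.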
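The project team published 6 papers acknowledging this project, including 1 first-author paper in a top international journal, 1 paper in a top international biostatistics journal, and 2 papers in first-rate international journals. This project uses semiparametric models, including the partially linear varying-coefficient model and the partially linear varying multi-index coefficient model, to study the linear and nonlinear interactions of genes with multiple environmental factors simultaneously. Because genetic data are inherently high-dimensional, the team studied a series of statistical properties of high-dimensional data and their application to genetic data. The results of this project enrich the statistical methods and theory of nonparametric and semiparametric modeling and of high-dimensional data analysis, and effectively speed up the genetic mapping of human complex diseases.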
Data last updated: 2023-05-31
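On the influence of the big data environment on the development of information science
The DeoR-family transcription factor PsrB regulates prodigiosin synthesis in Serratia marcescens
Nonlinear analysis and calculation method for the coefficient of earth pressure at rest of coarse-grained soil
CFRP strengthening of rib-to-deck fatigue cracks in orthotropic steel bridge decks
High-resolution three-dimensional imaging for wideband MIMO radar based on Kronecker compressed sensing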
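Semiparametric statistical inference for complex multivariate data
Dynamic semiparametric modeling of longitudinal data and its statistical inference
Robust statistical inference for semiparametric models with complex data
Semiparametric statistical inference for treatment effects with complex data and its applications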