Gene-environment interactions play key roles in many complex diseases, and many studies have shown that both linear and nonlinear gene-environment interaction effects influence complex disease risk. In this project, we propose semiparametric models, namely the partially linear varying-coefficient model and the partially linear varying multi-index coefficient model. These models let us assess how multiple environmental factors act simultaneously to modify, linearly and nonlinearly, an individual's genetic risk of complex disease. They overcome the interpretability issues of linear models and the limitation that varying-coefficient models exclude discrete environmental factors (such as gender and smoking status). Because the parametric and nonparametric parts converge at different rates, assessing linear and nonlinear gene-environment interaction effects simultaneously in a partially linear model is highly challenging. We consider two hypothesis testing procedures that detect linear and nonlinear effects simultaneously: one based on the generalized log-likelihood ratio test, and the other based on kernels in a reproducing kernel Hilbert space. Moreover, we will study whether the proposed tests attain the optimal detectable rates for the parametric and nonparametric components simultaneously. We will also study variable selection for high-dimensional data on complex diseases. Because of the complexity of genes, studies of gene-environment interactions need to consider multiple genetic levels or their combinations. Our studies will enrich statistical theory and methodology and speed up the genetic mapping of complex human diseases.
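For concreteness, the two model classes and the generalized likelihood ratio (GLR) statistic can be written as below for a single genetic variant G. This is a sketch in one common notation, not the project's own formulation: W (discrete environmental covariates such as gender and smoking status), U (a continuous environmental factor), X (a vector of environmental factors), and the unit-norm identifiability constraint are illustrative assumptions. The GLR form shown assumes Gaussian errors.

```latex
% Partially linear varying-coefficient (PLVC) model
Y = \mathbf{W}^{\top}\boldsymbol{\gamma} + \alpha_0(U) + \alpha_1(U)\,G + \varepsilon

% Partially linear varying multi-index coefficient (PLVMIC) model
Y = \mathbf{W}^{\top}\boldsymbol{\gamma}
    + m_0\!\left(\mathbf{X}^{\top}\boldsymbol{\beta}_0\right)
    + m_1\!\left(\mathbf{X}^{\top}\boldsymbol{\beta}_1\right) G + \varepsilon,
\qquad \|\boldsymbol{\beta}_l\| = 1 \ (l = 0, 1)

% Linear vs. nonlinear G x E interaction in the PLVC model
H_0:\ \alpha_1(U) = c_0 + c_1 U \quad \text{vs.} \quad H_1:\ \alpha_1(\cdot)\ \text{smooth, unspecified}

% GLR statistic under Gaussian errors (RSS_0, RSS_1: residual sums of squares under H_0, H_1)
\lambda_n = \frac{n}{2}\,\log\frac{\mathrm{RSS}_0}{\mathrm{RSS}_1}
```

The discrete covariates W enter linearly, which is how the partially linear form accommodates factors such as gender that a pure varying-coefficient model cannot index on.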
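Gene-environment interactions are major factors in complex diseases, and studies have shown that complex diseases are affected by both linear and nonlinear gene-environment interactions. This project uses semiparametric models, including the partially linear varying-coefficient model and the partially linear varying multi-index coefficient model, to study the linear and nonlinear interactions of genes with multiple environmental factors simultaneously. These models overcome the difficulty of interpreting linear models and the omission of discrete environmental variables (such as gender and smoking status) by varying-coefficient models. Because parametric and nonparametric estimators converge at different rates, jointly testing the linear and nonlinear parts is very challenging. This project develops two joint tests, one based on the generalized log-likelihood ratio and one based on kernels, which can test the linear and nonlinear interactions between genes and environmental factors simultaneously. We will study whether the joint test statistics can simultaneously attain the respective optimal convergence rates of the parametric and nonparametric parts, and we will study variable selection for high-dimensional data. Because genes are complex, gene-environment interactions need to be analyzed at multiple genetic levels; this project considers the interactions of multiple genetic levels with environmental factors simultaneously. These results will enrich statistical theory and methods and effectively speed up gene mapping for complex human diseases.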
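The generalized log-likelihood ratio test of linear versus nonlinear gene-environment interaction can be illustrated with the minimal sketch below. It assumes Gaussian errors, a single SNP g, one continuous environmental factor u in [0, 1], and binary covariates w entering linearly. It swaps in a cubic truncated-power spline approximation for the unknown coefficient functions, whereas the project's estimator may use kernel or local-polynomial smoothing. All function names, knot choices, and the simulated data are hypothetical. The null distribution (e.g. via bootstrap) is not calibrated here.

```python
import numpy as np


def truncated_power_basis(u, knots, degree=3):
    """Truncated-power spline basis in u, including an intercept column."""
    cols = [u ** d for d in range(degree + 1)]
    cols += [np.clip(u - k, 0.0, None) ** degree for k in knots]
    return np.column_stack(cols)


def ols_rss(design, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


def glr_linear_vs_varying(y, g, u, w, n_knots=5):
    """GLR-type statistic for H0: alpha1(u) = c0 + c1*u vs. a smooth alpha1(u).

    Hypothetical sketch of the model y = w @ gamma + alpha0(u) + alpha1(u) * g + eps,
    with alpha0 and alpha1 approximated by the same cubic spline basis
    (an assumption in place of kernel smoothing).
    """
    n = len(y)
    # Interior knots at equally spaced quantiles of u
    knots = np.quantile(u, np.linspace(0.0, 1.0, n_knots + 2)[1:-1])
    b = truncated_power_basis(u, knots)
    # Full model: spline baseline alpha0(u) and spline G x E effect alpha1(u)
    x_full = np.column_stack([w, b, b * g[:, None]])
    # Null model: spline baseline, G x E effect linear in u (nested in the full model)
    x_null = np.column_stack([w, b, g, g * u])
    rss_full = ols_rss(x_full, y)
    rss_null = ols_rss(x_null, y)
    # Gaussian log-likelihood ratio: (n/2) * log(RSS0 / RSS1)
    return 0.5 * n * np.log(rss_null / rss_full)


# Usage on simulated data (all values are illustrative assumptions)
rng = np.random.default_rng(0)
n = 500
u = rng.uniform(0.0, 1.0, n)                         # continuous environmental factor
w = rng.integers(0, 2, size=(n, 2)).astype(float)    # binary factors, e.g. gender, smoking
g = rng.integers(0, 3, n).astype(float)              # SNP genotype coded 0/1/2
gamma = np.array([0.5, -0.3])
base = w @ gamma + np.sin(2 * np.pi * u)

y_lin = base + (0.2 + 0.5 * u) * g + rng.normal(0.0, 1.0, n)             # linear G x E
y_non = base + 1.5 * np.sin(2 * np.pi * u) * g + rng.normal(0.0, 1.0, n)  # nonlinear G x E

stat_lin = glr_linear_vs_varying(y_lin, g, u, w)
stat_non = glr_linear_vs_varying(y_non, g, u, w)
print(f"GLR (linear truth):    {stat_lin:.2f}")
print(f"GLR (nonlinear truth): {stat_non:.2f}")
```

Because the null design is nested in the full design (the spline basis contains the constant and linear terms), the statistic is nonnegative and grows when the interaction effect alpha1(u) departs from linearity.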
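The project team published 6 papers acknowledging this project, including 1 first-authored paper in a top international journal, 1 paper in a top international biostatistics journal, and 2 papers in first-class international journals. The project used semiparametric models, including the partially linear varying-coefficient model and the partially linear varying multi-index coefficient model, to study the linear and nonlinear interactions of genes with multiple environmental factors simultaneously. Because genetic data are naturally high-dimensional, the team also studied a range of statistical properties of high-dimensional data and their application to genetic data. The results of this project enrich the statistical methods and theory of nonparametric, semiparametric, and high-dimensional data analysis, and effectively speed up gene mapping for complex human diseases.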
Data last updated: 2023-05-31
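An approximate high-dimensional optimization method based on a multilevel design-space reduction strategy
Advances in complex systems science
Advances in research on the pathogenesis of neurodegenerative diseases
Intelligent coal mine construction roadmap and engineering practice
Long intergenic non-coding RNA 00681 competitively binds miR-16 to promote melanoma cell invasion and migration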
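Semiparametric statistical inference for complex multivariate data
Dynamic semiparametric modeling of longitudinal data and its statistical inference
Robust statistical inference for semiparametric models of complex data
Semiparametric statistical inference for treatment effects with complex data and its applications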