Mediation analysis has received increasing attention over the past two decades because it can assess both the direct and the indirect effects of an exposure variable on an outcome of interest. Specifically, the indirect effect operates through mediators. Most of the existing literature has focused on statistical methods for mediation analysis with one or a few mediators. Little work is known on high-dimensional mediators, which pose a methodological challenge in many applied fields, including the social sciences, medical science, and environmental health sciences. For example, in our motivating data, nutritional scientists are interested in understanding how macronutrient intake affects a child's body mass index, either directly or indirectly through about 500 metabolites acting as mediators. In Chinese medicine, a prescription may combine hundreds of drug-related components from different plants, and it is essential to assess the direct and indirect effects of environmental pollutants on treatment outcomes. However, systematic theories, methods, and algorithms for performing the mediation analyses needed to answer such scientific questions are lacking. Motivated by a dataset from the University of Michigan's ELEMENT cohort study, this project aims to develop a full package of statistical methodology to fill this gap. I will establish statistical theory, methods, and numerical algorithms that determine the relative contributions of different causal paths to the overall effect of an exposure variable when the indirect path involves high-dimensional mediators. The project plans to develop a class of flexible semiparametric structural equation models with high-dimensional mediators, together with new statistical methods that offer model interpretability, estimation efficiency, and fast computation.
The theoretical work will focus on estimation consistency, asymptotic normality, and estimation efficiency for both the infinite-dimensional nonparametric function and the finite-dimensional parameters. The proposed statistical models and methods will be applied to the analysis of the ELEMENT data, and a user-friendly R package will be delivered as the software product of this project.
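To make the distinction between direct and indirect effects concrete, the following is a minimal sketch under a purely linear model with a handful of mediators. It is not the semiparametric method this project proposes: the function name `linear_mediation_effects`, the simulated data, and all coefficient values are assumptions chosen for illustration. The direct effect is the exposure coefficient in the outcome model, and the indirect effect is the sum over mediators of (exposure-to-mediator slope) times (mediator-to-outcome slope).

```python
import numpy as np

def linear_mediation_effects(x, m, y):
    """Product-of-coefficients estimates, assuming a linear model (illustrative only).

    x: (n,) exposure, m: (n, p) mediators, y: (n,) outcome.
    Returns (direct, indirect, total).
    """
    n = x.shape[0]
    design_m = np.column_stack([np.ones(n), x])       # [1, x]
    # Mediator models: regress every mediator column on [1, x] at once.
    coef_m, *_ = np.linalg.lstsq(design_m, m, rcond=None)
    alpha = coef_m[1]                                  # exposure -> mediator slopes, shape (p,)
    # Outcome model: regress y on [1, x, m].
    design_y = np.column_stack([design_m, m])
    coef_y, *_ = np.linalg.lstsq(design_y, y, rcond=None)
    direct = float(coef_y[1])                          # exposure -> outcome, holding mediators fixed
    beta = coef_y[2:]                                  # mediator -> outcome slopes, shape (p,)
    indirect = float(alpha @ beta)                     # sum of path products
    return direct, indirect, direct + indirect

# Simulated data with hypothetical coefficients (not taken from the ELEMENT study).
rng = np.random.default_rng(0)
n, p = 2000, 3
alpha_true = np.array([0.5, -0.3, 0.8])
beta_true = np.array([1.0, 0.5, -0.2])
gamma_true = 0.7
x = rng.normal(size=n)
m = np.outer(x, alpha_true) + rng.normal(size=(n, p))
y = gamma_true * x + m @ beta_true + rng.normal(size=n)

direct, indirect, total = linear_mediation_effects(x, m, y)
print(f"direct={direct:.3f} indirect={indirect:.3f} total={total:.3f}")
```

With three mediators and 2,000 observations, ordinary least squares is adequate. With roughly 500 correlated metabolites and a moderate sample size it would generally not be, which is the setting the project targets.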
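In recent years, mediation analysis has drawn growing attention because it supports causal inference through path analysis. Specifically, by analyzing the direct effect of a covariate (the exposure) on the outcome variable and the indirect effect that the covariate exerts on the outcome through mediators, mediation analysis helps researchers uncover the underlying mechanism by which the covariate affects the outcome and provides a basis for subsequent scientific questions. The existing literature focuses mainly on causal analysis with parametric, low-dimensional mediators, which is highly restrictive. Causal analysis with nonparametric or semiparametric high-dimensional mediators faces challenges in model identifiability, in estimation efficiency under high dimensionality, and in methodological and computational feasibility, so little work exists in this area. This project centers on the child growth and development data from the University of Michigan ELEMENT study. It aims to develop a class of flexible semiparametric high-dimensional mediation models for high-dimensional mediators and different types of outcome variables, to propose estimation methods and algorithms, and to investigate the theoretical properties of the nonparametric and parametric estimators. The expected result is a set of new models and estimation methods for high-dimensional mediators that are data-adaptive, efficient in estimation, interpretable, and computationally feasible, addressing both theoretical and applied problems in high-dimensional mediation analysis.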
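By analyzing the direct effect of a covariate on the outcome variable and the indirect effect that the covariate exerts on the outcome through mediators, mediation analysis helps researchers uncover the underlying mechanism by which the covariate affects the outcome and provides a basis for subsequent scientific questions. The existing literature focuses mainly on low-dimensional mediators or on analyzing high-dimensional mediators one dimension at a time, which does not match the high-dimensional and correlated nature of real data. This project mainly studied flexible semiparametric high-dimensional mediation models for nonparametric/semiparametric high-dimensional mediators, including the following.
a. Efficient estimation methods were developed for a series of nonparametric/semiparametric models. For these models we developed estimation methods based on a likelihood criterion and proved theoretically that the proposed estimators of the nonparametric function and of the parameters are semiparametrically efficient. We also proposed fast, practical algorithms that greatly broaden the applicability of the methods.
b. For distributed data, we proposed using tools such as confidence distributions and empirical likelihood to combine information from data stored at different sites, overcoming the difficulty that individual-level data cannot be used. For streaming data, we constructed suitable summary statistics of the historical data according to the inferential target so as to fully extract historical information, overcoming the difficulty that historical data are not retained. We proved theoretically that the resulting estimators are equivalent in distribution to those obtained from the full data.
c. For high-dimensional linear regression models, we proposed a new expansion-and-contraction method for simultaneous inference on the parameters after model selection. Theory and extensive simulations show that our method is more stable than existing post-selection inference methods and gives more reliable inferential results.
d. For data heterogeneity, we proposed new subgroup analysis methods that identify the members of subgroups with different treatment effects and carry out statistical inference on the treatment effect in each subgroup. Combining the speed of the ADMM algorithm with the stability and interpretability of the EM algorithm, we proposed a new supervised clustering method, HOSA. Theory and simulations show that it successfully identifies the different subgroup effects and supports statistical inference. We further proposed a new centered clustering method, CAR, which overcomes the difficulty that subgroups under conventional discrete treatments cannot be identified accurately.
e. For multiple response variables and high-dimensional mediators, we developed a new semiparametric multi-layer learning model, a mixture model for high-dimensional mediators, and a simultaneous inference method for high-dimensional mediators. We applied the proposed methods to China health and retirement data and to the child growth and development data from the University of Michigan ELEMENT study, identified different groupings of diseases and of metabolic compounds, and provided theoretical guidance for elderly care and for child growth and development.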
Data last updated: 2023-05-31
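A review of research on industrial structure succession and bifurcation from the perspective of evolutionary economic geography
Genome-wide association analysis of leaf orientation value in maize
Study of the betavoltaic effect based on one-dimensional TiO2 nanotube array films
Nonlinear analysis and calculation method for the coefficient of earth pressure at rest of coarse-grained soil
CFRP strengthening of rib-to-deck fatigue cracks in orthotropic steel bridge decks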
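Kernel machine learning methods for high-dimensional semiparametric models and their applications
Variable selection for semiparametric/nonparametric regression models
Likelihood inference under high-dimensional parametric and semiparametric models
Structure identification and variable selection in ultrahigh-dimensional semiparametric regression models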