This proposal investigates the asymptotic theory and applications of high-dimensional variable selection methods for estimating causal effects under the Neyman-Rubin potential outcomes framework. The aim is to improve the efficiency of causal effect estimation and to give investigators principled ways to analyze randomized experiments when the number of covariates is large. Randomized controlled trials (randomized experiments) are widely used to measure the efficacy of treatments, and baseline covariates are often collected for each unit. Rather than simply reporting the difference in means between the treatment and control groups, investigators often analyze such experiments with regression adjustment, whose purpose is to reduce the variance of the estimated causal effect. In large-scale randomized experiments, the number of covariates can be very large, even much larger than the number of observations. For example, in clinical trials, demographic and genetic information may be recorded for each patient. In this "big data" setting, however, many of the covariates may be irrelevant to the outcome being studied, so variable selection or some form of regularization is necessary for effective causal effect estimation. We will focus on three problems in large-scale randomized experiments: (1) asymptotic normality of Elastic Net adjusted average causal effect (ACE) estimates; (2) asymptotic properties of Lasso adjusted ACE estimates in randomized experiments with noncompliance; and (3) estimation of heterogeneous treatment effects using Lasso adjustment. We will also apply the proposed methods to real-world large-scale randomized experiments and provide insights for practitioners.
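To make the idea of Lasso-based regression adjustment concrete, the following is a minimal sketch rather than the project's actual implementation. It assumes one common form of the adjusted estimator: within each arm, fit a Lasso of the centered outcome on the centered covariates, then correct the arm's mean outcome by the fitted coefficients times the gap between the arm's covariate means and the full-sample covariate means. All function names, the penalty value, and the simulated data are hypothetical and chosen only for illustration.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, sweeps=100):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1.

    X (list of rows) and y are assumed to be centered already.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            # Correlation of column j with the partial residual.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * beta[j]
            new = soft_threshold(rho, lam) / col_ss[j]
            if new != beta[j]:
                delta = new - beta[j]
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta


def col_means(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def adjusted_mean(X_arm, y_arm, xbar_all, lam):
    """Arm mean outcome, corrected for covariate imbalance via a Lasso fit."""
    xbar = col_means(X_arm)
    ybar = sum(y_arm) / len(y_arm)
    Xc = [[v - m for v, m in zip(row, xbar)] for row in X_arm]
    yc = [v - ybar for v in y_arm]
    beta = lasso_cd(Xc, yc, lam)
    return ybar - sum(b * (ma - m) for b, ma, m in zip(beta, xbar, xbar_all))


def lasso_adjusted_ace(X, y, treated, lam=0.1):
    """Difference of the Lasso-adjusted arm means (treated minus control)."""
    xbar_all = col_means(X)
    means = {}
    for flag in (True, False):
        idx = [i for i, t in enumerate(treated) if t == flag]
        means[flag] = adjusted_mean([X[i] for i in idx], [y[i] for i in idx], xbar_all, lam)
    return means[True] - means[False]


def difference_in_means(y, treated):
    y1 = [v for v, t in zip(y, treated) if t]
    y0 = [v for v, t in zip(y, treated) if not t]
    return sum(y1) / len(y1) - sum(y0) / len(y0)


def simulate_experiment(n=200, p=30, effect=2.0, seed=0):
    """Hypothetical data: 3 relevant covariates out of p, half the units treated."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    treated = [True] * (n // 2) + [False] * (n - n // 2)
    rng.shuffle(treated)  # complete randomization
    y = [effect * t + 3 * x[0] - 2 * x[1] + x[2] + rng.gauss(0, 1)
         for x, t in zip(X, treated)]
    return X, y, treated


if __name__ == "__main__":
    X, y, treated = simulate_experiment()
    print("difference in means:", difference_in_means(y, treated))
    print("Lasso adjusted ACE: ", lasso_adjusted_ace(X, y, treated))
```

With a very large penalty every coefficient is shrunk to zero and the adjusted estimate reduces to the plain difference in means, which is a convenient sanity check. In practice the penalty would be chosen by a data-driven rule such as cross-validation, which this sketch omits.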
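This project studies the theory and applications of high-dimensional variable selection methods for estimating causal effects under the Neyman-Rubin potential outcomes model. The goal is to use high-dimensional variable selection to improve the precision of causal effect estimates and to provide principles and methods for analyzing high-dimensional randomized experiments. Randomized experiments are widely used to study the causal effects of treatments. In a randomized experiment, researchers usually collect covariate information on each unit and use it, through regression adjustment, to analyze the results and reduce the variance of the estimated causal effect. In large-scale randomized experiments, the number of observed covariates is large and may even exceed the sample size. Many of these covariates, however, are unrelated to the outcome of interest, so variable selection or some form of regularization is needed. The project focuses on three problems in large-scale randomized experiments: the asymptotic theory of Elastic Net adjusted estimates of the average causal effect; the asymptotic properties of Lasso adjusted estimates of the complier average causal effect in randomized experiments with noncompliance; and the use of Lasso adjustment to estimate heterogeneous causal effects. We will apply these methods and theory to the analysis of randomized experiments and to guide practice.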
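For readers unfamiliar with the noncompliance setting, the complier average causal effect is usually identified as a ratio of two intention-to-treat (ITT) effects. The display below states that standard identity under the usual instrumental-variable assumptions (random assignment, the exclusion restriction, and monotonicity, i.e. no defiers). These assumptions and the notation are introduced here for illustration and are not stated in the text: $Z_i$ is the assigned treatment, $D_i(z)$ the treatment actually received under assignment $z$, and $Y_i(z)$ the potential outcome under assignment $z$.

```latex
\[
\tau_{\mathrm{CACE}}
  = \frac{\mathrm{ITT}_Y}{\mathrm{ITT}_D}
  = \frac{\frac{1}{n}\sum_{i=1}^{n}\bigl(Y_i(1) - Y_i(0)\bigr)}
         {\frac{1}{n}\sum_{i=1}^{n}\bigl(D_i(1) - D_i(0)\bigr)},
\qquad
\hat{\tau}_{\mathrm{CACE}}
  = \frac{\hat{\tau}_Y}{\hat{\tau}_D}.
\]
```

One natural reading, assumed here rather than confirmed by the text, is that the numerator $\hat{\tau}_Y$ and the denominator $\hat{\tau}_D$ can each be a covariate-adjusted ITT estimate, for example a Lasso adjusted one, instead of a raw difference in means.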
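With the rise of big data, the sustained emphasis on causal relationships in scientific research, and the continued development of machine learning methods such as high-dimensional statistics, big-data causal inference has become a new and very promising research field. It studies how to use machine learning methods such as high-dimensional variable selection and deep learning to infer causal relationships from high-dimensional data. This project studies high-dimensional regression-adjusted estimation of the average causal effect in randomized controlled trials, estimation of the complier average causal effect in randomized controlled trials with noncompliant subjects, and estimation of heterogeneous causal effects in high-dimensional randomized controlled trials. The main results are as follows: (1) under the Neyman-Rubin potential outcomes model for randomized experiments, we established the asymptotic normality of Elastic Net, Adaptive Lasso, Ridge, and MCP adjusted estimators of the average causal effect in high-dimensional randomized controlled trials, proved that their asymptotic variance is no larger than that of the difference-in-means estimator, and gave a conservative estimator of the asymptotic variance for constructing conservative confidence intervals; (2) we established a finite-population central limit theorem for average causal effect estimation in randomized block experiments, proposed two regression adjustment methods, proved their consistency and asymptotic normality, and proved that both are more efficient than the stratified difference-in-means estimator; (3) we explored machine learning and neural network methods for estimating heterogeneous causal effects, discussed the strengths and weaknesses of neural networks for this task, and designed a new neural network for estimating heterogeneous causal effects; (4) we proposed the bootstrap Lasso + Partial Ridge method for constructing confidence intervals for parameters in high-dimensional sparse linear regression models, a method that can be extended to inference on heterogeneous causal effects in the Neyman-Rubin causal model. The scientific significance of the project is that it establishes the theoretical properties of high-dimensional variable selection and regression adjustment methods in the Neyman-Rubin causal inference model, providing methodological and theoretical guarantees for analyzing high-dimensional randomized controlled trials effectively and estimating causal effects precisely, with broad applications in clinical trials, precision medicine, precision marketing, and policy evaluation.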
Data last updated: 2023-05-31