In recent years, ultrahigh dimensional data have arisen frequently in many scientific fields, such as biology and medical science. Analyzing such data poses many challenges to conventional computational algorithms and statistical theory. In this project, we study independent screening procedures for ultrahigh dimensional data. Unlike conventional variable selection techniques, independent screening procedures are computationally efficient, which makes them very appealing in ultrahigh dimensional data analysis. We investigate the following four issues: (1) guided by the sparsity principle, we design new model-free independent screening procedures for analyzing ultrahigh dimensional data; (2) because existing independent screening procedures may miss important predictors that are marginally independent of the response variable, we borrow the idea of double robustness from semiparametric statistics to design new iterative procedures that address this issue; (3) we discuss how to decide the number of predictors to retain after screening, so as to keep all important predictors while removing as many unimportant predictors as possible; and (4) we establish theoretical properties of the new model-free independent screening procedures under mild conditions, including the ranking consistency property and, where possible, the selection consistency property. In addition, we apply the newly proposed independent screening procedures to some important scientific questions, with the aim of making new scientific observations.
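To make the idea of model-free independent screening concrete, the following is a minimal sketch and not the procedure developed in this project. It assumes distance correlation as the marginal utility (one well-known model-free choice) and, when no size is given, retains d = floor(n / log n) predictors, a convention commonly used in the screening literature. The names `distance_correlation` and `screen` and the simulated data are hypothetical and serve only as illustration.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation between two 1-D arrays (V-statistic form)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    a = np.abs(x - x.T)  # pairwise distances within x
    b = np.abs(y - y.T)  # pairwise distances within y
    # Double-center each distance matrix.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def screen(X, y, d=None):
    """Rank predictors by marginal utility and keep the top d.

    Assumption: d defaults to floor(n / log n); the project itself studies
    how this number should be chosen.
    """
    n, p = X.shape
    if d is None:
        d = int(n / np.log(n))
    d = min(d, p)
    utilities = np.array([distance_correlation(X[:, j], y) for j in range(p)])
    order = np.argsort(-utilities)  # indices sorted by decreasing utility
    return order[:d], utilities


# Simulated example (hypothetical data): predictor 0 acts linearly and
# predictor 1 acts only through its square.
rng = np.random.default_rng(0)
n, p = 200, 500
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] + 2.0 * X[:, 1] ** 2 + 0.5 * rng.standard_normal(n)

kept, utilities = screen(X, y)
print("number retained:", len(kept))
print("top five indices:", kept[:5])
```

In this simulated design, predictor 1 has essentially zero linear correlation with the response, which is the kind of signal a model-free utility is meant to pick up. Each predictor is scored on its own, so the cost grows linearly in the number of predictors. The sketch does not address predictors that are marginally independent of the response but jointly important, which is the motivation for the iterative procedures in item (2).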
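With the support of this grant, the project team, guided by the sparsity principle, developed a series of model-free independent screening procedures. To retain all important predictors while removing as many unimportant predictors as possible, we studied how many predictors should be retained after screening. We proved that these screening procedures enjoy the sure screening property and the ranking consistency property, and we applied the new methods in biology and other scientific fields, producing a number of influential results. The team published 17 papers in top international statistics journals or SCI-indexed journals, with 5 more accepted. The project trained 6 master's students working on dimension reduction for ultrahigh dimensional data, all of whom have graduated and found employment; 5 doctoral students, of whom 3 are still enrolled and 2 have graduated and taken university positions; and 1 postdoctoral researcher. The team presented its results at 7 international conferences and 7 domestic conferences.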
Data last updated: 2023-05-31
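Joint variable screening for ultrahigh dimensional case-cohort data
Variable selection for ultrahigh dimensional collinear models with censored data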
A study of several problems in variable screening and selection for ultrahigh dimensional survival data