With the development of high-performance computing and big data storage technology, ultra-high dimensional data analysis has attracted tremendous interest from both researchers and practitioners because such data arise in many real applications, including the social and economic sciences. Variable selection aims to correctly identify the truly informative variables in ultra-high dimensional data, overcome the difficulties encountered by classical statistical methods, and significantly improve estimation and prediction accuracy. The objective of this research proposal is therefore to develop a novel variable selection method that makes full use of properties of the reproducing kernel Hilbert space (RKHS), such as the derivative reproducing property and the representer theorem, together with kernel ridge regression in the RKHS. Its key advantages are that it makes no explicit model assumption, admits general predictor effects, allows for scalable computation, and attains desirable asymptotic theoretical results. Tighter theoretical results are provided for the squared loss function using operator tools from functional analysis, and the linear case is studied as a special case of the method to give a better understanding both methodologically and theoretically. Furthermore, we extend the proposed method to interaction selection, which has attracted considerable interest in recent years. The applicant's solid theoretical foundation, rich research experience, and preliminary exploratory research provide a sound basis for the successful completion of the project, which is expected to make substantial contributions to research on variable selection and to provide a novel approach to efficiently handling ultra-high dimensional data.
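With the development of high-performance computing and massive data storage technology, ultra-high dimensional data appear increasingly often in social life, scientific research, and many other fields, and have attracted wide attention from researchers. Variable selection can capture the variables in ultra-high dimensional data that are truly useful for statistical analysis, overcome the difficulties encountered by classical statistical methods, and significantly improve the accuracy of statistical estimation and prediction, laying the foundation for deeper statistical analysis. This project proposes to use properties specific to functions in a reproducing kernel Hilbert space, such as the reproducing property of derivatives, together with ridge regression in the reproducing kernel Hilbert space, to develop a class of variable selection methods that adapt well to data and models, are computationally fast and efficient, and come with theoretical guarantees; taking the squared loss function as an example, to give sharper theoretical results with the help of operator tools from functional analysis, and to study the properties of the method in depth with the linear model and others as special cases; and to extend this class of methods to the selection of interaction effects among variables, currently a popular topic. The applicant's solid theoretical foundation, rich research experience, and extensive preliminary exploratory work lay a firm foundation for the successful completion of the project, whose final results will further enrich variable selection methods and provide a novel approach to handling ultra-high dimensional data effectively.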
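With the development of high-performance computing and massive data storage technology, ultra-high dimensional data appear increasingly often in social life, scientific research, and many other fields, and have attracted wide attention from researchers. This project proposes to use properties specific to functions in a reproducing kernel Hilbert space, such as the reproducing property of derivatives, together with ridge regression in the reproducing kernel Hilbert space, to develop a class of variable selection methods that adapt well to data and models, are computationally fast and efficient, and come with theoretical guarantees; taking the squared loss function as an example, to give sharper theoretical results with the help of operator tools from functional analysis, and to study the properties of the method in depth with the linear model and others as special cases; and to extend this class of methods to the selection of interaction effects among variables (currently a popular topic), to network data analysis, and to the recovery and estimation of directed acyclic graphs. The applicant's solid theoretical foundation, rich research experience, and extensive preliminary exploratory work lay a firm foundation for the successful completion of the project, whose final results will further enrich research in the related fields.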
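The abstracts name the ingredients (kernel ridge regression in an RKHS and the reproducing property of derivatives) without spelling out an algorithm. The following is only a minimal sketch of how a gradient-based screen of this kind could look, not the project's actual procedure. It assumes a Gaussian kernel, squared loss, and a score for each variable equal to the empirical mean of the squared partial derivative of the fitted function. The function names, the bandwidth `sigma`, the penalty `lam`, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_kernel(X, Z, sigma):
    """Gaussian kernel matrix with entries exp(-||x - z||^2 / (2 sigma^2))."""
    sq = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Z**2, axis=1)[None, :]
        - 2.0 * X @ Z.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def gradient_scores(X, y, sigma=1.0, lam=1e-2):
    """Score each variable by the mean squared partial derivative of a
    kernel ridge regression fit (illustrative sketch, assumed setup)."""
    n, p = X.shape
    K = gaussian_kernel(X, X, sigma)
    # Kernel ridge regression: f(x) = sum_i alpha_i K(x_i, x),
    # with alpha = (K + n * lam * I)^{-1} y.
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    scores = np.empty(p)
    for l in range(p):
        # diff[i, j] = x_i^l - x_j^l
        diff = X[:, l][:, None] - X[:, l][None, :]
        # For the Gaussian kernel,
        # d f / d x^l at x_j = sum_i alpha_i K(x_i, x_j) (x_i^l - x_j^l) / sigma^2.
        grad = (alpha[:, None] * K * diff).sum(axis=0) / sigma**2
        scores[l] = np.mean(grad**2)
    return scores


# Hypothetical simulated example: only variables 0 and 1 affect y.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 5))
y = 2.0 * X[:, 0] + np.sin(2.0 * X[:, 1]) + 0.1 * rng.standard_normal(200)

scores = gradient_scores(X, y, sigma=1.5, lam=1e-3)
ranking = np.argsort(scores)[::-1]  # variables ordered from largest score down
print("scores:", np.round(scores, 3))
print("ranking:", ranking)
```

In this sketch, variables whose score exceeds a chosen threshold would be kept. How that threshold is set, and how the computation scales to the ultra-high dimensional setting, is not described in the abstracts and is left open here.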
Data last updated: 2023-05-31
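Singular integral operators with variable kernels and related problems
Adaptive Fourier decomposition theory in analytic reproducing kernel Hilbert spaces and related applications
Research on sparse image representation algorithms in reproducing kernel Hilbert spaces
New adaptive filtering methods in reproducing kernel Hilbert spaces and their applications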