The purpose of this project is to develop rigorous mathematical analysis for problems arising from distributed learning, functional data analysis, and 1-bit compressed sensing, using methods and ideas from approximation theory. The theory of learning with classical kernel methods such as support vector machines and kernel ridge regression has been well developed in mathematics, based on probability, statistics, and approximation theory. However, owing to the high computational cost of kernel matrix inversion and eigen-decomposition, implementing kernel-based algorithms on large data sets remains challenging. Dovetailing naturally with parallel and distributed computation, distributed algorithms lead to a substantial reduction in complexity compared with running an algorithm on the entire data set. In this project, we shall establish mathematical foundations for distributed learning and study the influence of kernel functions and data structures on the consistency of the algorithms. Functional data analysis views infinite-dimensional data such as curves or images as realizations of random functions and takes the functional nature of the data into account. In this project we pursue an RKHS approach to learning with functional data and investigate the approximation abilities of spaces of linear functionals. Our study will deepen the understanding of the role played by the covariance operator in functional data analysis. 1-bit compressed sensing aims at recovering sparse vectors from highly quantized linear measurements. In this project a learning theory framework is introduced to analyze 1-bit compressed sensing algorithms, leading to error bounds for more general measurement vectors. Since 1-bit compressed sensing naturally yields binary classification data, we shall consider the sparse binomial regression problem to further clarify the connections between 1-bit compressed sensing and binary classification. This project will enrich the mathematical theory of machine learning, shed light on new theoretical problems in mathematics, and inform the design of useful algorithms for big data.
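To make the divide-and-conquer idea concrete, the following minimal Python sketch illustrates distributed kernel ridge regression: the data are split into m disjoint subsets, a standard KRR estimator is computed on each subset, and the local predictors are averaged. The Gaussian kernel, the synthetic data, and all names (gaussian_kernel, krr_fit, krr_predict) are illustrative assumptions, not the project's actual algorithms.

    # Minimal sketch of divide-and-conquer kernel ridge regression (KRR).
    # Assumptions: Gaussian kernel, synthetic 1-D data; names are illustrative.
    import numpy as np

    def gaussian_kernel(X, Y, sigma=0.5):
        # K[i, j] = exp(-|x_i - y_j|^2 / (2 sigma^2))
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2 * sigma ** 2))

    def krr_fit(X, y, lam=1e-2):
        # Solve (K + lam * n * I) alpha = y on one local subset.
        n = X.shape[0]
        K = gaussian_kernel(X, X)
        alpha = np.linalg.solve(K + lam * n * np.eye(n), y)
        return X, alpha

    def krr_predict(model, X_test):
        X_train, alpha = model
        return gaussian_kernel(X_test, X_train) @ alpha

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(2000, 1))
    y = np.sin(np.pi * X[:, 0]) + 0.1 * rng.standard_normal(2000)

    m = 10                                    # number of local machines
    models = [krr_fit(Xj, yj)                 # each solves a small (n/m) system
              for Xj, yj in zip(np.array_split(X, m), np.array_split(y, m))]

    X_test = np.linspace(-1, 1, 5)[:, None]
    # Global estimator: average of the m local KRR predictors.
    f_bar = np.mean([krr_predict(mod, X_test) for mod in models], axis=0)
    print(f_bar)

Each local solve costs O((n/m)^3) instead of the O(n^3) of a single global solve, which is the complexity reduction referred to above; the theoretical question studied in this project is when the averaged estimator retains the consistency of the global one.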
The project proceeded as planned: within the framework of learning theory, we carried out in-depth theoretical studies of data analysis problems including kernel methods on large-scale data sets, functional data regression, and 1-bit compressed sensing, and obtained a series of significant results. The main research progress includes: establishing a mathematical theory of distributed multi-penalty regularized kernel methods in the semi-supervised learning setting; giving the first theoretical analysis of distributed algorithms and Nyström subsampling for indefinite (non-positive-definite) kernel methods; developing an integral-operator approximation framework for functional data regression in reproducing kernel Hilbert spaces, together with theoretical analyses of spectral algorithms for functional data and of individualized-treatment algorithms based on functional data; and conducting a thorough theoretical study of the pinball loss for 1-bit compressed sensing, on which basis we established algorithms and theoretical analysis for mixed 1-bit compressed sensing. Building on this project, we also investigated frontier and challenging problems in data analysis, obtaining important results on the algorithms and theory of sparse robust kernel methods, on sparse dimension reduction for high-dimensional data, and on the algorithmic and approximation-theoretic foundations of deep learning. Supported by this project, the group published 27 papers, 21 of which appeared in leading international journals in applied mathematics, statistics, and machine learning, including Applied and Computational Harmonic Analysis, Annals of Statistics, Inverse Problems, Journal of Machine Learning Research, and IEEE Transactions on Neural Networks and Learning Systems. The research of this project enriches the mathematical theory of machine learning, raises new mathematical problems, and provides guidance for the design of big data algorithms.
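The 1-bit compressed sensing problem studied above can be sketched as follows: recover a sparse unit vector x* from sign measurements y_i = sign(<a_i, x*>) by minimizing the empirical pinball loss of the margins plus an l1 penalty. The Python sketch below is a hedged illustration assuming Gaussian measurement vectors; the subgradient scheme, the helper names (pinball_subgrad, soft_threshold), and all parameter values are illustrative and not the algorithms analyzed in the project's papers.

    # Minimal sketch of 1-bit compressed sensing with the pinball loss.
    # Model: y_i = sign(<a_i, x*>); recover sparse x* by minimizing
    #   (1/n) sum_i L_tau(1 - y_i <a_i, w>) + lam * ||w||_1
    # where L_tau(u) = u for u >= 0 and L_tau(u) = -tau*u for u < 0.
    import numpy as np

    def pinball_subgrad(u, tau):
        # Subgradient of the pinball loss L_tau at u.
        return np.where(u >= 0, 1.0, -tau)

    def soft_threshold(w, t):
        # Proximal operator of the l1 penalty (promotes sparsity).
        return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

    rng = np.random.default_rng(1)
    d, n, s = 200, 1000, 5
    x_true = np.zeros(d)
    x_true[rng.choice(d, s, replace=False)] = rng.standard_normal(s)
    x_true /= np.linalg.norm(x_true)      # scale is lost in 1-bit measurements

    A = rng.standard_normal((n, d))       # Gaussian measurement vectors
    y = np.sign(A @ x_true)               # 1-bit (sign) measurements

    tau, lam, step = 0.5, 1e-3, 0.05
    w = np.zeros(d)
    for _ in range(500):
        u = 1.0 - y * (A @ w)             # margins
        g = -(pinball_subgrad(u, tau) * y) @ A / n
        w = soft_threshold(w - step * g, step * lam)
        w /= max(np.linalg.norm(w), 1.0)  # keep the iterate in the unit ball

    print("correlation with x*:", w @ x_true / max(np.linalg.norm(w), 1e-12))

Since sign measurements carry no scale information, only the direction of x* is recoverable, which is why the iterates are constrained to the unit ball and recovery is measured by correlation.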
Data last updated: 2023-05-31
Genome-wide association analysis of leaf orientation value in maize
On the influence of the big data environment on the development of information science
Intensive photocatalytic activity enhancement of Bi5O7I via coupling with band structure and content adjustable BiOBrxI1-x
CFRP reinforcement of rib-to-deck fatigue cracking in orthotropic steel bridge decks
Hardware Trojans: research progress on key problems and new trends
Research on key algorithms of robust 1-bit compressed sensing for underwater passive localization
Learning theory for functional data
Statistical analysis of multivariate functional data and its applications
Compression and reconstruction algorithms for interferometric hyperspectral data based on dictionary learning and compressed sensing