Data-driven machine learning is the most widely used intelligent technology of the Big Data era. Kernel-based methods such as the SVM and artificial-neural-network methods such as deep learning are the two typical families of nonlinear machine learning models, and both can be classified as representation learning. Feature sparsity and sample sparsity are the key techniques for training representation learning models on large-scale, high-dimensional data, and the linear case has been thoroughly studied. This project focuses on feature and sample sparsity in the nonlinear learning setting. Building on recent progress in the field, and starting from the necessary and sufficient optimality conditions and the primal-dual relationship of the underlying optimization models, we will design efficient safe screening rules that eliminate irrelevant features and/or samples so as to uncover the feature/sample sparsity structure of the problem, and we will develop fast learning algorithms for large-scale problems using techniques such as low-rank approximation of the kernel matrix. We will also embed convex kernel-based learning models into complex deep networks and reveal their sparse structure, enhancing the interpretability of deep learning while preserving its high learning performance. Furthermore, we will design hierarchical, layer-wise composite, or deeply composite kernel functions to express prior information about the problem, strengthening the representational power of kernel models at the cost of some convexity, and combine them with nonconvex optimization methods to improve the performance of kernel learning algorithms. The project is innovative and at the frontier of its area, and its results will greatly promote the broad application of nonlinear sparse learning models.
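The abstract names low-rank approximation of the kernel matrix as one tool for scaling kernel methods to large problems. A minimal sketch of one standard such technique, the Nyström approximation, is below; the RBF kernel, the bandwidth `gamma`, the number of landmarks `m`, and the synthetic data are illustrative assumptions, not details taken from the project.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.05):
    # Gaussian (RBF) kernel matrix between the rows of A and the rows of B.
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def nystrom_approximation(X, m, gamma=0.05, seed=0):
    # Rank-m Nystrom approximation K ~ C @ pinv(W) @ C.T,
    # built from m randomly chosen landmark points.
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=m, replace=False)
    C = rbf_kernel(X, X[idx], gamma)        # n x m block: all points vs. landmarks
    W = rbf_kernel(X[idx], X[idx], gamma)   # m x m block among the landmarks
    return C @ np.linalg.pinv(W) @ C.T

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
K = rbf_kernel(X, X)                        # full 200 x 200 kernel matrix
K_hat = nystrom_approximation(X, m=50)      # rank-50 surrogate, never forms all of K at once
rel_err = np.linalg.norm(K - K_hat) / np.linalg.norm(K)
```

The point of the technique is that `C` and `W` cost only O(nm) and O(m^2) kernel evaluations, so downstream solvers can work with the factored form instead of the full n x n matrix.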
Supported by the National Natural Science Foundation of China, and strictly following the research plan of the project proposal, this project carried out a systematic and in-depth study of representation learning models with feature sparsity and sample sparsity in machine learning. A series of results were obtained in many mainstream machine learning areas, including large-scale least-squares support vector machine algorithms, multi-label learning, sparse robust learning models, learning models with nonconvex loss functions, 2DPCA dimensionality reduction, FCM clustering, preconditioned stochastic gradient algorithms, learning from imbalanced data, and multi-view learning. During the funding period, the research group published 27 papers related to the project, of which 15 are indexed by SCI and 20 by EI, with the remainder published in national core journals, and obtained 2 authorized patents. The project trained 10 doctoral students and 17 master's students, of whom 3 doctoral and 10 master's students have graduated; one doctoral dissertation was awarded the Shaanxi Province Excellent Doctoral Dissertation prize. The project exceeded the goals set out in the funding proposal.
Data last updated: 2023-05-31
Kernel-based regularized learning algorithms: approximation and sparsity
Pedestrian detection based on finite Radon features and discriminative sparse dictionary learning
Large-scale object detection based on sparse representation and deep learning
Sample-adaptive multiple kernel learning algorithms