Multi-instance multi-label (MIML) learning is a recently proposed machine learning framework for data that carry multiple semantic meanings. Because it makes it possible to explain why a given sample has particular class labels, the framework is attracting growing attention. The Gaussian process model is a kernel method that is easy to implement and adaptively discovers relationships among variables. This project aims to develop a new MIML learning algorithm, based on the Gaussian process model, for large-scale, incompletely annotated multi-semantic data. The work has three parts. First, a Gaussian process model with a new structure is designed to describe simultaneously the relationship between instances and labels and the relationship among labels. Second, a lower-cost solution method for the Gaussian process model, based on stochastic variational inference, is proposed so that large-scale training data can be handled. Third, a two-step strategy built on ideas from positive and unlabeled (PU) learning is developed to make effective use of incompletely annotated data. Building on the Gaussian process model, the project thus addresses both the core problem in constructing MIML learning algorithms, namely modeling instance-label and label-label relationships at the same time, and the difficulty that kernel methods have with large-scale training data because of their high computational cost. The project is expected to promote the application of MIML learning to big data.
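The two-step PU idea mentioned above can be illustrated for a single label as follows. This is only a minimal sketch under stated assumptions, not the project's Gaussian process method: each bag is summarized by the mean of its instances, and a nearest-centroid rule stands in for the real classifier. The names `bag_embedding`, `two_step_pu`, and `negative_fraction` are hypothetical and chosen for illustration.

```python
from math import dist
from statistics import fmean


def bag_embedding(bag):
    """Summarize a bag (list of instance vectors) by its per-dimension mean."""
    return [fmean(column) for column in zip(*bag)]


def centroid(vectors):
    """Per-dimension mean of a list of vectors."""
    return [fmean(column) for column in zip(*vectors)]


def two_step_pu(bags, positive_idx, negative_fraction=0.5):
    """Toy two-step PU procedure for one label (illustrative assumption only).

    Step 1: among bags not annotated with the label, treat those farthest
            from the positive centroid as reliable negatives.
    Step 2: label every unannotated bag by its nearer centroid.
    """
    embeddings = [bag_embedding(bag) for bag in bags]
    pos_centroid = centroid([embeddings[i] for i in positive_idx])
    unlabeled = [i for i in range(len(bags)) if i not in positive_idx]

    # Step 1: rank unannotated bags by distance from the positive centroid.
    ranked = sorted(unlabeled, key=lambda i: dist(embeddings[i], pos_centroid),
                    reverse=True)
    n_neg = max(1, int(len(ranked) * negative_fraction))
    reliable_neg = ranked[:n_neg]
    neg_centroid = centroid([embeddings[i] for i in reliable_neg])

    # Step 2: nearest-centroid decision for each unannotated bag.
    predictions = {
        i: dist(embeddings[i], pos_centroid) < dist(embeddings[i], neg_centroid)
        for i in unlabeled
    }
    return reliable_neg, predictions


# Bags 0 and 1 are annotated with the label; bags 2-4 are unannotated.
bags = [
    [[0.0, 0.0], [0.2, 0.1]],
    [[0.1, 0.1], [0.1, 0.3]],
    [[0.2, 0.2]],
    [[5.0, 5.0], [5.2, 4.8]],
    [[4.8, 5.1]],
]
reliable_neg, predictions = two_step_pu(bags, positive_idx={0, 1})
```

In this toy data, bag 2 lies close to the annotated bags, so step 2 recovers it as a missing positive, while the two distant bags are treated as negatives.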
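With the arrival of the big-data era and the widespread application of artificial intelligence, every field and industry treats data as a strategic asset to be collected, stored, and analyzed, and multiple semantics, large scale, and weak labeling have become common properties of such data. Multi-instance multi-label (MIML) learning is a recently proposed machine learning framework for multi-semantic data; because it makes it feasible to uncover what drives the relationship between a sample and its class labels, it is attracting growing attention. The Gaussian process model is a kernel method that is easy to implement and adaptively discovers relational information. This project used the Gaussian process model to study how to construct an MIML learning algorithm for large-scale, incompletely annotated multi-semantic data. First, a Gaussian process model with a new structure was designed to capture both instance-label and label-label relationships at the same time. Next, a new solution method for the Gaussian process model was built from an inducing-variable strategy, the Laplace approximation of the posterior, and sparse embedding techniques, and a kernel MIML learning algorithm for large-scale data was built on top of it. Finally, drawing on the idea of self-paced learning, the algorithm was extended with a weight-adjustment strategy to give the final MIML learning algorithm for large-scale, incompletely annotated multi-semantic data. The results can be applied to mining multi-semantic, large-scale, weakly supervised data, and they have both theoretical significance and practical value.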
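The self-paced weighting idea referred to above can be sketched in a few lines. This is a generic illustration that assumes the common hard-threshold form of self-paced learning; the summary does not specify which weighting rule the project actually uses. The names `self_paced_weights`, `weighted_mean_loss`, and the threshold schedule are hypothetical.

```python
def self_paced_weights(losses, threshold):
    """Hard self-paced weights: 1.0 if a sample's loss is below the threshold, else 0.0."""
    return [1.0 if loss < threshold else 0.0 for loss in losses]


def weighted_mean_loss(losses, weights):
    """Mean loss over the currently selected samples (0.0 if none is selected)."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(l * w for l, w in zip(losses, weights)) / total


# Per-sample losses from some current model (made-up numbers for illustration).
losses = [0.1, 0.4, 0.9, 2.5]

# Raising the threshold round by round admits harder samples step by step.
schedule = [0.5, 1.0, 3.0]
selected = [self_paced_weights(losses, t) for t in schedule]
```

In a full training loop the model would be refit on the weighted samples after each round and the losses recomputed; here the losses are held fixed to keep the weighting step visible.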
Data last updated: 2023-05-31
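Optimization Methods in Multi-Instance Multi-Label Learning and Their Applications
Design and Analysis of Maximum-Margin Multi-Instance Learning Algorithms
Multi-Scale Gaussian Process Models and Their Learning Curves
Multi-Label Learning Algorithms Based on Feature Learning and Label Correlation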