Accurate and reliable labeled samples are the basis of supervised learning. In real datasets, however, incomplete information, subjective labeling, or encoding errors can introduce input noise or output noise, which biases the fitted model and weakens its generalization ability. Because input noise has a comparatively lower impact on modeling, this project focuses on output noise in supervised learning and studies adaptive noise filtering methods applicable to both classification and regression. The main research contents are: (1) quantitative analysis of the effectiveness of traditional noise filters; (2) statistical analysis of the correlation between error and noise; (3) exploration of an adaptive noise filtering method; (4) design of noise filtering algorithms for supervised learning. The results of this project will further enrich and develop the theory and methods of output noise filtering in supervised learning, provide guidance for selecting noise filters and solving real problems, and improve data reliability and thereby the generalization ability of models.
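To address the challenges that label noise poses for supervised learning, the project drew on statistical learning theory and error correlation analysis to carry out systematic research on correcting generalization error bounds for low-quality noisy data, constructing a noise filtering framework, detecting anomalous data, estimating and identifying noise, and designing and applying noise filtering algorithms. It developed label noise estimation and quality evaluation methods based on personalized nearest neighbors, anomaly detection, and error ensembles; built efficient cleaning algorithms for data with numerical and categorical label noise; and developed an experimental platform for noisy data cleaning based on label noise estimation and filtering, which was tested and applied to real problems such as image annotation and crowdsourced labeling. The project has met its research objectives. Its results further enrich and develop the theory and methods of label noise filtering in supervised learning and provide theoretical support and feasible solutions for improving data quality and model generalization performance.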
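To make the idea of a label noise filter that serves both classification and regression concrete, the following is a minimal sketch of a classic neighborhood-disagreement filter (in the spirit of edited nearest neighbor). It is not the project's adaptive method, which the abstract does not specify. The function names, the fixed `k`, the fixed `threshold`, and the toy data are all assumptions made for illustration.

```python
import math
from statistics import median


def knn_indices(X, i, k):
    """Return indices of the k nearest neighbours of sample i (excluding i)."""
    dists = [(math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i]
    dists.sort()
    return [j for _, j in dists[:k]]


def noise_scores(X, y, k=3, task="classification"):
    """Score how strongly each label disagrees with its neighbourhood."""
    scores = []
    for i in range(len(X)):
        neigh = [y[j] for j in knn_indices(X, i, k)]
        if task == "classification":
            # fraction of neighbours carrying a different class label
            scores.append(sum(lab != y[i] for lab in neigh) / len(neigh))
        else:
            # absolute deviation from the neighbours' median target
            scores.append(abs(y[i] - median(neigh)))
    return scores


def filter_noise(X, y, threshold, k=3, task="classification"):
    """Return indices of samples whose noise score does not exceed threshold."""
    scores = noise_scores(X, y, k, task)
    return [i for i, s in enumerate(scores) if s <= threshold]


# Hypothetical toy data: two tight clusters; sample 2 carries a corrupted label.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
     (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]

y_cls = [0, 0, 1, 0, 1, 1, 1, 1]                  # categorical labels
kept_cls = filter_noise(X, y_cls, threshold=0.5, k=3)

y_reg = [1.0, 1.1, 9.0, 0.9, 5.0, 5.2, 4.9, 5.1]  # numerical labels
kept_reg = filter_noise(X, y_reg, threshold=1.0, k=3, task="regression")
```

In both runs the sample with the corrupted label (index 2) is the only one removed. The fixed threshold is the weak point of such a filter: it must be tuned per dataset, which is the kind of limitation an adaptive filter is meant to address.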
Data last updated: 2023-05-31
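Genome-wide association analysis of maize leaf orientation value
CFRP strengthening of rib-to-deck fatigue cracks in orthotropic steel bridge decks
Hardware Trojans: research progress and new trends on key issues
Direct brain control of robot direction and speed based on SSVEP
Calculation method for the shear capacity of steel plate-concrete composite coupling beams with small span-to-depth ratios
Transfer learning and semi-supervised learning methods for text classification
Kernel classifier optimization based on class noise filtering learning
Semi-supervised distance metric learning methods for image recognition
Weakly supervised feature learning and image understanding compatible with noisy labels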