Learning from imbalanced data is one of the challenges of big data processing. This program undertakes a systematic study of the primary problem in imbalanced data learning, namely “What to learn?”. At the theoretical level, we will explore which specific learning targets imbalanced data learning requires at the “linguistic” and “computational” levels, respectively. We will study the intrinsic properties of learning targets and evaluation criteria in order to reach a theoretical understanding of why some measures are suitable for imbalanced data learning while others are not. We will further compare information-based learning targets and criteria with non-information-based ones and derive their relations with respect to the imbalance ratio. The goal of this analytical study is to provide guidelines for selecting learning targets and evaluation criteria. At the approach level, we will extend current classifiers with abstaining functions for wider application, and study the optimization of the reject threshold and its associated properties. A novel boosting classifier will be developed with multiple learning targets, as a worked classifier example aimed at large-scale data processing. These targets include adaptation to the imbalance ratio of the data, abstaining and non-abstaining classification, and convex optimization. The final goal of this program is to advance “learning target selection” as a new research theme in machine learning and to provide a worked example of abstaining classifier design for imbalanced data learning.
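The contrast between non-information and information-based criteria can be illustrated with a small sketch. It compares plain accuracy with a normalized mutual information score on two hypothetical confusion matrices from a 95:5 imbalanced problem. The function names, the matrices, and the choice of normalizing by the entropy of the true labels are assumptions made for illustration only; they are not taken from the program itself.

```python
import math

def accuracy(cm):
    """Fraction of correct decisions; cm[i][j] = count of true class i predicted as j."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total

def normalized_mutual_information(cm):
    """I(T;Y) / H(T), with T the true label (rows) and Y the prediction (columns).

    Normalizing by H(T) is one common convention, assumed here for illustration.
    """
    total = sum(sum(row) for row in cm)
    row_p = [sum(row) / total for row in cm]
    col_p = [sum(cm[i][j] for i in range(len(cm))) / total
             for j in range(len(cm[0]))]
    mi = 0.0
    for i, row in enumerate(cm):
        for j, n in enumerate(row):
            if n > 0:  # empty cells contribute nothing
                p = n / total
                mi += p * math.log2(p / (row_p[i] * col_p[j]))
    h_t = -sum(p * math.log2(p) for p in row_p if p > 0)
    return mi / h_t if h_t > 0 else 0.0

# Hypothetical data: 950 majority-class and 50 minority-class examples.
majority_only = [[950, 0], [50, 0]]   # always predicts the majority class
informative = [[900, 50], [10, 40]]   # recovers most of the minority class

for name, cm in [("majority_only", majority_only), ("informative", informative)]:
    print(name, accuracy(cm), normalized_mutual_information(cm))
```

In this sketch the classifier that ignores the minority class scores higher on accuracy, yet its predictions carry no information about the true labels, which is the kind of mismatch that motivates asking which criteria are suitable under imbalance.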
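Imbalanced data learning is one of the challenges of big data. This project aims at a systematic study of the primary problem in imbalanced data learning, “learning target selection”. At the theoretical level, it will explore the specific learning targets that imbalanced data learning requires at the “linguistic” and “computational” levels of expression; analyze the intrinsic properties of various learning targets and evaluation criteria, explaining why some of them can accomplish imbalanced data learning tasks while others cannot; and derive the quantitative or qualitative relations between conventional performance-based and information-based learning targets or evaluation criteria and the imbalance ratio. The theoretical study will provide a theoretical basis for selecting learning targets or evaluation criteria in applications. At the method level, it will extend existing classifiers to applications that include a reject option, and study the optimization of reject-option learning targets and the properties of the optimal reject threshold; it will also study boosting classifiers for large-scale data that can learn with a reject option, adapt the optimal threshold to the imbalance ratio, and remain as compatible as possible with “convex optimization” learning targets. The final goal of the project is to advance a new research direction themed on “learning target selection” and to provide a concrete worked example of designing classifiers with a reject option for imbalanced data learning.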
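Imbalanced data learning is one of the challenges of big data. This project carried out a systematic study of the primary problem in imbalanced data learning, “learning target selection”. At the theoretical level, we explored the specific learning targets that imbalanced data learning requires at the “linguistic” and “computational” levels of expression; analyzed the intrinsic properties of various learning targets and evaluation criteria; explained, using face images as an example, whether a given learning target or evaluation criterion is adequate for learning tasks on imbalanced data; and derived the quantitative or qualitative relations between two kinds of learning targets or evaluation criteria, conventional performance-based and information-based, and the imbalance ratio. The theoretical study provides a theoretical basis for selecting learning targets or evaluation criteria in applications. At the method level, we extended existing classifiers to applications that include a reject option, and studied the optimization of reject-option learning targets and the properties of the optimal reject threshold. The results of the project advance a new research direction themed on “learning target selection” and provide a concrete worked example of designing classifiers with a reject option for imbalanced data learning.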
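As a rough illustration of what a reject threshold means for a binary abstaining classifier, the sketch below uses a standard cost-based decision rule: given a posterior probability for the positive class, it chooses whichever of "positive", "negative" or "reject" has the lowest expected cost. This is a swapped-in textbook rule, not the project's own method, and the cost values and function names are hypothetical. Raising the cost of a missed positive (often the minority class) shifts the reject interval, which is one simple way a threshold can depend on the imbalance in the data.

```python
def decide(p_pos, c_fp, c_fn, c_rej):
    """Pick the action with the lowest expected cost.

    p_pos: posterior probability of the positive class
    c_fp:  cost of a false positive (assumed value)
    c_fn:  cost of a false negative (assumed value)
    c_rej: cost of abstaining (assumed value)
    """
    risks = {
        "positive": (1.0 - p_pos) * c_fp,  # wrong only if the truth is negative
        "negative": p_pos * c_fn,          # wrong only if the truth is positive
        "reject": c_rej,                   # fixed price for abstaining
    }
    return min(risks, key=risks.get)

def reject_interval(c_fp, c_fn, c_rej):
    """Posterior range (lower, upper) in which abstaining is cheapest, or None."""
    lower = c_rej / c_fn        # below this, predicting negative costs less than rejecting
    upper = 1.0 - c_rej / c_fp  # above this, predicting positive costs less than rejecting
    return (lower, upper) if lower < upper else None

# Hypothetical costs: missing a positive is ten times worse than a false alarm.
costs = dict(c_fp=1.0, c_fn=10.0, c_rej=0.2)
print(reject_interval(**costs))
for p in (0.01, 0.5, 0.9):
    print(p, decide(p, **costs))
```

With equal error costs the reject interval is symmetric around 0.5; with the asymmetric costs above it stretches toward low posteriors, so the classifier abstains rather than dismissing a possible positive.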
Data last updated: 2023-05-31
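Laboratory calibration method for subgrade soil moisture sensors and analysis of influencing factors
On the influence of the big data environment on the development of information science
Regulatory asymmetry, choice of earnings management mode, and the enforcement efficiency of the China Securities Regulatory Commission?
Direct brain control of robot direction and speed based on SSVEP
Evaluation of theme parks based on public sentiment orientation: the case of Volga Manor in Harbin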
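Research on ensemble learning algorithms for high-dimensional imbalanced data
Research on imbalanced data based on semi-supervised ensemble learning
Research on learning algorithms for imbalanced data and their applications
Research on classification of imbalanced data streams based on ensemble learning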