Rough set theory is an important tool for dealing with vagueness and uncertainty in data mining and machine learning. Researchers have proposed many extensions to the theory, along with diverse algorithms for different applications. In this project, we will develop rough set approaches to cost-sensitive learning, the task of learning from decision systems in which a variety of costs are involved. These approaches form a framework called cost-sensitive rough set theory. We focus on the two most important types of cost: test costs and misclassification costs. Test costs are the time, money, or other resources spent obtaining data items for an object; misclassification costs are the penalties for deciding that an object belongs to class J when its real class is K. The rough set community has already produced some work on cost-sensitive learning. For example, the test-cost attribute reduction problem has been studied recently, and misclassification cost has been addressed through decision-theoretic rough set models. However, many interesting problems remain open, and the performance of some existing algorithms should be improved. We will study cost-sensitive rough set theory systematically from four viewpoints: data model, computational model, problem, and algorithm. Specifically, we consider the following issues: 1) cost-sensitive decision systems concerning nominal, numeric, and uncertain data and different relationships among costs; 2) cost-sensitive rough set theory concerning approximations, positive regions, reducts, decision rules, etc.; 3) cost-sensitive attribute reduction and generalized reduction, including discretization, symbolic value partition, cost-constraint reduction, etc.; and 4) cost-sensitive classification through decision rules, decision trees, Bayesian networks, etc.
To sum up, we will establish cost-sensitive rough set theory, design efficient and effective algorithms for attribute reduction, generalized reduction and classification, and provide data mining solutions with low cost and low risk for applications.
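To make the test-cost attribute reduction problem mentioned in the abstract concrete, the following is a minimal sketch, not the project's algorithm. It assumes a small nominal decision table, a positive-region-preserving notion of reduct, and an exhaustive search that is only practical for a handful of attributes. The table, the attribute names (`fever`, `cough`, `xray`, `flu`), the cost values, and all function names are hypothetical and chosen for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def positive_region(rows, attrs, decision):
    """Indices of objects whose equivalence class under `attrs` has a single decision value."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    pos = set()
    for members in blocks.values():
        if len({rows[i][decision] for i in members}) == 1:
            pos.update(members)
    return pos


def total_test_cost(attrs, cost):
    """Sum of the individual test costs of the selected attributes."""
    return sum(cost[a] for a in attrs)


def min_cost_reduct(rows, all_attrs, decision, cost):
    """Exhaustively search for the cheapest attribute subset that preserves the positive region."""
    target = positive_region(rows, all_attrs, decision)
    best = None
    for r in range(len(all_attrs) + 1):
        for subset in combinations(all_attrs, r):
            if positive_region(rows, subset, decision) == target:
                c = total_test_cost(subset, cost)
                if best is None or c < best[1]:
                    best = (subset, c)
    return best


# Hypothetical decision table: three tests and one decision attribute.
rows = [
    {"fever": "yes", "cough": "yes", "xray": "pos", "flu": "yes"},
    {"fever": "yes", "cough": "no", "xray": "pos", "flu": "yes"},
    {"fever": "no", "cough": "yes", "xray": "neg", "flu": "no"},
    {"fever": "no", "cough": "no", "xray": "neg", "flu": "no"},
]
costs = {"fever": 1, "cough": 1, "xray": 20}  # assumed test costs

best = min_cost_reduct(rows, ["fever", "cough", "xray"], "flu", costs)
print(best)
```

In this toy table both `fever` and `xray` alone determine the decision, so the search prefers the cheaper test. Real data sets would need a heuristic or otherwise more scalable search, since the number of subsets grows exponentially with the number of attributes.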
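As an important branch of data mining, rough set theory needs to be extended at multiple levels and from multiple angles to meet the needs of different applications. Test costs and misclassification costs are important aspects of much real-world data and a research focus of many data mining methods. Decision-theoretic rough set theory takes misclassification costs and delayed-decision costs as its basic data; its theory has advanced considerably in recent years, and its application areas keep expanding. By contrast, rough set work on test costs is still at an early stage and needs more innovative research. This project systematically studies rough set theory and methods sensitive to test costs and misclassification costs at four levels: data model, computational model, problem, and algorithm. The main research topics are: 1) cost-sensitive decision system models; 2) cost-sensitive rough set methods; 3) cost-sensitive attribute reduction and generalized reduction problems; and 4) cost-sensitive rule generation problems. By solving the key problems among these, we will establish a theoretical system of cost-sensitive rough sets, design high-performance algorithms for specific problems, and provide efficient, low-cost, low-risk data mining solutions for practical applications.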
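After years of development, rough set theory and methods have become an important branch of data mining and artificial intelligence. To make them more practical, researchers have extended them at multiple levels and from multiple angles. Among these extensions, decision-theoretic rough sets take misclassification costs and delay costs into account and have yielded many meaningful results.
This project considers test costs, misclassification costs, teacher costs, and time costs, and has established cost-sensitive rough set theory and methods at four levels: data model, computational model, problem, and algorithm. The main research topics are: 1) cost-sensitive attribute reduction; 2) cost-sensitive recommender systems; 3) cost-sensitive active learning; and 4) generalized reduction, including discretization and nominal attribute value grouping.
The study of cost-sensitive attribute reduction produced attribute testing schemes with low monetary and time costs. The study of cost-sensitive recommender systems produced recommendation schemes with low misclassification cost. The study of cost-sensitive active learning produced classification schemes in which the sum of label acquisition cost and misclassification cost is low. The study of generalized reduction produced generalized reduction schemes with a high data compression rate and higher classification accuracy.
For cost-sensitive attribute reduction, we posed a series of valuable new problems and developed a set of efficient algorithms that produce good results. We introduced costs and three-way decisions into recommender systems, proposed problems and methods for three-way recommender systems covering both classification and regression, and obtained optimal threshold settings. Applying the idea of three-way decisions to active learning, we proposed a clustering-based three-way active learning method, further extending the application scope of cost-sensitive rough sets. We proposed two-stage discretization and nominal value grouping methods; classification on practical problems shows that the resulting classification trees are more reasonable than manually constructed ones.
Through the efforts of more than 30 participating faculty members and students in the project group, and through cooperation with rough set researchers in China and abroad, the project advanced cost-sensitive rough sets and achieved better results than expected.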
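The summary above refers to three-way decisions and threshold settings without giving formulas. As a hedged illustration, the sketch below uses the standard threshold pair from the decision-theoretic rough set literature, derived by minimizing expected cost over the three actions accept, defer, and reject. It is not necessarily the threshold setting obtained in this project. The six cost parameters and their example values are assumptions: `l_pp`, `l_bp`, and `l_np` are the costs of accepting, deferring, and rejecting an object that belongs to the class, and `l_pn`, `l_bn`, and `l_nn` are the corresponding costs for an object that does not.

```python
def three_way_thresholds(l_pp, l_bp, l_np, l_pn, l_bn, l_nn):
    """Return (alpha, beta) for a three-way decision from six assumed action costs."""
    alpha = (l_pn - l_bn) / ((l_pn - l_bn) + (l_bp - l_pp))
    beta = (l_bn - l_nn) / ((l_bn - l_nn) + (l_np - l_bp))
    return alpha, beta


def decide(p, alpha, beta):
    """Map the probability p that an object belongs to the class to one of three actions."""
    if p >= alpha:
        return "accept"
    if p <= beta:
        return "reject"
    return "defer"


# Hypothetical costs: correct decisions are free, deferring costs 2,
# a false accept costs 6, and a false reject costs 8.
alpha, beta = three_way_thresholds(l_pp=0, l_bp=2, l_np=8, l_pn=6, l_bn=2, l_nn=0)
for p in (0.9, 0.5, 0.1):
    print(p, decide(p, alpha, beta))
```

With these assumed costs, deferring is cheap relative to either kind of error, so a middle band of probabilities is routed to the deferred decision. If deferring were expensive enough that `alpha` fell to or below `beta`, the band would vanish and the rule would reduce to an ordinary two-way decision.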
Data last updated: 2023-05-31
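The Impact of the Big Data Environment on the Development of Information Science
Regulatory Asymmetry, the Choice of Earnings Management Mode, and the Enforcement Efficiency of the China Securities Regulatory Commission?
Evaluating Theme Parks Based on Public Sentiment Orientation: A Case Study of Volga Manor in Harbin
An SD Model of Social Stability Risk Evolution for Sensitive Water Conservancy Projects
Application of Collaborative-Representation-Based Graph Embedding Discriminant Analysis to Face Recognition
Research on Cost-Sensitive Sparse Learning and Distance Metric Learning Methods
Research on Cost-Sensitive Active Learning
Cost-Sensitive Knowledge Acquisition Methods Based on Decision-Theoretic Rough Sets and Their Applications
Research on Multiple Cost-Sensitive Learning Problems in Multi-Label Dimensionality Reduction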