Open Set Multi-Class Recognition (OSMCR) refers to classification tasks in which a large number of the classes in the test set are absent from the corresponding training set. This openness makes OSMCR a hard problem, especially when the data are high-dimensional. This study tackles OSMCR with the error-correcting output codes (ECOC) algorithm. Data complexity theory is used to analyze class-division schemes in different feature subsets, and we will design data complexity measures that can handle high dimensionality by evaluating a feature subset as a whole. Based on these measures, we will explore effective class-decomposition schemes over diverse feature subsets, together with a solution to the class imbalance problem. An evolutionary algorithm will also be used to search for an optimal ECOC coding matrix. These efforts should give deeper insight into the nature of the multiclass problem. In OSMCR, an unknown class is defined to contain all samples that do not belong to any known class, so the first problem to solve is the strategy for assigning a test sample to either the unknown class or a known class. A Dynamic ECOC algorithm is designed that adds rows/columns to the coding matrix to separate the unknown class from the known classes. A search algorithm will adjust both the distances among all classes and the distances among the rows of the coding matrix so that they are as large as possible. Furthermore, we will revise the original idea of treating all samples outside the known classes as a single unknown class: the unknown class will be further divided into several classes, so that the distances between the known classes and the unknown classes can be enlarged further. Decomposing the unknown class helps draw accurate boundaries between the unknown classes and the known classes.
Because OSMCR is a newly proposed research topic, progress in this project can promote further exploration of the field, especially of the unknown-class decomposition problem, which has not yet been studied.
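Open set multi-class recognition refers to supervised learning in which the unknown data contain a large number of new classes that are not included in the training data, so dynamically recognizing unknown classes becomes the key to the problem. This is a new problem in machine learning. Drawing on data complexity theory, this project will design complexity measures for high-dimensional multi-class data, combine them with techniques for handling imbalanced sample sizes, explore adaptive error-correcting output codes (ECOC) algorithms based on data complexity, and design ECOC algorithms based on evolutionary algorithms, exploring class-decomposition mechanisms based on high-dimensional feature subsets from several angles. On this basis, a dynamic ECOC algorithm will be implemented according to the differences in distribution between unknown data and known classes across feature subsets; it dynamically separates new classes from known classes by adding row/column vectors to the coding matrix. Going further, the new class will be decomposed into multiple subclasses according to its data distribution, and a clustering-based dynamic ECOC algorithm will be designed that uses feedback to adjust both the subclass partition of the unknown class and the rows/columns of the coding matrix, seeking an optimal solution to the dual objective of maximizing the distances among all classes in terms of both decision boundaries and code distances. Open set multi-class recognition is a new problem, and this research will lay a foundation for further work in related fields.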
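The open-set decision and the row-adding step described above can be illustrated with a minimal sketch. It is not the project's algorithm: the Hamming-distance decoder, the rejection threshold `max_dist`, the helper names, and the toy coding matrix are all assumptions chosen for illustration.

```python
from typing import List, Optional

Code = List[int]  # one codeword; entries are -1 or +1


def hamming(a: Code, b: Code) -> int:
    """Number of positions in which two codewords differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def decode_open_set(matrix: List[Code], outputs: Code, max_dist: int) -> Optional[int]:
    """Return the index of the nearest class codeword, or None for 'unknown'.

    max_dist is a hypothetical rejection threshold: a sample whose nearest
    codeword is farther away than this is assigned to the unknown class.
    """
    dists = [hamming(row, outputs) for row in matrix]
    best = min(range(len(matrix)), key=dists.__getitem__)
    return best if dists[best] <= max_dist else None


def add_class(matrix: List[Code], new_row: Code) -> List[Code]:
    """Return a copy of the matrix with one extra row (codeword) for a new class."""
    if any(len(new_row) != len(row) for row in matrix):
        raise ValueError("new codeword must match the code length")
    return [list(row) for row in matrix] + [list(new_row)]


def min_row_distance(matrix: List[Code]) -> int:
    """Smallest pairwise Hamming distance between class codewords."""
    return min(
        hamming(matrix[i], matrix[j])
        for i in range(len(matrix))
        for j in range(i + 1, len(matrix))
    )


# Toy matrix: three known classes, five binary base classifiers.
M = [
    [+1, +1, +1, -1, -1],
    [-1, +1, -1, +1, -1],
    [-1, -1, +1, +1, +1],
]

known = decode_open_set(M, [+1, +1, +1, -1, +1], max_dist=1)    # close to row 0
unknown = decode_open_set(M, [+1, -1, -1, +1, +1], max_dist=1)  # far from every row

# Register the rejected output pattern as the codeword of a new class.
M2 = add_class(M, [+1, -1, -1, +1, +1])
```

In this toy case the new row lowers `min_row_distance` from 3 to 2, which is the kind of loss in code separation that a search over rows and columns would try to recover.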
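This project studied in depth the theory, algorithms, and applications of Error-Correcting Output Codes (ECOC), focusing on supervised learning for multi-class decisions. Combining data complexity theory with techniques for handling imbalanced sample sizes, it explored design approaches for adaptive ECOC algorithms, designed several ECOC algorithms based on evolutionary algorithms, and explored class-decomposition mechanisms based on high-dimensional feature subsets from several angles. On this basis, the team combined machine learning methods such as ternary operators and feature-subspace optimization to complete several new ECOC algorithm frameworks. In view of the characteristics of open-set data, and starting from the differences in distribution between unknown data and known classes across feature subsets, a dynamic ECOC algorithm was implemented that dynamically separates new classes from known classes by adding row/column vectors to the coding matrix. In the course of the research, the team proposed several original algorithm designs, including: (1) the first ECOC algorithm framework based on genetic programming, which combines the tree-structured individuals of genetic programming with the ECOC coding matrix and, through an effective individual mapping, guides genetic programming to search for and optimize the coding matrix under multiple constraints during evolution; (2) an ECOC algorithm framework based on variable-length codes, which designs targeted recognition methods for the classes and samples that are hard to recognize, yielding a two-stage coding scheme that provides codes of different lengths for easy and hard classes; (3) the first soft-coding ECOC approach, which re-encodes the classes according to the output distribution of each base classifier on the different classes, including basic information such as the mean and range of the output values for each class, producing a recoding algorithm for the coding matrix and the first recoding scheme based on coding characteristics; (4) the first clustering-based dynamic ECOC algorithm, which follows the basic design of dynamic selective ensemble algorithms and, according to the data distribution of the samples, dynamically selects the feature space best suited to an unknown sample for class decision, seeking the best scheme for maximizing the distances among all classes in terms of both decision boundaries and code distances. In addition, the team extended these results by designing several multi-class decision algorithms for data with different characteristics, such as partial-label data, micro-expression data, and gene microarray data; by fusing algorithms with models such as deep forests and deep neural networks; by designing ECOC algorithms suited to deep learning; and by building new machine learning models for the data characteristics of different problem domains.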
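The soft-coding idea in item (3) can be illustrated with a small sketch. It is only one possible reading, not the team's method: it assumes that each class is re-encoded by the per-class mean of every base classifier's real-valued output (the range information mentioned above is not used) and that decoding picks the class with the nearest soft codeword in Euclidean distance. The function names and the toy data are hypothetical.

```python
from math import sqrt
from typing import Dict, List


def soft_recode(outputs: List[List[float]], labels: List[int]) -> Dict[int, List[float]]:
    """Build one soft codeword per class: the mean output of each base classifier.

    outputs[i][k] is the real-valued output of base classifier k on training
    sample i, and labels[i] is that sample's class.
    """
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for row, y in zip(outputs, labels):
        acc = sums.setdefault(y, [0.0] * len(row))
        for k, value in enumerate(row):
            acc[k] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def decode_soft(codebook: Dict[int, List[float]], output: List[float]) -> int:
    """Return the class whose soft codeword is nearest in Euclidean distance."""
    def distance(y: int) -> float:
        return sqrt(sum((c - o) ** 2 for c, o in zip(codebook[y], output)))
    return min(codebook, key=distance)


# Toy data: two base classifiers, two classes, two training samples per class.
train_outputs = [[0.9, -0.8], [0.7, -0.6], [-0.5, 0.4], [-0.7, 0.8]]
train_labels = [0, 0, 1, 1]

codebook = soft_recode(train_outputs, train_labels)
pred_a = decode_soft(codebook, [0.6, -0.5])
pred_b = decode_soft(codebook, [-0.4, 0.5])
```

Replacing the hard entries of the coding matrix with observed means lets the decoder account for base classifiers that are systematically weak or biased on particular classes.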
Data last updated: 2023-05-31