Cluster analysis of multi-source big data is an important problem in big data research. It must cope with multi-source noise that has complex structure. Existing methods approach the problem from different perspectives, such as multi-view clustering, clustering ensembles, multi-kernel clustering, and multi-relational clustering, but they cannot handle such noise effectively. We propose a robust clustering framework that systematically addresses the challenges arising from multi-source noise with complex structure. Several aspects of the proposed approach are worth highlighting: 1) The two key sub-problems, multi-source joint noise reduction and multi-source joint clustering, are integrated into a unified framework so that their interactions are captured. 2) The joint clustering result guides the joint noise reduction. The complex noise in multi-source data is captured either by jointly modeling the reliability of the sources or by jointly extracting the noise; its adverse effect is then systematically alleviated by the corresponding robust learning mechanism, error detection or error correction respectively. 3) Better multi-source clustering is expected from consensus maximization over the noise-reduced data. 4) An efficient, easy-to-deploy algorithm will be developed so that the clustering can run on a distributed computing platform. 5) The framework can be flexibly adapted to different scenarios. This research is expected to benefit the mining of big data.
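Cluster analysis of multi-source big data is one of the important problems facing big data research. Because the data are large in scale and drawn from many sources, multi-source big data clustering has to cope with the complex noise that pervades the data. Existing methods approach multi-source clustering from different angles, such as multi-view clustering, clustering ensembles, multi-kernel clustering, and multi-relational clustering. These methods cannot effectively handle complex multi-source noise. We propose a robust clustering method for multi-source big data that systematically handles the challenges posed by this noise. Specifically, we will: 1) jointly treat the two interdependent sub-problems of multi-source noise reduction and fusion clustering within one unified framework; 2) use the fusion clustering result to guide multi-source noise reduction, characterizing the complex noise through two strategies, joint modeling of multi-source data reliability and joint extraction of multi-source noise, and applying the two corresponding robust learning mechanisms, noise detection and noise correction, to systematically reduce its interference; 3) perform consensus maximization learning on the noise-reduced data to achieve multi-source fusion clustering; 4) design efficient algorithms that are easy to deploy on distributed computing platforms to solve the robust clustering model; 5) flexibly adjust the framework to handle different types of multi-source big data. This project will help improve the mining of the intrinsic value of big data.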
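We have made useful progress on several key scientific problems in robust clustering of multi-source big data. Specifically, we: 1) proposed a multi-kernel noise recovery algorithm that robustly integrates multi-source data at the data level; 2) proposed a multi-kernel robust K-means algorithm that robustly integrates multi-source data at the model level; 3) proposed a robust clustering ensemble method to handle structured noise at the decision level; 4) proposed a dynamic multi-view SVM algorithm and a dynamic multi-view spectral clustering algorithm; 5) proposed an adaptive unsupervised feature selection algorithm. In total, the research group has published more than 10 papers directly related to this project. The applicant's papers have been cited roughly 400 times over the past five years (Google Scholar), and the group has gained a degree of international influence and attention from industry.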
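The interplay between clustering and noise reduction at the decision level can be illustrated with a toy example. The sketch below is not the project's algorithm; it is a minimal illustration, under our own assumptions, of a reliability-weighted clustering ensemble in which the current consensus is used to down-weight sources that disagree with it. All function names, the 0.5 linking threshold, and the pairwise-agreement reliability score are hypothetical choices made for this example.

```python
from itertools import combinations


def co_association(labelings, weights):
    """Weighted fraction of sources that place each pair of points together."""
    n = len(labelings[0])
    total = sum(weights)
    co = [[0.0] * n for _ in range(n)]
    for labels, w in zip(labelings, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += w / total
    return co


def consensus_labels(co, threshold=0.5):
    """Link pairs whose co-association exceeds the threshold (an assumed
    value); the consensus clusters are the connected components."""
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def pair_agreement(a, b):
    """Fraction of point pairs on which two labelings agree (Rand index)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def robust_ensemble(labelings, n_iter=5):
    """Alternate between building a consensus and re-estimating how
    reliable each source is with respect to that consensus."""
    weights = [1.0] * len(labelings)
    consensus = None
    for _ in range(n_iter):
        co = co_association(labelings, weights)
        consensus = consensus_labels(co)
        # Reliability score (assumption): agreement with the consensus,
        # floored so the weights never sum to zero.
        weights = [max(pair_agreement(l, consensus), 1e-9) for l in labelings]
    return consensus, weights


# Two sources agree on the partition (with different label names);
# the third is noisy.
sources = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 1],
]
consensus, weights = robust_ensemble(sources)
```

In this toy run the noisy third source ends up with a lower weight than the two consistent ones, which mirrors the "error detection" idea in spirit only. A real system would operate on features or kernels rather than on precomputed label vectors.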
Data last updated: 2023-05-31
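Research on a crime prediction algorithm based on multimodal information feature fusion
An efficient parallel algorithm for radiation diffusion on multi-block structured grids in inertial confinement fusion implosions
Multi-space interactive collaborative filtering recommendation
Bus passenger flow classification and prediction with a multi-source data-driven CNN-GRU model
Design of a large-aperture primary mirror based on a hybrid optimization method
Research on a general multi-clustering model and secure, efficient algorithms for multi-source heterogeneous data
Research on online clustering ensemble algorithms for multi-source heterogeneous streaming data and their applications
Research on sparse and robust linear discriminant analysis models and algorithms for high-dimensional data
Research on efficient clustering algorithms for geo-tagged data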