Social media has gained popularity in recent years and is increasingly becoming an integral part of daily life. Social media data grows rapidly in size and variety as more people use platforms such as Facebook, Twitter, and LinkedIn. It is a massive “treasure trove” of interest to researchers and practitioners across disciplines, and a rich source for data mining. However, although both are large-scale, social media data differs from the attribute-value data of classic data mining. In social media, events concerning groups can be defined by comparing communities across time; these events include growth, contraction, merging, splitting, birth, and death. Social media data points are also inherently linked rather than independent and identically distributed (i.i.d.). Furthermore, social media data is noisy, incomplete, drawn from multiple sources, and embedded in multi-mode, multi-dimensional networks. These unique properties present unprecedented challenges for mining social media data.
In real-world applications, high-dimensional data is ubiquitous, from text categorization to image processing to Web search. Subspace clustering has therefore been studied extensively in recent years. The goal of subspace clustering is to locate clusters, each with its own associated dimensions, that are embedded in different subspaces of the original data space. Existing subspace clustering algorithms that have proven effective for data mining are ill-equipped for social media mining. In this research, we propose a new kind of subspace clustering to facilitate the computational understanding of social media, investigating the associated fundamental research issues and developing new, effective algorithms.
As networks are highly dynamic, we propose to develop new algorithms for clustering high-dimensional, evolving social media data from the subspace clustering perspective. We also define the problem of subspace clustering with linked data and present a preliminary study demonstrating how link information can be integrated into subspace clustering for social media data. A prominent characteristic of social media is that its data comes from multiple sources. Because the data from each source can be noisy, partial, or redundant, selecting relevant sources and using them together can make subspace clustering more effective. We define types of sources and propose to study subspace clustering using source information.
The project lies at the confluence of data mining and social computing. This preliminary work aims to develop novel methods and expand research capabilities in clustering analysis and social media mining; it can also contribute to improving machine learning and information retrieval and to expediting the development of a new generation of social media mining tools.
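With the spread and popularity of the Internet, a large number of Web applications and social information networks built on user participation have emerged, including blogs, forums, media-sharing platforms, microblogs, social networks, social news, social bookmarking, and Wikipedia, collectively referred to as social media. Because social media plays an increasingly important role in politics, the economy, and daily life, research on data mining and machine learning algorithms for social media has become a current hot topic in the field. Against the background of solving social media mining problems, this project studies subspace clustering methods for high-dimensional social media data. The research covers: 1) proposing subspace clustering algorithms for community-evolution data, based on data-integration and model-integration strategies; 2) modeling the link constraints among social media data and proposing subspace clustering algorithms for link-constrained data; 3) exploiting the relationships among the features of different views of social media data to propose subspace clustering algorithms for multi-view data; 4) collecting and organizing social media data and extending the proposed algorithms to applications in social media mining. The project has a solid research foundation, a clear line of reasoning, and a well-defined application background; its results will be of significant academic value to fields such as data mining and social computing.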
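During the execution of the National Natural Science Foundation of China grant (No. 61403247) from January 2015 to December 2017, research on subspace clustering algorithms for social media data was carried out according to the schedule in the project proposal and project plan, and was then extended to related directions. The main work was as follows. First, in view of the high-dimensional, evolving, link-constrained, and multi-view characteristics of social media data, the state of research on soft subspace clustering algorithms in China and abroad was surveyed. Second, stream-data clustering techniques for community-evolution data, semi-supervised learning techniques for link-constrained data, and techniques for improving prediction accuracy on social media data were each investigated. The project produced a body of research results in the related fields, which are of significant value for subspace clustering theory and its application to social media mining.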
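As a rough illustration of what soft subspace clustering means (each cluster carries its own weights over the dimensions, so that different clusters can live in different subspaces), the following is a minimal sketch of one generic, well-known scheme, entropy-weighted k-means. It is not taken from the project and is not claimed to be one of the project's algorithms; the function name `soft_subspace_kmeans`, the parameter `gamma`, and the toy data are assumptions made only for this example.

```python
import math
import random


def soft_subspace_kmeans(points, k, gamma=1.0, n_iter=20, seed=0):
    """Entropy-weighted k-means sketch (illustrative, hypothetical helper).

    Returns (labels, centers, weights); weights[c][j] is the relevance of
    dimension j for cluster c, and each weights[c] sums to 1.
    """
    rng = random.Random(seed)
    d = len(points[0])
    centers = [list(p) for p in rng.sample(points, k)]
    weights = [[1.0 / d] * d for _ in range(k)]  # start with uniform weights
    labels = [0] * len(points)

    for _ in range(n_iter):
        # Assignment step: nearest center under that cluster's own weights.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum(
                    weights[c][j] * (p[j] - centers[c][j]) ** 2 for j in range(d)
                ),
            )

        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:
                continue  # empty cluster: keep its previous center and weights

            # Center step: per-dimension mean of the cluster members.
            centers[c] = [sum(p[j] for p in members) / len(members) for j in range(d)]

            # Weight step: dimensions with small within-cluster dispersion get
            # large weights (softmax of -dispersion / gamma, shifted for stability).
            dispersion = [
                sum((p[j] - centers[c][j]) ** 2 for p in members) for j in range(d)
            ]
            low = min(dispersion)
            raw = [math.exp(-(v - low) / gamma) for v in dispersion]
            total = sum(raw)
            weights[c] = [r / total for r in raw]

    return labels, centers, weights


# Toy data (made up): the first group is tight only in dimension 0,
# the second group is tight in both dimensions.
example_points = [
    (0.0, 5.0), (0.1, -3.0), (-0.1, 8.0),
    (10.0, 1.0), (10.1, 1.1), (9.9, 0.9),
]
example_labels, example_centers, example_weights = soft_subspace_kmeans(
    example_points, k=2
)
```

A smaller `gamma` concentrates each cluster's weight on its few most compact dimensions, while a larger `gamma` keeps the weights closer to uniform. Handling evolving, link-constrained, or multi-view data, as the project describes, would require further modeling that this sketch does not attempt.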
Data last updated: 2023-05-31
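A Review of Research on Industrial Structure Succession and Bifurcation from the Perspective of Evolutionary Economic Geography
The Eddy Covariance Technique and Its Application in Flux Research on Terrestrial Ecosystems
On the Influence of the Big Data Environment on the Development of Information Science
A Survey of User Alignment Techniques Across Social Networks
Spatiotemporal Evolution Characteristics and Driving Factors of Water Resource Utilization in the Yellow River Basin
Research on Fast Ensemble Clustering for Heterogeneous Big Data in Social Media
Research on Ridge Regression Subspace Clustering Algorithms for Large-Scale Two-Dimensional Data
Research on Subspace Clustering Algorithms Based on Sparse and Low-Rank Representation
Research on Subspace Clustering Methods for Complex Multi-View High-Dimensional Data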