With the popularity of social media such as microblogs and blogs, the general public can conveniently post online reviews on the Web, so social media contain a large amount of public opinion information. To detect and monitor this information, the key issues are to identify, in a timely and accurate way, the topics and sentiments it expresses and how they evolve over time. However, social media data consist of short, irregularly expressed texts that are dynamic and large in scale, so detecting public opinion in social media poses new challenges. The main shortcomings of existing methods are: (1) they represent the features of public opinion information with words, which easily causes feature sparsity; (2) they cannot represent topics, sentiments, and their dynamic evolution within a unified model; (3) they process public opinion information in a static, offline, small-scale mode. To address these problems, this project studies public opinion detection from four aspects: the feature representation of social media data, the representation of public opinion, the evolutionary dynamics of public opinion, and large-scale data processing.
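The feature-sparsity problem in point (1) can be made concrete with a small sketch. Two short posts with similar meaning may share no words at all, so their bag-of-words cosine similarity is zero. Averaged dense word vectors can still place the same two posts close together. The embeddings below are hypothetical, hand-picked 3-dimensional toy values, not vectors learned by any model in this project, and the helper names are illustrative only.

```python
import math
from collections import Counter

def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words count vectors of two texts."""
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(n * cb[w] for w, n in ca.items())  # only shared words contribute
    na = math.sqrt(sum(n * n for n in ca.values()))
    nb = math.sqrt(sum(n * n for n in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def dense_cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

# Hypothetical toy embeddings, hand-picked for illustration (not learned)
TOY_EMBEDDINGS = {
    "great":   [0.1, 0.9, 0.1],
    "awesome": [0.1, 0.8, 0.0],
    "phone":   [0.9, 0.1, 0.0],
    "mobile":  [0.8, 0.1, 0.1],
}

def average_embedding(text):
    """Represent a short text as the mean of its known word vectors."""
    vectors = [TOY_EMBEDDINGS[w] for w in text.split() if w in TOY_EMBEDDINGS]
    return [sum(column) / len(vectors) for column in zip(*vectors)]

post_a, post_b = "great phone", "awesome mobile"
print(bow_cosine(post_a, post_b))  # 0.0: the posts share no words
print(dense_cosine(average_embedding(post_a), average_embedding(post_b)))
```

The sparse representation cannot relate "great phone" to "awesome mobile". The dense one can, provided the word vectors encode the relevant similarity.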
The research content includes: (1) integrating multi-layer context information to propose word vector feature representations based on deep learning models; (2) proposing a nonparametric multi-perspective topic and sentiment model based on probabilistic topic models; (3) establishing a cooperative updating mechanism between feature learning and the hybrid topic-sentiment model, yielding a dynamic hybrid topic-sentiment model with evolutionary ability; (4) building a series of distributed parallel algorithms that can detect large-scale public opinion information in social media. The project is expected to improve traditional methods of public opinion security analysis and to provide a theoretical foundation for identifying multi-aspect topics and sentiments and analyzing their evolution. It is also expected to improve the adaptivity, scalability, and accuracy of public opinion detection and monitoring.
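The key to monitoring the public opinion content carried by microblogs, blogs, and other social media is to identify the topics and sentiments involved and how they evolve, in a timely and accurate way. However, social media data are short, irregular, and continuously updated, so monitoring public opinion content in social media faces new challenges. Existing approaches have the following problems: (1) they use words as features, which easily leads to feature sparsity; (2) they do not model topics and sentiments in public opinion content jointly, nor do they model the evolutionary dynamics of that content; (3) they mainly process small-scale static data offline. To address these problems, this project carries out research in four areas:
(1) integrating multi-layer context information to propose deep-learning-based word vector feature representation learning;
(2) building on probabilistic topic models to propose a nonparametric multi-perspective hybrid topic-sentiment model;
(3) taking the time dimension into account to establish a co-updating scheme between feature learning and the hybrid topic-sentiment model, and to propose a continually evolving dynamic topic-sentiment evolution model;
(4) building on cloud computing frameworks such as MapReduce to develop distributed parallel algorithms capable of monitoring large-scale public opinion content in social media.
The project is expected to improve the accuracy, adaptivity, and scalability of security monitoring of public opinion content in social media, and has significant academic and practical value.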
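Point (4) relies on the MapReduce pattern. The following minimal sketch assumes posts arrive as hypothetical `(time_slice, text)` pairs. It simulates the map, shuffle, and reduce phases in plain Python to count word occurrences per time slice, which is the kind of count statistic a topic model consumes. It does not use Hadoop or any MapReduce library API. In a real deployment, each phase would run in parallel across cluster nodes.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Record = Tuple[str, str]   # hypothetical input format: (time_slice, post_text)
Key = Tuple[str, str]      # (time_slice, word)

def map_phase(record: Record) -> Iterator[Tuple[Key, int]]:
    """Emit ((time_slice, word), 1) for every token in one post."""
    time_slice, text = record
    for word in text.lower().split():
        yield (time_slice, word), 1

def shuffle(pairs: List[Tuple[Key, int]]) -> Dict[Key, List[int]]:
    """Group intermediate values by key, as the framework's shuffle would."""
    groups: Dict[Key, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: Key, values: List[int]) -> Tuple[Key, int]:
    """Sum the partial counts for one key."""
    return key, sum(values)

def run_job(records: List[Record], num_partitions: int = 2) -> Dict[Key, int]:
    # Split the input the way independent mappers would each receive a shard
    partitions = [records[i::num_partitions] for i in range(num_partitions)]
    mapped = [pair for part in partitions for rec in part for pair in map_phase(rec)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

posts = [
    ("2015-01", "new phone great"),
    ("2015-01", "phone battery bad"),
    ("2015-02", "phone great"),
]
counts = run_job(posts)
print(counts[("2015-01", "phone")])  # 2
```

Keying on the time slice as well as the word keeps the per-period counts separate, which a dynamic topic model needs.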
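The key to monitoring the public opinion content carried by social media is to identify the topics and sentiments involved and how they evolve, in a timely and accurate way. Social media data are short, irregular, and continuously updated, so monitoring social media content faces new challenges. Existing methods have several problems:
- they use words as features;
- they do not model topics and sentiments in public opinion content jointly, nor do they model its evolutionary dynamics;
- they process small-scale static data offline.

This project carries out systematic research on four topics: deep feature representation of social media data, probabilistic hybrid topic modeling of public opinion content, dynamic evolution analysis of public opinion content, and large-scale data processing.

The project improves word vector learning models, recursive autoencoders, LSTM models, and related techniques. It proposes a series of deep-learning-based word vector feature representation and semantic composition models, such as:
- a mixed word embedding model that considers word order and multiple contexts (MWE);
- a bidirectional phrase recursive autoencoder combined with the HowNet lexicon (CHL-Bi-PRAE);
- a domain adversarial model based on fuzzy logic and autoencoders (Fuzzy-DAAE).

Social media content changes continuously, and topics and sentiments must be identified simultaneously. To meet these needs, the project models topics and sentiments on the basis of the nonparametric hierarchical Dirichlet process (HDP), taking into account dynamic topics and the dependencies between topics and sentiments. For dynamic topic identification and sentiment analysis of social media, it proposes:
- a time-decaying HDP model (EHDP);
- a nonparametric joint topic-sentiment model (NJST);
- a dynamic nonparametric joint topic-sentiment model (DNJST).

Building on this, it further combines the strengths of deep learning and probabilistic topic models to study new hybrid topic-sentiment models. It proposes:
- a joint word embedding topic model that trains word embeddings and topic distributions simultaneously (JWET);
- a weakly supervised joint topic-sentiment model with word vectors (WS-TSWE);
- a word-vector-dependent joint topic-sentiment model (RTSWE);
- a semi-supervised aspect-level sentiment classification model based on variational autoencoders (AL-SSVAE).

The project also studies efficient solution algorithms for each model and implements a prototype system on the Hadoop big data platform. This research advances topic identification and sentiment orientation analysis of social media content and supports a more accurate understanding of the semantics expressed in social media. It can be used to improve the performance of public opinion monitoring, social media semantic retrieval, and related systems.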
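The summary does not give the formulation of the time-decaying EHDP model. As a hedged illustration of the general idea only, one common way to let a topic carry over between time slices is to down-weight word counts from earlier epochs exponentially. Those decayed counts then serve as pseudo-counts when estimating the topic in the current epoch. The function names, the decay factor `decay`, and the smoothing constant `beta` below are assumptions for illustration, not the project's method.

```python
from typing import Dict, List

def decayed_prior(epoch_counts: List[Dict[str, int]], t: int, decay: float = 0.5) -> Dict[str, float]:
    """Sum a topic's word counts from epochs 0..t-1, weighting epoch s by decay**(t - s)."""
    prior: Dict[str, float] = {}
    for s in range(t):
        weight = decay ** (t - s)  # older epochs receive smaller weights
        for word, n in epoch_counts[s].items():
            prior[word] = prior.get(word, 0.0) + weight * n
    return prior

def topic_word_distribution(prior: Dict[str, float], current: Dict[str, int],
                            vocab: List[str], beta: float = 0.01) -> Dict[str, float]:
    """Smoothed word distribution for one topic: (prior + current + beta), normalized."""
    scores = {w: prior.get(w, 0.0) + current.get(w, 0) + beta for w in vocab}
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}

# Hypothetical word counts assigned to one topic in three successive epochs
epochs = [
    {"phone": 4, "battery": 2},
    {"phone": 2, "screen": 2},
    {"phone": 1, "screen": 3},
]
prior = decayed_prior(epochs, t=2)
print(prior)  # {'phone': 2.0, 'battery': 0.5, 'screen': 1.0}
dist = topic_word_distribution(prior, epochs[2], ["phone", "battery", "screen"])
```

A smaller `decay` makes the topic forget older epochs faster. This is the trade-off a time-decaying model must balance between stability and responsiveness to new content.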
Data last updated: 2023-05-31
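Genome-wide association analysis of leaf orientation value in maize
Eddy covariance technique and its application in flux research on terrestrial ecosystems
A survey of user alignment techniques across social networks
CFRP strengthening against rib-to-deck fatigue cracking in orthotropic steel bridge decks
Spatiotemporal evolution characteristics and driving factors of water resource use in the Yellow River Basin
A series of studies on the preventive and therapeutic effects and mechanisms of prophylactic intracoronary anisodamine on myocardial microcirculatory dysfunction/ischemia-reperfusion injury after reperfusion in acute myocardial infarction
Research on forensic techniques for visual media content security
Research on key technologies for dynamic identification of topic-specific social media content
Research on key technologies for tag security monitoring in large-scale RFID systems
Research on knowledge representation for social multimedia content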