With the growth of online shopping, a large volume of product reviews, mostly in text form, has accumulated on the Web, and these reviews contain rich knowledge about product evaluation. Using data mining and natural language processing to extract fine-grained product aspects and opinion words from massive review texts, and then to determine sentiment orientation at the aspect level, poses a major challenge for sentiment analysis.

Based on the characteristics of Chinese reviews, this project acquires semantic relations between words from syntactic analysis, word-sense understanding, and contextual correlation, and embeds them into the topic model as constraint knowledge, so that the topic model is guided semantically toward fine-grained topical words. The project focuses on three problems. First, must-link and cannot-link semantic relation networks are constructed and used as semantic knowledge to constrain which words are assigned to each topic. Second, a topic-word allocation algorithm that respects the semantic relations between product aspects and opinions is designed, aiming to find as many local aspects and local opinion words as possible. Third, the SRC-LDA (semantic relation constrained LDA) model is proposed for extracting fine-grained aspects and opinion words.

The project will propose a new mechanism for realizing semantic constraints in topic models and construct a new topical-word extraction model that conforms to the distribution of aspects and opinions in Chinese review texts, offering new ideas and new approaches for topical knowledge mining of Chinese product reviews in the big data era.
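
To make the must-link and cannot-link relation networks mentioned above more concrete, here is a minimal sketch of how such networks might be stored and queried. The class name `RelationNetwork`, its methods, and the example word pairs are hypothetical illustrations; the abstract does not specify how the project actually represents or populates these networks.

```python
from collections import defaultdict


class RelationNetwork:
    """Undirected must-link / cannot-link word graph (illustrative only)."""

    def __init__(self):
        # Each dict maps a word to the set of words it is linked to.
        self.must = defaultdict(set)
        self.cannot = defaultdict(set)

    def add_must_link(self, a, b):
        # Words that should tend to share a topic.
        self.must[a].add(b)
        self.must[b].add(a)

    def add_cannot_link(self, a, b):
        # Words that should tend to fall into different topics.
        self.cannot[a].add(b)
        self.cannot[b].add(a)

    def relation(self, a, b):
        # Return "must", "cannot", or None when no constraint is recorded.
        if b in self.must.get(a, ()):
            return "must"
        if b in self.cannot.get(a, ()):
            return "cannot"
        return None


# Hypothetical word pairs, assumed for illustration only.
net = RelationNetwork()
net.add_must_link("屏幕", "显示屏")    # "screen" and "display"
net.add_cannot_link("屏幕", "电池")    # "screen" and "battery"
```

In a full system the links would presumably come from the syntactic, word-sense, and contextual analysis the abstract describes; here they are entered by hand.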
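With the growth of online shopping, a large volume of product review text has accumulated on the Web, and it contains rich evaluative knowledge. Using data mining and natural language processing to extract fine-grained product aspects and opinion words from massive review texts, and then to obtain aspect-level sentiment orientation, is a new challenge for sentiment analysis of product reviews. Based on the characteristics of Chinese product review texts, this project acquires semantic relations between words from multiple perspectives, including syntactic analysis, word-sense understanding, and contextual correlation, and embeds them into the topic model as constraint knowledge, enabling semantically guided discovery of fine-grained topical words. The research content is: (1) constructing a semantic relation network of words and using its semantic knowledge to constrain the membership of words in topics; (2) designing a topic-word allocation algorithm that matches the semantic relations between product aspects and opinion words, so as to discover as many local aspect words and local opinion words as possible; (3) proposing a semantic-relation-constrained topic model to extract fine-grained aspects and opinion words effectively. The project will propose a new semantic constraint mechanism for topic models and construct a new topical-word extraction model that matches the distribution of aspects and opinion words, offering new ideas and new approaches for topical knowledge mining of Chinese product review texts in the big data era.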
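With the spread of the Internet, a large volume of review-type text has accumulated on the Web. It contains rich evaluative knowledge whose extraction has important applications in e-commerce, business intelligence, information monitoring, and public opinion analysis. Taking product reviews and similar texts as its main research object, this project uses data mining and natural language processing to extract fine-grained aspects and opinion words from massive review texts, obtain aspect-level sentiment orientation, and thereby achieve fine-grained sentiment analysis of review texts.

Based on the characteristics of Chinese review texts, the project acquires semantic relations between words from multiple perspectives, including syntactic analysis, word-sense understanding, and contextual correlation, and embeds them into the topic model as constraint knowledge, enabling semantically guided discovery of fine-grained topical words. The main research content includes: (1) constructing a semantic relation network of words and using its semantic knowledge to constrain the membership of words in topics; (2) designing a topic-word allocation algorithm that matches the semantic relations between product aspects and opinion words, so as to discover as many local aspect words and local opinion words as possible; (3) proposing a semantic-relation-constrained topic model to extract fine-grained aspects and opinion words effectively.

The project applied weakly supervised modifications to the topic model from the perspective of semantic constraints, improving the LDA topic model's ability to capture the semantics of Chinese product review texts so that it mines topical words according to predefined semantic goals, and thereby achieved the extraction of fine-grained product aspects and opinion words. A series of weakly supervised algorithms and models, including SRC-LDA (semantic relation constrained LDA) and AC-LDA (association constrained LDA), was designed. They were tested and analyzed on large-scale product review and microblog data, and their effectiveness was verified.

Using real Web review texts as the data source, the project studied from a new angle how topic models operate on review-type texts, explored semantic extraction from text in the big data setting, and proposed a new semantic constraint mechanism for topic models that matches the grammatical and semantic structure of Chinese text, improving the ability of topic models to perform semantic extraction and knowledge mining on massive text data.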
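
As a rough illustration of what constraining the membership of words in topics could mean in practice, the sketch below rescales a word's per-topic sampling weights according to must-link and cannot-link neighbours already assigned to each topic. The function name `constrained_topic_weights`, the multiplicative `boost` and `penalty` factors, and the example words are all assumptions made for illustration; the abstract does not give the actual SRC-LDA or AC-LDA formulation, which may differ substantially.

```python
def constrained_topic_weights(word, topic_words, base_weights,
                              must, cannot, boost=2.0, penalty=0.5):
    """Rescale unconstrained topic weights for `word` by link constraints.

    topic_words  : list of sets, the words currently assigned to each topic
    base_weights : list of floats, e.g. the usual LDA sampling weights
    must, cannot : dicts mapping a word to its set of linked words
    boost/penalty: assumed multiplicative factors, not taken from the source
    """
    weights = []
    for k, base in enumerate(base_weights):
        # Count linked words that already sit in topic k.
        n_must = len(must.get(word, set()) & topic_words[k])
        n_cannot = len(cannot.get(word, set()) & topic_words[k])
        # Pull toward topics holding must-link words, push away from
        # topics holding cannot-link words.
        weights.append(base * (boost ** n_must) * (penalty ** n_cannot))
    total = sum(weights)
    # Normalize so the result can be used as a sampling distribution.
    return [w / total for w in weights]


# Hypothetical two-topic example.
topic_words = [{"display"}, {"battery"}]
must = {"screen": {"display"}}
cannot = {"screen": {"battery"}}
probs = constrained_topic_weights("screen", topic_words, [1.0, 1.0],
                                  must, cannot)
```

With equal base weights, the constrained word is drawn toward the topic that already holds its must-link neighbour, while a word with no recorded constraints keeps its original distribution.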
Data last updated: 2023-05-31
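Research on semantically sparse Web service discovery based on word-embedding topic models
Research on text feature extraction methods based on deep learning and topic models
Research on sentiment analysis and topical word extraction for Tibetan topics in social networks
Research on deep topic models integrating semantic similarity and relatedness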