In the vast ocean of data, how to keep the essence and discard the dross has become a major concern of the Internet age. It is a challenge for big data processing and is also key to the development of the national network economy. Filtering sensitive information (for example, negative hot topics, negative emergencies, and harmful information) is an important and very difficult information filtering task. To address the time lag, low accuracy, and poor adaptability of sensitive information filtering, this project takes Chinese text media on the Internet (web pages, microblogs, forums, etc.) as its research object and applies opinion mining, machine learning, high-performance computing, and natural language processing. It studies three things: feature extraction algorithms for sensitive information, to reveal the intrinsic attributes of sensitive information and sensitive words; an adaptive top-layer filtering model, to dynamically recognize sensitive words and their polarity; and an adaptive low-layer filtering model, to adaptively identify sensitive information from a holistic and semantic perspective. On this basis, a prototype system of the adaptive multi-filter model for network sensitive information is implemented to verify the usability of the research results. The results will explore a new approach to big data processing and provide technical support for developing application systems for public opinion monitoring, business intelligence, and decision support.
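To make the layered design more concrete, the following is a minimal sketch of how a word-level (top) layer and a document-level (low) layer could be combined. It is not the project's actual model: the class names, thresholds, the lexicon-growth rule, and the use of naive Bayes for the low layer are all assumptions chosen for illustration, and documents are assumed to be already segmented into tokens.

```python
import math
from collections import Counter


class TopLayerFilter:
    """Word-level layer (hypothetical): a weighted lexicon that can grow."""

    def __init__(self, seed_words, threshold=1.0):
        self.weights = {w: 1.0 for w in seed_words}
        self.threshold = threshold

    def score(self, tokens):
        return sum(self.weights.get(t, 0.0) for t in tokens)

    def is_suspect(self, tokens):
        return self.score(tokens) >= self.threshold

    def adapt(self, sensitive_docs, benign_docs, min_count=2, new_weight=0.5):
        """Add words that recur in sensitive documents and never in benign ones."""
        benign_vocab = {t for doc in benign_docs for t in doc}
        counts = Counter(t for doc in sensitive_docs for t in set(doc))
        for word, n in counts.items():
            if n >= min_count and word not in benign_vocab:
                self.weights.setdefault(word, new_weight)


class LowLayerFilter:
    """Document-level layer (assumed): multinomial naive Bayes, add-one smoothing."""

    def fit(self, docs, labels):
        # Assumes both classes (True = sensitive, False = benign) are present.
        self.counts = {True: Counter(), False: Counter()}
        self.n_docs = Counter(labels)
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def log_odds(self, tokens):
        v = len(self.vocab)
        total = {c: sum(self.counts[c].values()) for c in (True, False)}
        odds = math.log(self.n_docs[True] / self.n_docs[False])
        for t in tokens:
            if t in self.vocab:  # unseen words are ignored
                p_pos = (self.counts[True][t] + 1) / (total[True] + v)
                p_neg = (self.counts[False][t] + 1) / (total[False] + v)
                odds += math.log(p_pos / p_neg)
        return odds

    def is_sensitive(self, tokens):
        return self.log_odds(tokens) > 0.0


def multi_layer_filter(tokens, top, low):
    """Cascade: the cheap word-level check runs first, then the document-level check."""
    return top.is_suspect(tokens) and low.is_sensitive(tokens)
```

In this sketch the top layer acts as an inexpensive pre-filter and the low layer confirms or rejects its candidates. Requiring both layers to agree is only one possible way to combine them.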
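The continual emergence of new online media poses more challenges and requirements for extracting and analyzing public opinion. To address the time lag, low accuracy, and poor adaptability of public opinion analysis, the project "Research on an Adaptive Multi-Filter Model for Network Sensitive Information" (61340037) was carried out. After one year of research and practice centered on the project's objectives, 7 papers have been published and 3 accepted, 2 invention patents have been applied for, 1 software copyright has been registered, and 2 master's students have been trained. The main research covers three areas. First, a topic-model-based method for dynamic feature extraction of network sensitive information was studied, enabling dynamic maintenance of the sensitive-word dictionary. Second, a sentiment-computing-based algorithm for identifying opinion sentences was studied; starting from cognitive science, it divides traditional sentiment into emotion, feeling, and opinion sentiment, which makes it easier to analyze sentences that express a personal view on an event, makes filtering more targeted, and supports fine-grained sentiment semantic analysis. Third, the overall architecture of the adaptive multi-filter model for network sensitive information was studied. Related experiments on real corpora produced preliminary results.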
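The idea of separating opinion sentences from sentences that only express emotion or feeling can be illustrated with a very small lexicon-based sketch. The text does not describe the project's actual algorithm or resources, so the cue words, category names, and function names below are hypothetical, and simple cue counting stands in for whatever sentiment-computing method was really used.

```python
import re

# Hypothetical cue lexicons, for illustration only.
CUES = {
    "opinion": {"认为", "觉得", "应该", "支持", "反对"},
    "emotion": {"愤怒", "高兴", "害怕", "伤心"},
    "feeling": {"舒服", "难受", "疲惫", "温暖"},
}


def split_sentences(text):
    """Split Chinese text on sentence-ending punctuation."""
    return [s for s in re.split(r"[。!?!?]", text) if s.strip()]


def classify_sentence(sentence):
    """Return the category whose cue words occur most often, or None if none occur."""
    hits = {cat: sum(sentence.count(w) for w in words) for cat, words in CUES.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


def opinion_sentences(text):
    """Keep only sentences classified as expressing a personal opinion."""
    return [s for s in split_sentences(text) if classify_sentence(s) == "opinion"]
```

A downstream filter could then examine only the sentences returned by `opinion_sentences`, which is one way to read the stated goal of making filtering more targeted.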
Data last updated: 2023-05-31
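On the influence of the big data environment on the development of information science
A system dynamics (SD) model of social stability risk evolution for sensitive water conservancy projects
Experimental study of the effect of high-pressure operating conditions on natural gas filter element performance
Ultrastructural observation of sensilla on the antennae, maxillary palps, and ovipositor of adult female Megastigmus sabinae
Multi-space interactive collaborative filtering recommendation
Research on multiple adaptive network propagation models and defense of important nodes
Research on an optimization model and operating mechanism for information filtering based on topical literature in a networked environment
Interacting propagation models and control of viruses and information in multiplex weighted networks
Research on the influence of redundant and false information on complex networks and information filtering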