Research on Class-Semantic Model Classification Methods for Text Information Security

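Basic Information

Grant number: 61202226

Project category: Young Scientists Fund

Funding amount: 22.00

Principal investigator: 周晓飞

Discipline classification:

Host institution: Institute of Information Engineering, Chinese Academy of Sciences (中国科学院信息工程研究所)

Approval year: 2012

Completion year: 2015

Project period: 2013-01-01 - 2015-12-31

Project status: Completed

Project participants: 张浩亮, 李军, 乔治, 郭静, 王鹏, 尚燕敏, 臧文羽

Keywords:

information security; document classification; data mining; classification methods; class semantics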

Completion Abstract

Text information security is one of the most important problems in web information security, and its core task is text document categorization. Because a text document carries rich semantic information, classification methods for information security need to be able to discover the latent semantics of a document. The latent semantic models currently used in document categorization only provide dimensionality reduction for classification: they do not capture class-semantic features from each class, and the corresponding classification in the semantic space depends on the represented samples without directly using class-semantic information. To meet the requirements of text information security research, this project aims to study text document classification methods that both capture class-semantic features and achieve higher classification accuracy. The project covers two lines of research. (1) Capturing class-semantic features from each class and constructing class-semantic representation models from those features. The project considers two semantic representation models, an explicit (apparent) feature model and a latent feature model. Training classifiers directly on these representation models avoids the usual recomputation of latent-semantic representations, and the classifiers still work well with large training sets. (2) Text classification methodology based on the class-semantic representation model. The project will design classifiers that capture class-semantic characteristics and the distribution of texts in the sample space while preserving the probabilistic mixture properties of class semantics. This research on classification methods based on class-semantic representation models has significant academic value and can provide sound theory, techniques, and deep security analysis for text information security.

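Text information security is an important problem in Internet information security research, and its core technology is text classification. Because text is inherently semantic, text information security urgently needs efficient text classification methods capable of semantic discovery. In current text classification research, semantic feature extraction only uses a latent semantic space to reduce the dimensionality of document feature vectors and does not make full use of the semantic features of the document classes themselves; the corresponding classification algorithms likewise make no effective use of class-semantic information. In response to the demand of text information security for high-performance text classification methods, this project aims to study classification methods that combine class semantics with efficient classification. The main research content includes: 1) extracting class-semantic features effectively from the samples of each class and studying class-semantic representation models based on explicit and implicit features, so as to avoid recomputing semantic representations; 2) studying classification theory and techniques based on the class-semantic representation model, and designing classifiers that account for both class semantics and the distribution of samples in space while preserving the probabilistic mixture properties of the semantics. The project will provide effective theory, techniques, and methods for efficiently analyzing the deep security of text information, and has significant academic value and scientific significance.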

Project Abstract

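In response to the demand of text information security for high-performance text classification methods, this project aimed to study classification methods that combine class semantics with efficient classification. We extracted latent semantic information from text data, built class-semantic representation models based on implicit and explicit features, and on that basis studied class-semantic representation and classification theory and techniques. The project completed research on convex-structure feature extraction for implicit class semantics, implicit class-semantic classification methods, explicit feature extraction and classification based on probabilistic semantics, explicit semantic extraction and classification based on principal components, explicit semantic extraction and classification based on clustering, and explicit feature extraction and classification based on matrix factorization. The results show that constructing vector-space classifiers directly from semantic features is feasible and effective. The project's results not only open up new ideas for classifier design but also provide effective theoretical and technical support for applied research on the deep security of text information.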
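
As an illustration only of the idea stated above, that a vector-space classifier can be built directly from per-class semantic features, the following Python sketch represents each class by a normalized term-frequency profile and assigns a document to the most similar class. It is not the project's method: the function names, the whitespace tokenizer, the cosine scoring, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a stand-in for real word segmentation)."""
    return text.lower().split()


def build_class_profiles(labeled_docs):
    """Build one L2-normalized term-frequency vector per class.

    Each profile is a hypothetical stand-in for an explicit class-semantic feature.
    """
    totals = defaultdict(Counter)
    for text, label in labeled_docs:
        totals[label].update(tokenize(text))
    profiles = {}
    for label, counts in totals.items():
        norm = math.sqrt(sum(c * c for c in counts.values()))
        profiles[label] = {term: c / norm for term, c in counts.items()}
    return profiles


def represent(text, profiles):
    """Map a document to one cosine similarity per class profile."""
    counts = Counter(tokenize(text))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        # An empty document has no similarity to any class.
        return {label: 0.0 for label in profiles}
    return {
        label: sum(c * profile.get(term, 0.0) for term, c in counts.items()) / norm
        for label, profile in profiles.items()
    }


def classify(text, profiles):
    """Assign the class whose profile is most similar to the document."""
    scores = represent(text, profiles)
    return max(scores, key=scores.get)


# Toy training data, invented for this example.
train = [
    ("phishing link steals password", "unsafe"),
    ("malware steals bank password", "unsafe"),
    ("meeting agenda for project review", "safe"),
    ("project schedule and meeting notes", "safe"),
]
profiles = build_class_profiles(train)
print(classify("password stolen by phishing malware", profiles))
print(classify("notes from the project meeting", profiles))
```

Here a document is described by one coordinate per class rather than one per term, so the classifier works in the class-feature space directly; a real system would replace the raw term counts with whichever semantic features are actually extracted.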

Project Outputs

No outputs listed.

Data last updated: 2023-05-31

Other Related Literature

DOI:

Published: 2017

DOI:

Published: 2018

DOI:

Published: 2022

DOI: 10.3969/j.issn.1002-0268.2020.03.007

Published: 2020

DOI: 10.7544/issn1000-1239.2018.20170425

Published: 2018

Other Grants by 周晓飞

Grant number: 61901145

Approval year: 2019

Funding amount: 26.00

Project category: Young Scientists Fund

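Similar NSFC Grants

Chinese Semantic Computing Models for Textual Inference

Grant number: 90920011

Approval year: 2009

Principal investigator: 李素建

Discipline classification: F03

Funding amount: 50.00

Project category: Major Research Plan

Information Hiding Methods Based on Web Text Semantics

Grant number: 61472092

Approval year: 2014

Principal investigator: 李福芳

Discipline classification: F0206

Funding amount: 86.00

Project category: General Program

Semantic Computing Methods for Chinese Text Understanding

Grant number: 91520204

Approval year: 2015

Principal investigator: 赵铁军

Discipline classification: F03

Funding amount: 171.00

Project category: Major Research Plan

Text Classification Methods Based on Cognitive Mechanisms and Semantic Hierarchies

Grant number: 60673109

Approval year: 2006

Principal investigator: 江铭虎

Discipline classification: F06

Funding amount: 25.00

Project category: General Program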

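Other Related Literature

On the Impact of the Big Data Environment on the Development of Information Science

Hardware Trojans: Research Progress on Key Issues and New Trends

Theme Park Evaluation Based on Public Sentiment Orientation: A Case Study of Volga Manor in Harbin

Fatigue Performance and Improvement Methods for Steel Box Girders with Bolted U-Ribs Considering Butt-Joint Misalignment

A Task Scheduling Method for Cloud Workflow Security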

Other Grants by 周晓飞

Unsupervised Video Saliency Detection Based on Spatiotemporal Deep Representations