Text information security is one of the central problems in web information security, and text document categorization is its core task. Because a text document carries rich semantic information, a classification method for information security must be able to discover the latent semantics underlying the document. The latent semantic models currently used in document categorization only perform dimensionality reduction before classification: they do not capture the semantic features specific to each class, and classification in the resulting semantic space still depends on the represented samples rather than using class-semantic information directly. To meet the needs of text information security research, this project aims to develop text classification methods that both capture class-semantic features and achieve higher classification accuracy. The project covers two lines of research. (1) Capturing class-semantic features from each class and building class-semantic representation models from them. Two representation models are considered: an apparent (explicit) feature model and a latent feature model. Training classifiers directly on these representation models avoids recomputing a common latent-semantic representation, and the classifiers are expected to remain effective with large training sets. (2) Text classification methodology based on the class-semantic representation model. The project will design classifiers that capture class-semantic characteristics and the distribution of texts in the sample space while preserving the probabilistic mixture properties of class semantics. This research has significant academic value and can provide sound theory, techniques, and in-depth security analysis methods for text information security.
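Text information security is an important problem in Internet information security research, and its core technology is text classification. Because text is inherently semantic, text information security urgently needs efficient classification methods capable of semantic discovery. In current text classification research, semantic feature extraction only uses the latent semantic space to reduce the dimensionality of document feature vectors and does not make full use of the semantic features of the document classes themselves; the corresponding classification algorithms likewise fail to exploit class-semantic information effectively. In response to the demand of text information security for high-performance text classification, this project aims to study classification methods that combine class semantics with efficient classification. The main research content includes: 1) effectively extracting class-semantic features from the samples of each class and studying class-semantic representation models based on explicit and implicit features, avoiding recomputation of the semantic representation; 2) studying classification theory and techniques based on the class-semantic representation model, and designing classifiers that account for both class semantics and the distribution of samples in the sample space while preserving the probabilistic mixture properties of the semantics. The work will provide effective theory, techniques, and methods for efficiently analyzing the deep security of text information, and has significant academic value and scientific significance.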
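In response to the demand of text information security for high-performance text classification, this project set out to study classification methods that combine class semantics with efficient classification. We extracted latent semantic information from text data, built class-semantic representation models based on implicit and explicit features, and on that basis studied class-semantic representation and classification theory and techniques. The project completed research on convex-structure feature extraction for implicit class semantics, implicit class-semantic classification methods, explicit feature extraction and classification based on probabilistic semantics, principal-component explicit semantic extraction and classification, clustering-based explicit semantic extraction and classification, and explicit feature extraction and classification based on matrix factorization. The results indicate that building vector-space classifiers directly from semantic features is feasible and effective. The project's results not only open new directions for classifier design but also provide effective theoretical and technical support for applied research on the deep security of text information.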
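To make the idea of classifying directly from class-level semantic features more concrete, the following is a minimal sketch, not the project's actual method. It assumes documents are rows of a term-count matrix, learns a small subspace per class with a truncated SVD (a stand-in for the principal-component style of semantic extraction mentioned above), and assigns a document to the class whose subspace reconstructs it with the smallest residual. The function names `fit_class_subspaces` and `predict`, the parameter `k`, and the toy data are all hypothetical.

```python
import numpy as np

def fit_class_subspaces(X, y, k=2):
    """Learn one k-dimensional term subspace per class (illustrative only).

    X: (n_docs, n_terms) term-count matrix, y: (n_docs,) class labels.
    Returns {label: (n_terms, k) matrix with orthonormal columns}.
    """
    bases = {}
    for label in np.unique(y):
        docs = X[y == label]
        # Rows of vt are right singular vectors: the dominant term
        # patterns found in this class's own documents.
        _, _, vt = np.linalg.svd(docs, full_matrices=False)
        bases[label] = vt[:k].T
    return bases

def predict(X, bases):
    """Assign each document to the class with the smallest projection residual."""
    labels = list(bases)
    residuals = np.stack(
        [np.linalg.norm(X - (X @ bases[c]) @ bases[c].T, axis=1) for c in labels],
        axis=1,
    )
    return np.array(labels)[residuals.argmin(axis=1)]

# Toy corpus (assumed data): class 0 uses terms 0-2, class 1 uses terms 3-5.
X_train = np.array([
    [2, 1, 0, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [0, 1, 2, 0, 0, 0],
    [0, 0, 0, 2, 1, 0],
    [0, 0, 0, 1, 2, 1],
    [0, 0, 0, 0, 1, 2],
], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])

bases = fit_class_subspaces(X_train, y_train, k=2)
X_test = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]], dtype=float)
pred = predict(X_test, bases)
print(pred)
```

Because each subspace is fitted only on its own class, the classifier uses class-specific semantic structure rather than a single shared low-dimensional space, which is the contrast the abstract draws with conventional latent semantic models.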
Data last updated: 2023-05-31