Research on Static Detection Methods for Security Vulnerabilities Based on Natural Language Processing

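Basic information

Grant No.: 61802413

Project type: Young Scientists Fund

Funding amount: 25.00

Principal investigator: 黄建军

Discipline classification:

Host institution: 中国人民大学 (Renmin University of China)

Approval year: 2018

Completion year: 2021

Project period: 2019-01-01 - 2021-12-31

Project status: Completed

Project participants: 弓媛君, 白石磊, 韩松明, 吴贻芳, 李红程, 周立博, 王熙栋, 张羿伟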

Keywords:

detection rules; security vulnerabilities; static detection; natural language processing; vulnerability detection

Final report abstract

Static detection of security vulnerabilities is an active research field. It currently faces two main problems. First, conventional static detection relies on manually extracted detection rules; manual extraction is inefficient, and the resulting rules are mostly specific to the target system, so rule extraction has become a bottleneck in the development of static vulnerability detection techniques. Second, rule extraction based on data mining depends on secure programming patterns far outnumbering insecure ones in the target system, and it often ignores important information such as code structure and semantics, which leads to a high false-positive rate. This project plans to bring natural language processing into static vulnerability detection: it treats the target code as a language system, models it with an n-gram language model, and computes the probabilities of token sequences. The research will explore automated techniques for extracting detection rules and develop the corresponding detection methods. It will analyze the dependencies and semantic correlations among code elements and recognize intra- and inter-procedural equivalent implementations in order to refine the model, improve its representational capability, increase the accuracy of rule extraction, and reduce the false positives and false negatives of vulnerability detection. On that basis, the project will develop detection methods that work together with rule extraction and implement a prototype system, forming a complete tool chain for static vulnerability detection that helps overcome the current obstacles in the field.

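Static detection of security vulnerabilities is an internationally active research field. Its main current problems are as follows. Traditional static vulnerability detection relies on manually extracted rules, which is markedly inefficient and closely tied to the specific target system, so rule extraction has become the bottleneck in the development of static vulnerability detection. Rule extraction based on data mining, in turn, depends on secure programming patterns outnumbering insecure ones in the target system and often ignores information such as the internal structure and semantics of the code, resulting in a high false-positive rate. This project proposes to introduce natural language processing into static vulnerability detection: the target code is viewed as a language system and modeled with an n-gram language model, the probabilities of sequences of code elements are computed, and rule extraction methods with a considerable degree of automation, together with the corresponding vulnerability detection techniques, are explored. The project will analyze the dependencies and semantic associations among code elements and refine the model by recognizing intra-procedural and inter-procedural equivalent implementations, thereby strengthening the model's representational capability, raising the accuracy of rule extraction, and lowering the false positives and false negatives of vulnerability detection. On this basis, it will develop detection techniques that complement rule extraction and build a corresponding prototype system, forming a complete tool chain for static vulnerability detection, with the aim of effectively overcoming the obstacles that currently hold back static vulnerability detection.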
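The idea of modeling code with an n-gram language model and computing sequence probabilities can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the project's actual implementation: the bigram order, the add-one smoothing, the helper names (`train_bigram`, `cond_prob`, `log_prob`), and the toy corpus of call sequences are all hypothetical.

```python
import math
from collections import Counter

START, END = "<s>", "</s>"  # hypothetical sequence boundary markers


def train_bigram(sequences):
    """Count contexts and bigrams over token sequences padded with START/END."""
    contexts, bigrams, vocab = Counter(), Counter(), {END}
    for seq in sequences:
        padded = [START, *seq, END]
        vocab.update(seq)
        contexts.update(padded[:-1])             # every token that has a successor
        bigrams.update(zip(padded, padded[1:]))  # adjacent token pairs
    return contexts, bigrams, vocab


def cond_prob(prev, cur, model):
    """P(cur | prev) with add-one smoothing; one extra slot is reserved for unseen tokens."""
    contexts, bigrams, vocab = model
    return (bigrams[(prev, cur)] + 1) / (contexts[prev] + len(vocab) + 1)


def log_prob(seq, model):
    """Log-probability of a whole token sequence under the bigram model."""
    padded = [START, *seq, END]
    return sum(math.log(cond_prob(p, c, model)) for p, c in zip(padded, padded[1:]))


# Toy corpus (assumed): eight functions release the lock, one does not.
corpus = [["lock", "read", "unlock"]] * 8 + [["lock", "read"]]
model = train_bigram(corpus)

usual = log_prob(["lock", "read", "unlock"], model)
unusual = log_prob(["lock", "read"], model)
```

In this toy setting the sequence that ends right after `read` receives a lower probability than the one that goes on to `unlock`, which is the kind of signal a rule extractor could use to propose "`read` under a lock is followed by `unlock`" as a candidate rule. Raw sequence probabilities also shrink with length, so a real system would likely compare conditional probabilities or length-normalized scores.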

Project abstract

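Static detection of security vulnerabilities is an internationally active research field. Its main current problems are as follows. Traditional static vulnerability detection relies on manually extracted rules, which is markedly inefficient and closely tied to the specific target system, so rule extraction has become the bottleneck in the development of static vulnerability detection. Rule extraction based on data mining, in turn, depends on secure programming patterns outnumbering insecure ones in the target system and often ignores information such as the internal structure and semantics of the code, resulting in a high false-positive rate. This project introduced deep-learning-based natural language processing techniques (word embeddings) to represent code, and used techniques such as analogical reasoning to automatically identify sensitive operations in code, which served as the basis for vulnerability detection. In addition, we explored code representation based on deep graph embedding, in order to exploit the semantic and structural information of code as fully as possible, reduce the false positives of vulnerability detection, and improve its accuracy. With the support of the project, we also carried out research in other areas, such as Android security analysis and data-mining-based identification of coding rules. We designed and implemented the corresponding methods and prototype tools and tested them on real software systems (such as the Linux kernel and OpenSSL), finding dozens of previously unknown defects that have been confirmed by the software developers. The research has resulted in 7 published conference/journal papers, including 4 CCF class-A papers; part of the results formed an important component of work that received the first prize of the 2021 Natural Science Award of the Computer Society (计算机学会).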
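As a rough illustration of analogical reasoning over word embeddings of code tokens, the sketch below answers "a is to b as c is to ?" by vector arithmetic and cosine similarity. It is only a sketch under stated assumptions: the three-dimensional vectors are hand-made rather than learned, the token names are used purely as example strings, and the helpers `cosine` and `analogy` are hypothetical and do not come from the project's tools.

```python
import math

# Hand-made toy embeddings (assumed, not trained):
# axis 0 ~ memory-related, axis 1 ~ lock-related, axis 2 ~ acquire (+1) vs. release (-1).
EMB = {
    "kmalloc":      [1.0, 0.0, 1.0],
    "kfree":        [1.0, 0.0, -1.0],
    "mutex_lock":   [0.0, 1.0, 1.0],
    "mutex_unlock": [0.0, 1.0, -1.0],
    "printk":       [0.5, 0.5, 0.0],
}


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def analogy(a, b, c, emb):
    """Return the token closest to emb[b] - emb[a] + emb[c], excluding a, b and c."""
    target = [y - x + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))


# "kmalloc is to kfree as mutex_lock is to ?"
answer = analogy("kmalloc", "kfree", "mutex_lock", EMB)
```

With these toy vectors the query resolves to `mutex_unlock`. With embeddings learned from real code, a known pair of paired operations could in the same way be used to look for tokens that play a similar role elsewhere, although learned vectors would be far noisier than this example.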

Project outputs

No outputs are listed for this item.

Data last updated: 2023-05-31

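Other grants by 黄建军

Remote Sensing Image Recognition Based on Cognitive Information Theory

Grant No.: 60572102

Approval year: 2005

Funding amount: 23.00

Project type: General Program

Research on Phage Gene Knockout and Phage Gene Function

Grant No.: 30300013

Approval year: 2003

Funding amount: 20.00

Project type: Young Scientists Fund

Research on Intelligent Recognition of Key Ground Objects in Digital City 3D Image Maps

Grant No.: 60172066

Approval year: 2001

Funding amount: 18.00

Project type: General Program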

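Similar NSFC grants

Research on Static Detection Methods for Security Vulnerabilities Based on Data Mining

Grant No.: 60873213

Approval year: 2008

Principal investigator: 梁彬

Discipline classification: F0205

Funding amount: 30.00

Project type: General Program

Research on Protein Interaction Prediction Methods Based on Natural Language Processing Techniques

Grant No.: 60673019

Approval year: 2006

Principal investigator: 林磊

Discipline classification: F0214

Funding amount: 26.00

Project type: General Program

Protein Remote Homology Detection and Fold Recognition Based on Semantic Analysis Techniques from Natural Language Processing

Grant No.: 61672184

Approval year: 2016

Principal investigator: 刘滨

Discipline classification: F0213

Funding amount: 62.00

Project type: General Program

Research on Matrix-Based Structured Learning in Natural Language Processing

Grant No.: 61402175

Approval year: 2014

Principal investigator: 吴苑斌

Discipline classification: F0211

Funding amount: 26.00

Project type: Young Scientists Fund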
