Research on Key Technologies for Knowledge Graph Completion Based on Implicit Knowledge Mining and Time Sensitivity

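Basic Information

Grant number: 61772040

Project type: General Program

Funding amount: 60.00

Principal investigator: 穗志方

Discipline classification:

Host institution: Peking University

Approval year: 2017

Completion year: 2021

Project period: 2018-01-01 - 2021-12-31

Project status: Completed

Project participants: 葛涛, 姜廷松, 沙磊, 王柯翔, 夏乔林, 修瑨, 王亮

Keywords:

knowledge base construction; text mining; knowledge acquisition; entity relation extraction; knowledge graph completion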

Final Report Abstract

With the rapid development of intelligent Internet applications, the knowledge graph (KG), as a fundamental knowledge infrastructure, has become increasingly useful for many NLP applications such as content understanding, semantic search, question answering, and machine translation. Although KGs are large, they are far from complete. To address this incompleteness, this project focuses on knowledge graph completion technology. We propose to improve KG completion by inferring implicit knowledge and by exploiting the temporal aspects of facts. The former uses an embedding-based method to mine association rules over distributed representations, encoding each object (entities and relations) in a knowledge graph into a continuous vector space. This kind of approach has shown strong feasibility and robustness. Furthermore, we use a Markov Logic Network (MLN), constructed from the inference rules, to infer the correctness of implicit facts. Each rule has a weight that is trained on real-world facts. Within this framework, we can evaluate implicit knowledge probabilistically and obtain more accurate predictions. The latter is a novel time-aware knowledge graph completion model that predicts links in a KG using both the existing facts and their temporal information. To incorporate the time at which facts happen, we propose a time-aware KG embedding model that uses temporal order information among facts. To incorporate the valid time of facts, we propose a joint time-aware inference model based on Integer Linear Programming (ILP) that uses temporal consistency information as constraints. We further integrate the two models to make full use of global temporal information. The results of this research are expected to provide key technology for large-scale knowledge engineering.
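The embedding idea in the abstract above can be illustrated with a minimal sketch. The abstract does not name a specific scoring function, so this assumes a translation-based (TransE-style) distance for triples and a hypothetical transition matrix that maps the embedding of an earlier relation onto that of a later one. All vectors, relation names (`born_in`, `died_in`), and the matrix values are made up for illustration.

```python
def l1_norm(vec):
    """Sum of absolute values of a vector given as a list of numbers."""
    return sum(abs(x) for x in vec)


def triple_score(head, relation, tail):
    """TransE-style distance |h + r - t|; smaller means more plausible."""
    return l1_norm([h + r - t for h, r, t in zip(head, relation, tail)])


def temporal_order_score(rel_earlier, rel_later, transition):
    """Distance between (transition @ rel_earlier) and rel_later.

    A small value means the pair is consistent with the order
    'rel_earlier happens before rel_later'.
    """
    projected = [
        sum(row[j] * rel_earlier[j] for j in range(len(rel_earlier)))
        for row in transition
    ]
    return l1_norm([p - q for p, q in zip(projected, rel_later)])


# Toy 2-dimensional embeddings (illustrative values only).
person = [1, 0]
born_in = [0, 1]
died_in = [1, 1]
right_city = [1, 1]
wrong_city = [3, 3]
transition = [[1, 1], [0, 1]]  # hypothetical temporal-order matrix

plausible = triple_score(person, born_in, right_city)
implausible = triple_score(person, born_in, wrong_city)
forward = temporal_order_score(born_in, died_in, transition)
backward = temporal_order_score(died_in, born_in, transition)
```

In this toy setting the correct triple gets a lower distance than the corrupted one, and the order "born before died" scores better than the reverse. A trained model would learn the vectors and the matrix from data rather than fix them by hand.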

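In the era of Internet intelligence, knowledge graphs are the knowledge foundation that supports applications such as content understanding, intelligent search, question answering, and machine translation. This proposal addresses the incompleteness of knowledge graphs and studies efficient and accurate key technologies for knowledge graph completion. It mines and exploits the information contained in the knowledge graph itself along two lines: implicit knowledge mining and extension to the time dimension. The former uses an association rule mining method based on embedded representations to implicitly represent and learn knowledge elements such as entity relations and logical rules in a low-dimensional semantic space, and then assigns probabilities to implicit knowledge with Markov logic networks in order to expand the knowledge graph. The latter extends the modeling of entity-relation associations to the time dimension and improves knowledge graph completion performance through a joint model that combines temporal order information and duration. Specifically, the embedding model based on temporal order information assumes that time-sensitive relations have temporal dependencies and that their distributed representations can be transformed as time evolves, so that relational temporal order information is effectively encoded into the vector space of the knowledge representation; the duration-based model distills multiple temporal constraints and uses integer linear programming for global inference and prediction. This research will provide key technical reserves for large-scale knowledge graph construction.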
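The global inference step with temporal constraints can be sketched as a tiny 0/1 program. The abstract does not list the actual constraints, so this assumes a single hypothetical one: two facts with the same subject and the same "exclusive" relation may not have overlapping validity intervals. For brevity the sketch enumerates all 0/1 assignments instead of calling an ILP solver; the candidate facts, scores, and the relation name `spouse` are invented for illustration.

```python
from itertools import product


def overlaps(a, b):
    """True if half-open intervals a = (start, end) and b overlap."""
    return a[0] < b[1] and b[0] < a[1]


def select_facts(candidates, exclusive_relations):
    """Pick the subset of candidate facts with maximum total score.

    candidates: list of (subject, relation, object, (start, end), score).
    Constraint (assumed): facts sharing subject and an exclusive relation
    must not overlap in time. Solved by exhaustive 0/1 enumeration.
    """
    best_value, best_choice = float("-inf"), []
    for choice in product((0, 1), repeat=len(candidates)):
        chosen = [c for c, x in zip(candidates, choice) if x]
        feasible = all(
            not (
                a[0] == b[0]
                and a[1] == b[1]
                and a[1] in exclusive_relations
                and overlaps(a[3], b[3])
            )
            for i, a in enumerate(chosen)
            for b in chosen[i + 1:]
        )
        if feasible:
            value = sum(c[4] for c in chosen)
            if value > best_value:
                best_value, best_choice = value, chosen
    return best_choice


# Hypothetical candidate facts produced by some upstream model.
candidates = [
    ("A", "spouse", "B", (2000, 2010), 0.9),
    ("A", "spouse", "C", (2005, 2015), 0.6),
    ("A", "spouse", "D", (2011, 2020), 0.7),
]
selected = select_facts(candidates, exclusive_relations={"spouse"})
selected_objects = [fact[2] for fact in selected]
```

Here the middle candidate conflicts in time with both others, so the best consistent set keeps the first and third facts. Enumeration is exponential in the number of candidates; at realistic scale the same objective and constraints would be handed to an ILP solver.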

Project Abstract

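In the era of Internet intelligence, knowledge graphs are the knowledge foundation that supports applications such as content understanding, intelligent search, question answering, and machine translation. This project addresses the incompleteness of knowledge graphs and studies efficient and accurate key technologies for knowledge graph completion. It mines and exploits the information contained in the knowledge graph itself along two lines: implicit knowledge mining and extension to the time dimension. The main research work of the project comprises knowledge graph completion based on temporal information, implicit knowledge mining for knowledge graph completion, core tools for knowledge graph construction, and evaluations related to knowledge graph construction. The research group proceeded according to the project plan and met all the targets set in the project task statement, achieving a series of results in theoretical models, key technologies, core tools, and standards and evaluation. At the level of theory and methods, the group proposed a series of methods for implicit knowledge mining and knowledge graph completion, including a time-aware knowledge graph embedding model, a joint model incorporating temporal information, knowledge graph completion based on inductive representation of new entities, relation extraction based on the knowledge graph schema, and event extraction based on dependency bridges and tensor networks. It published 17 papers at high-level academic conferences in natural language processing (ACL, IJCAI, EMNLP, COLING, etc.) and was granted 1 patent and 3 software copyrights. At the level of tools and evaluation, with medical-domain knowledge graph construction as the application demonstration and validation, the group developed core tools for knowledge graph construction, including multi-view, interactive visualization methods and systems and core tools for medical named entity recognition and relation extraction. At the China Conference on Health Information Processing (CHIP2020) it organized 2 evaluation tasks: Chinese medical text named entity recognition and Chinese medical text entity relation extraction. In terms of training, the project trained 4 doctoral students and 4 master's students, among whom 刘天宇 received the Outstanding Doctoral Dissertation Nomination Award of the Chinese Information Processing Society of China. The principal investigator, Professor 穗志方, was named a Beijing "Zhiyuan Scholar" (智源学者). The results of this project will provide key technical reserves for the construction and application of large-scale knowledge graphs.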

Data last updated: 2023-05-31

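Other grants held by 穗志方

Grant number: 60503071

Approval year: 2005

Funding amount: 23.00

Project type: Young Scientists Fund

Grant number: 61375074

Approval year: 2013

Funding amount: 79.00

Project type: General Program

Grant number: 60873156

Approval year: 2008

Funding amount: 32.00

Project type: General Program

Grant number: 61075067

Approval year: 2010

Funding amount: 37.00

Project type: General Program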

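Similar NSFC projects

Research on Mining Implicit Knowledge Graphs from Large-Scale WiFi Trajectories

Grant number: 61502264

Approval year: 2015

Principal investigator: 王鹏

Discipline classification: F0211

Funding amount: 20.00

Project type: Young Scientists Fund

Research on Key Technologies for Knowledge Graph Completion and Verification Based on Multi-Source Heterogeneous Data

Grant number: 61906035

Approval year: 2019

Principal investigator: 徐波

Discipline classification: F0607

Funding amount: 24.00

Project type: Young Scientists Fund

Research on Key Technologies for Semantic Web Knowledge Base Completion

Grant number: 61772079

Approval year: 2017

Principal investigator: 王志春

Discipline classification: F0607

Funding amount: 15.00

Project type: General Program

Research on Explanation Mechanisms for Knowledge Graph Completion Results Based on Abductive Reasoning

Grant number: 61876204

Approval year: 2018

Principal investigator: 杜剑峰

Discipline classification: F0607

Funding amount: 65.00

Project type: General Program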

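Other grants held by 穗志方

Research on Automatic Acquisition of Chinese Verb Subcategorization Frames

Research on Methods for Mapping Textual Linguistic Expressions to Conceptual Relations and Resource Construction

Research on Semantic Role Labeling Methods Based on Structured Learning

Research on Web-Based Extraction of Concept Instances and Their Attribute Values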
