Research on Scene Understanding Methods Based on Visual Semantic Reasoning and Context Constraint Modeling

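Basic Information

Grant No.: 61272218

Project type: General Program

Funding amount: 80.00

Principal investigator: Lu Tong (路通)

Discipline classification:

Host institution: Nanjing University

Approval year: 2012

Completion year: 2016

Project period: 2013-01-01 - 2016-12-31

Project status: Completed

Project participants: Su Feng (苏丰), Yang Ruoyu (杨若瑜), Wu Yirui (巫义锐), Yuan Zehuan (袁泽寰), Ma Xiaolin (马小林), Yin Weichong (尹维冲), Wang Hao (王昊), Xu Feiming (徐飞明), Xing Run (邢润)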

Keywords:

scene understanding; context modeling; visual semantic reasoning

Completion Abstract

Vision-based understanding of natural scenes is expected to remain a research hotspot and a major challenge in the coming years. Its goal is to analyze, recognize, and represent the content of natural scene images and videos effectively. The relevant theories and algorithms are still at an early, exploratory stage. Our preliminary experiments indicate that studying three components, namely visual semantic reasoning, scene object recognition, and scene behavior pattern detection, can help build a novel framework for natural scene understanding. We first label the semantics of visible regions in a scene image with a topic model trained under the max-margin criterion, and then infer indirect scene semantics by solving a constrained optimization over latent variables, exploiting the relations between labels and visual features. Next, to address the diversity and variability of objects in natural scene images, we model multi-level scene contexts with conditional random fields (CRFs) and describe multi-view object associations through an undirected graph representation, so that objects can be recognized more accurately and robustly. Finally, we use three-dimensional Gaussian distributions to describe local motion patterns in scene videos that incorporate depth information, and model the spatio-temporal context among these patterns with a Markov random field (MRF) to detect behavior patterns, especially in crowded scenes. The project aims to provide new ideas and techniques for research on natural scene understanding.

Project Abstract

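This project carried out in-depth research on three components: scene visual semantic reasoning, scene object recognition, and scene behavior pattern detection. We first explored mechanisms for detecting and recognizing scene text in support of visual semantic understanding, and made important progress on challenging problems such as the highly variable appearance of scene text, uncertain imaging conditions, and complex surrounding environments, opening a new route to high-level semantic understanding of visual data. The project then explored scene composition analysis mechanisms that integrate multi-level context and multi-view association constraints. It systematically analyzed how scenes are composed from the perspectives of multi-view spherical modeling, visual scene association network description, scene semantic random walk design, and context-sensitive scene topic modeling, providing new means and methods for understanding scene content. Finally, the project studied behavior pattern analysis in scenes, systematically exploring scene depth data acquisition, spatio-temporal hypervolume modeling of scenes, Markov random field modeling of scene motion information, and object tracking that combines scene motion analysis with particle filtering. Building on this work, the project also designed innovative applications that incorporate scene behavior analysis.

We published 19 papers in major international journals: IEEE Transactions on Image Processing (2), IEEE Transactions on Multimedia (2), Pattern Recognition (3), Computer Vision and Image Understanding, Graphical Models, Neurocomputing, Computer-Aided Geometric Design, Multimedia Tools and Applications (2), Multimedia Systems, Expert Systems with Applications (2), International Journal on Document Analysis and Recognition, IET Computer Vision, and Applied Intelligence. We also published 28 papers at major international conferences, including ECCV, ICME, ICPR, ICIP, ICDAR, MMM, ICFHR, and DAS. By invitation, we wrote one monograph, Video Text Detection (Springer London), and one chapter of the Handbook of Document Image Processing and Recognition (Springer New York).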

Project Outputs

No outputs of this type are available.

Data last updated: 2023-05-31

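Other Grants Held by Lu Tong (路通)

Grant No.: 61672273

Approval year: 2016

Funding amount: 59.00

Project type: General Program

Grant No.: 60603086

Approval year: 2006

Funding amount: 25.00

Project type: Young Scientists Fund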

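Similar NSFC Grants

Vision- and Semantics-Based Indoor Scene Understanding and Real-Time Modeling

Grant No.: 61632006

Approval year: 2016

Principal investigator: Yin Baocai (尹宝才)

Discipline classification: F0209

Funding amount: 265.00

Project type: Key Program

Image Scene Understanding Based on a Multi-Task Probabilistic Visual-Semantic Model

Grant No.: 61301192

Approval year: 2013

Principal investigator: Wei Wei (魏巍)

Discipline classification: F0116

Funding amount: 25.00

Project type: Young Scientists Fund

Causal Models and Inference Methods for Scene Understanding in Video

Grant No.: 61876020

Approval year: 2018

Principal investigator: Liang Wei (梁玮)

Discipline classification: F0604

Funding amount: 16.00

Project type: General Program

Semantically Associated Indoor Scene Object Modeling and Function Understanding

Grant No.: 61772049

Approval year: 2017

Principal investigator: Kong Dehui (孔德慧)

Discipline classification: F0209

Funding amount: 63.00

Project type: General Program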

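Titles of Other Grants Held by Lu Tong (路通)

Research on Visual Understanding of Natural Scenes Integrating Prior Modeling and Deep Learning

Research on Representation and Retrieval Mechanisms for 3D CAD Models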
