Content-oriented natural scene understanding is an active research topic and a key challenge in the computer vision and pattern recognition community, and it is still at an early stage of exploration. Building on our previous research, we use recent techniques to explore three key steps: scene prior modeling, scene object recognition, and scene activity detection, aiming to describe content reasoning processes from different perspectives and at different levels. We first design general scene semantic metrics that combine deep learning and manifold learning to extract the scene category prior. To characterize the location, shape, and spatial layout priors of a given scene, we then adopt sum-product networks to model multi-scale unary potentials. Next, we construct an adaptive deep learning framework based on semantic gain modeling to improve the efficiency of deep learning, and propose a new multi-modal deep learning framework for context reasoning to improve the accuracy of scene object recognition. Together these address the difficulties caused by the diversity of scene compositions and the variation in the visual appearance of objects. Finally, we use video features extracted by deep neural networks, together with high-order constraint modeling across video frames, to detect specific activities in large-scale scene videos, advancing natural scene understanding by jointly considering the humans, computers, and objects that occur in the same scene. The project will provide a new approach to natural scene understanding research.
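Content-oriented natural scene understanding is a frontier topic and a major challenge in computer vision and pattern recognition, and the related theories and methods remain immature. Building on our previous research and drawing further on deep learning techniques, we study three key stages: scene prior modeling, scene object recognition, and scene activity detection, so as to reveal the reasoning process over natural scene content from different perspectives and at different levels. The project uses deep neural networks and manifold learning to explore scene semantic metrics with a degree of generalization ability, and then uses sum-product networks to model multi-scale unary potentials, in order to characterize, respectively, the scene category prior and the spatial structure priors of a scene, such as location, shape, and layout. To address the diversity of scene composition and the variability of object appearance within scenes, we construct a new learning mechanism based on adaptive semantic gain and multi-modal context reasoning to further improve the performance and efficiency of scene object recognition. Finally, using deep feature representations and high-order inter-frame constraint modeling, we reveal the recognition mechanism for specific activities in large-scale scene videos, further advancing natural scene understanding that integrates humans, machines, and objects. This research will open a new path for the exploration of natural scene understanding.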
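Content-oriented natural scene understanding is a frontier topic and a major challenge in computer vision and pattern recognition, and the related theories and methods remain immature. Building on our previous research and drawing further on deep learning techniques, we studied three key stages: scene prior modeling, scene semantic extraction, and scene activity recognition, revealing the reasoning process over natural scene content from different perspectives and at different levels. The project used deep fully connected conditional random fields to segment the objects common to a set of scenes, and then proposed a new general deep neural network framework that characterizes the scene category prior and the spatial structure priors of a scene, such as location, shape, and layout. To address the diversity of scene composition and the variability of object appearance within scenes, we proposed a progressive expansion algorithm that detects and recognizes key text semantics in natural scenes, further improving the performance and efficiency of scene understanding. Finally, using a dynamic sampling network, we revealed an efficient recognition mechanism for specific activities in large-scale scene videos, further advancing natural scene understanding that integrates humans, machines, and objects.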
Data last updated: 2023-05-31
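Research on a Rice Root System Modeling Method Based on Fractal L-Systems
Evaluating Theme Parks Based on Public Sentiment Orientation: A Case Study of Volga Manor in Harbin
Application of Collaborative-Representation-Based Graph Embedding Discriminant Analysis to Face Recognition
An Improved Multi-Objective Sine Cosine Optimization Algorithm
An Unsupervised Domain Adaptation Method for Workpiece Surface Defects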
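Deep-Learning-Based Feature Fusion for Scene Understanding in Mobile Robot Vision
Key Technologies for Deep Visual Understanding Integrated with Natural Language Processing
Scene Understanding Based on Structured Deep Learning
Indoor Scene Understanding and Real-Time Modeling Based on Vision and Semantics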