Enabling robots to better understand their working environments has long been one of the most challenging research topics worldwide. For mobile robot systems operating in complex scenes, cognitive abilities comparable to those of humans are a prerequisite for autonomous operation. However, existing vision sensor-based scene understanding faces the following challenges: 1) Image collection: how can we prevent the images captured by the robot from containing no target objects? 2) Feature representation and learning: scene understanding is a high-level vision task built on low-level vision; how can features from different sources be fused effectively so that objects are described accurately while the effort of manual feature design is reduced? 3) Dimensionality reduction: robots often struggle to recognize objects and complete assigned tasks in challenging scenarios, e.g., scenes with a significant amount of clutter; how can we obtain more robust features and enable the robot to understand scenes in real time? Building on computer vision techniques, this project aims to construct a deep learning-based scene understanding system by: i) collecting and analyzing images captured by an omnidirectional vision sensor and a Microsoft Kinect; ii) extracting biologically inspired gist and saliency features and learning effective fused features with deep learning; and iii) designing a new manifold learning algorithm that reduces the dimensionality of the feature vectors to achieve adaptive, real-time scene understanding. The system improves the robot's ability to fully exploit the information encoded in its visual inputs for scene understanding; it achieves effectiveness, self-adaptation, and real-time performance, and thereby provides key technical support for robot vision navigation systems.
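For concreteness, the sketch below outlines the kind of pipeline the proposal describes: a gist-style descriptor and a spectral-residual saliency descriptor are extracted from each frame, concatenated into a fused feature vector, and embedded into a lower-dimensional space. The function names, filter settings, and the use of scikit-learn's Isomap as a stand-in for the project's new manifold learning algorithm are illustrative assumptions, not the project's actual implementation.

```python
import numpy as np
from scipy import ndimage
from sklearn.manifold import Isomap

def gist_like(gray, scales=(2, 4), orientations=4, grid=4):
    """Pool oriented filter responses over a coarse grid (a rough GIST-style descriptor)."""
    feats = []
    for sigma in scales:
        for k in range(orientations):
            theta = np.pi * k / orientations
            # Directional derivative-of-Gaussian response as a cheap stand-in for a Gabor filter.
            resp = np.abs(np.cos(theta) * ndimage.gaussian_filter(gray, sigma, order=(1, 0))
                          + np.sin(theta) * ndimage.gaussian_filter(gray, sigma, order=(0, 1)))
            h, w = resp.shape
            bh, bw = h // grid, w // grid
            blocks = resp[:grid * bh, :grid * bw].reshape(grid, bh, grid, bw)
            feats.append(blocks.mean(axis=(1, 3)).ravel())
    return np.concatenate(feats)

def saliency_features(gray, out=16):
    """Spectral-residual saliency map, block-averaged into a fixed-length vector."""
    f = np.fft.fft2(gray)
    log_amp = np.log(np.abs(f) + 1e-8)
    residual = log_amp - ndimage.uniform_filter(log_amp, size=3)
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * np.angle(f)))) ** 2
    sal = ndimage.gaussian_filter(sal, sigma=3)
    bh, bw = sal.shape[0] // out, sal.shape[1] // out
    return sal[:out * bh, :out * bw].reshape(out, bh, out, bw).mean(axis=(1, 3)).ravel()

def fused_descriptor(gray):
    # Feature-level fusion: concatenate the gist-style and saliency descriptors.
    return np.concatenate([gist_like(gray), saliency_features(gray)])

# Toy usage: fuse features for a batch of grayscale frames, then embed them
# into a 10-dimensional space (Isomap is only a placeholder here).
frames = [np.random.rand(120, 160) for _ in range(50)]
X = np.stack([fused_descriptor(g) for g in frames])
embedding = Isomap(n_neighbors=8, n_components=10).fit_transform(X)
```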
This project investigated a series of key scientific and technical problems in complex-scene understanding based on feature extraction and fusion. First, it studied foreground object detection for visual tracking. Traditional inter-frame differencing and three-frame differencing methods for foreground segmentation mostly produce ghosting or holes, so their segmentation quality is unsatisfactory. The project therefore proposed an improved three-frame differencing algorithm and a foreground detection method that combines background subtraction with inter-frame differencing, and, for the first time, applied classical perceptual hashing to moving-object tracking. Two improved perceptual hashing algorithms were further proposed, effectively addressing the real-time requirement of scene understanding. Second, the project proposed a fast abandoned-object detection and recognition method that integrates bidirectional background modeling, mean-shift tracking, and foreground detection based on pixel-region information. It analyzed the pixel-level relationship between abandoned objects and pedestrians in complex scenes with noise and occlusion, and then combined moment invariants with principal component analysis to recognize abandoned objects observed by cameras from different directions and positions. Third, the project studied gesture recognition based on multi-source data fusion. Unlike traditional gesture recognition on color images alone, it effectively fuses color, depth, and skeleton information and segments gestures accurately through feature extraction and fusion. A two-channel convolutional neural network was constructed to reduce the effort of manual feature design, and statistical experiments verified that the new method outperforms traditional methods in gesture recognition, helping to improve the effectiveness and real-time performance of gesture recognition in human-computer interaction. In addition, the project created several image and video databases. The research team has had a series of high-quality international journal and conference papers accepted and published in the areas related to this project, completing all research tasks specified in the project proposal.
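As a reference point for the foreground detection work described above, the following is a minimal sketch of classical three-frame differencing, the baseline whose ghosting and hole artifacts the improved algorithm targets; the threshold value and array conventions are illustrative assumptions, not the project's improved method.

```python
import numpy as np

def three_frame_diff(prev, curr, nxt, thresh=25):
    """Foreground mask from three consecutive grayscale frames (uint8 arrays)."""
    d1 = np.abs(curr.astype(np.int16) - prev.astype(np.int16)) > thresh
    d2 = np.abs(nxt.astype(np.int16) - curr.astype(np.int16)) > thresh
    # Intersecting the two difference masks suppresses the "ghost" left behind at
    # the object's previous position, but it tends to leave holes inside slowly
    # moving objects -- the artifact the project's improved algorithm targets.
    return (d1 & d2).astype(np.uint8) * 255
```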
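The summary also mentions applying perceptual hashing to moving-object tracking. The sketch below matches a tracked target by comparing a standard 64-bit average hash (aHash) over sliding windows; it only illustrates the general idea and is not the project's two improved hashing algorithms, and the sampling-based downscaling and brute-force search are simplifications.

```python
import numpy as np

def average_hash(patch, size=8):
    """64-bit average hash (aHash) of a grayscale patch, returned as a boolean vector."""
    ys = np.arange(size) * patch.shape[0] // size
    xs = np.arange(size) * patch.shape[1] // size
    small = patch[np.ix_(ys, xs)].astype(np.float32)   # crude downscaling by sampling
    return (small > small.mean()).ravel()

def best_match(template, frame, stride=4):
    """Slide a template-sized window over the frame and keep the window whose
    hash has the smallest Hamming distance to the template's hash."""
    t_hash = average_hash(template)
    bh, bw = template.shape
    best_pos, best_dist = None, np.inf
    for y in range(0, frame.shape[0] - bh, stride):
        for x in range(0, frame.shape[1] - bw, stride):
            dist = np.count_nonzero(t_hash != average_hash(frame[y:y + bh, x:x + bw]))
            if dist < best_dist:
                best_pos, best_dist = (x, y), dist
    return best_pos, best_dist
```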
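For the multi-source gesture recognition work, a minimal sketch of a two-channel (color plus depth) convolutional network with late fusion is given below; the layer sizes, the fusion point, and the omission of the skeleton stream are simplifying assumptions rather than the project's exact architecture.

```python
import torch
import torch.nn as nn

class TwoStreamGestureNet(nn.Module):
    """Late-fusion network: one small CNN per modality, concatenated before the classifier."""
    def __init__(self, n_classes=10):
        super().__init__()
        def branch(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 16, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
                nn.AdaptiveAvgPool2d(4), nn.Flatten())
        self.rgb_branch = branch(3)      # color stream
        self.depth_branch = branch(1)    # depth stream
        self.classifier = nn.Sequential(
            nn.Linear(2 * 32 * 4 * 4, 128), nn.ReLU(), nn.Linear(128, n_classes))

    def forward(self, rgb, depth):
        fused = torch.cat([self.rgb_branch(rgb), self.depth_branch(depth)], dim=1)
        return self.classifier(fused)

# Toy usage: one color frame and one depth frame of a segmented hand region.
logits = TwoStreamGestureNet()(torch.rand(1, 3, 64, 64), torch.rand(1, 1, 64, 64))
```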
Research on direct brain-controlled robot direction and speed based on SSVEP
Research on theme park evaluation based on public sentiment: a case study of Volga Manor in Harbin
Application of collaborative representation-based graph embedding discriminant analysis to face recognition
An improved multi-objective sine cosine optimization algorithm
An unsupervised domain adaptation method for workpiece surface defects
Research on visual understanding of natural scenes combining prior modeling and deep learning
Scene understanding based on structured deep learning
Research on deep learning-based assembly scene understanding, assembly guidance, and monitoring
Research on video behavior analysis based on scene semantic understanding and deep learning feature representation