Summarizing photos and videos has been a hot topic at the frontier of the multimedia research community. However, progress has hit a bottleneck because there is no effective objective mechanism for evaluating summaries, and automatically generating summaries that are consistent with human understanding remains unsolved. This project therefore takes sparse representation and statistical learning theory as its foundation to study the joint summarization of large-scale collections of photos and videos. It will address three research issues. First, to improve summarization accuracy, we will build a sparse representation model for a single unstructured video and study a length-scalable video skim generation method guided by the sparse reconstruction coefficients. Second, building on the single-video skimming model, we will exploit the complementary nature of photos and videos and propose an optimization-based skimming method for photo and video collections that improves the salience of the summary. Third, we will establish a subjective evaluation dataset for video skimming, analyze how statistics of various orders of the sparsely represented high-dimensional data relate to summary quality, and derive an objective, effective, and robust evaluation method based on statistical learning. This objective evaluation will also serve as feedback for refining the skimming strategy, keeping the summaries consistent with human understanding. The project is expected to yield theoretical contributions and technical breakthroughs that promote the broad application of video summarization.
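The abstract does not spell out the sparse representation model. The following is a minimal sketch under stated assumptions, not the project's method. It assumes each frame is one feature column of a matrix X and that keyframes are chosen by a row-sparse self-representation: minimize over C the objective 0.5·‖X − XC‖_F² + λ·Σᵢ ‖C[i,:]‖₂. This is a group-sparsity formulation in the spirit of sparse-modeling representative selection, solved here by proximal gradient descent. Rows of C with a large norm mark frames that help reconstruct many others, so keeping the top k rows gives a skim whose length scales with k. The function names, λ, and the toy features are hypothetical.

```python
import numpy as np


def row_sparse_coefficients(X, lam=0.5, n_iter=2000):
    """Proximal-gradient (ISTA) solver for
        min_C 0.5 * ||X - X C||_F^2 + lam * sum_i ||C[i, :]||_2
    X has shape (d, n): one d-dimensional feature column per frame (hypothetical features).
    Returns C with shape (n, n); column j reconstructs frame j from all frames.
    """
    n = X.shape[1]
    G = X.T @ X                          # Gram matrix of frame features
    step = 1.0 / np.linalg.norm(G, 2)    # 1 / Lipschitz constant of the gradient
    C = np.zeros((n, n))
    for _ in range(n_iter):
        grad = G @ C - G                 # gradient of the data term: X^T (X C - X)
        V = C - step * grad
        # Prox of the l1,2 norm: shrink each row toward zero as a group
        norms = np.linalg.norm(V, axis=1, keepdims=True)
        shrink = np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
        C = shrink * V
    return C


def select_keyframes(X, k, lam=0.5):
    """Indices of the k frames whose coefficient rows have the largest norm,
    returned in temporal order; changing k changes the skim length."""
    scores = np.linalg.norm(row_sparse_coefficients(X, lam), axis=1)
    top = np.argsort(-scores)[:k]
    return sorted(int(i) for i in top)


# Toy features (hypothetical): frames 0-2 lie along one direction, frames 3-5 along another
X = np.array([[1.0, 2.0, 1.1, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.9, 1.0, 1.5]])
print(select_keyframes(X, 2))  # [1, 5]
```

In this toy case each "shot" consists of scaled copies of one vector. The row-group penalty concentrates each shot's coefficients on its largest-norm frame, so frames 1 and 5 are selected.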
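The abstract proposes learning an objective quality metric by relating statistics of various orders of the sparse representation to subjective scores. The following is a minimal sketch under two assumptions, both stand-ins: the statistics are the first four moments (mean, variance, skewness, excess kurtosis) of a summary's sparse coefficients, and the learner is closed-form ridge regression. The project's actual features and learner are not specified. The function names, the synthetic coefficient sets, and the synthetic scores that stand in for a subjective dataset are all hypothetical.

```python
import numpy as np


def moment_features(coeffs):
    """Statistics of orders 1-4 of a summary's sparse coefficients:
    mean, variance, skewness, excess kurtosis (assumed feature set)."""
    c = np.asarray(coeffs, dtype=float).ravel()
    mu = c.mean()
    var = c.var()
    z = (c - mu) / (np.sqrt(var) + 1e-12)   # standardized coefficients
    return np.array([mu, var, np.mean(z ** 3), np.mean(z ** 4) - 3.0])


def fit_quality_model(features, scores, alpha=1e-3):
    """Ridge regression from moment features to subjective scores.
    A bias column is appended; returns the weight vector."""
    F = np.column_stack([features, np.ones(len(features))])
    A = F.T @ F + alpha * np.eye(F.shape[1])
    return np.linalg.solve(A, F.T @ np.asarray(scores, dtype=float))


def predict_quality(w, feats):
    """Predicted quality score for one summary's feature vector."""
    return float(np.append(feats, 1.0) @ w)


# Hypothetical training set: coefficient vectors for 30 summaries
rng = np.random.default_rng(0)
coeff_sets = [rng.laplace(rng.uniform(-1, 1), rng.uniform(0.5, 2.0), size=200)
              for _ in range(30)]
feats = np.array([moment_features(c) for c in coeff_sets])
# Synthetic stand-in for subjective ratings (a real study would use human scores)
scores = feats @ np.array([1.0, 0.5, -0.3, 0.2]) + 3.0
w = fit_quality_model(feats, scores)
print(round(predict_quality(w, feats[0]), 2), round(float(scores[0]), 2))
```

A linear learner keeps the sketch short and auditable. Any statistical learner that maps the moment features to scores, such as support vector regression, could replace `fit_quality_model` without changing the feature step.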
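Building on sparse learning, statistical learning, and deep learning, this project aimed to achieve concise representation of image and video data. To exploit the intrinsic sparsity of redundant data, we proposed a summarization method based on group sparsity. We also used sparse time-varying graphs to study the transitions between video segments and constructed video skims from both the sparsity and these transition relationships. Drawing further on recent deep learning theory, we studied feature representations of video collections and built skimming models with 3D convolutional networks, residual networks, and attention mechanisms. These models markedly improved summarization accuracy, reaching an average F-score of 96.1%. For videos with complex motion and scenes, we jointly considered saliency and objectness and refined the predictions by forward and backward propagation within adjacent temporal windows, obtaining spatiotemporally consistent saliency maps that assist content summarization. For quality assessment, we set out to understand how the human visual system perceives video quality and started from modeling the characteristics of human vision. The proposed just-noticeable-difference (JND) model achieves a Pearson linear correlation coefficient (PLCC) of up to 0.99 between predicted and observed values. We also explored building quality assessment models from the statistical properties of natural images.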
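The summary reports results as an F-score and a PLCC. The exact evaluation protocols behind the 96.1% and 0.99 figures are not given here, so the following is only a hedged sketch of the two metrics. It computes the frame-level F-score commonly used for keyframe summaries and the raw Pearson correlation. The function names and toy inputs are hypothetical. Some quality-assessment studies fit a nonlinear (for example, logistic) mapping before computing PLCC, and this sketch omits that step.

```python
from math import sqrt


def summary_f_score(pred, gt):
    """Frame-level F-score between predicted and ground-truth keyframe masks
    (1 = frame kept in the summary, 0 = frame dropped)."""
    if len(pred) != len(gt):
        raise ValueError("masks must have the same length")
    overlap = sum(1 for p, g in zip(pred, gt) if p and g)
    if overlap == 0:
        return 0.0
    precision = overlap / sum(1 for p in pred if p)
    recall = overlap / sum(1 for g in gt if g)
    return 2 * precision * recall / (precision + recall)


def plcc(predicted, observed):
    """Pearson linear correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mp = sum(predicted) / n
    mo = sum(observed) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    ssp = sum((p - mp) ** 2 for p in predicted)
    sso = sum((o - mo) ** 2 for o in observed)
    return cov / sqrt(ssp * sso)


print(round(summary_f_score([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1]), 3))  # 0.667
print(round(plcc([1, 2, 3, 4], [2, 4, 6, 8]), 6))  # 1.0
```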
Data last updated: 2023-05-31
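Related projects:
Research on summarization of large-scale heterogeneous information networks and summary interpretability
Research on online summarization methods for large-scale streaming data
Research on video summarization based on multi-model joint learning
Research on joint video summarization and browsing for multi-camera surveillance systems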