As technology advances, intelligent conference transcription and recording systems that automatically convert meeting speech to text can improve meeting efficiency, reduce staffing needs, and make meeting conclusions traceable, which gives them broad application prospects. To ensure accurate transcription in the back end, speech enhancement in the front end is critical. Taking the intelligent conference transcription and recording system as its application background, this project starts from the acoustic scene of the conference room and the microphone array model, and analyzes the factors that affect perceived speech quality and speech recognition rate. It targets several important and difficult speech enhancement problems: interference and noise cancellation, speech dereverberation, and speech separation. Drawing on theories such as higher-order tensor filtering, robust broadband beamforming, multi-channel linear prediction, discriminative and multi-feature dictionary learning, and deep learning, the project aims to develop effective methods for interference and noise cancellation, dereverberation, and multi-speaker speech separation, and to provide the theoretical basis and new solutions needed for the development of intelligent conference transcription and recording systems.
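In an intelligent conference transcription system, many factors affect the final recognition rate and transcription performance. Current back-end transcription algorithms achieve high accuracy in theory but perform poorly in real deployments, because interference, noise, and other non-ideal factors in real environments degrade them. Front-end speech enhancement is therefore critical. Taking the intelligent conference transcription and recording system as its application background, this project studied algorithms for interference and noise cancellation, speech dereverberation, and speech separation, using techniques including higher-order tensor filtering, robust broadband beamforming, dictionary learning, and deep learning, to improve the perceived quality and intelligibility of front-end speech and thereby raise the back-end recognition rate. The proposed algorithms were validated experimentally on simulated data and real noisy speech signals, which demonstrated their effectiveness and superiority. The project published 30 academic papers, with 1 more accepted and awaiting publication (18 of them SCI-indexed). The algorithms in these results each improved performance on their respective tasks and thereby improved speech enhancement, giving the work significant research and application value.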
Data last updated: 2023-05-31
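Research on Microphone-Array-Based Speech Enhancement and Localization Methods
Research on Multi-Channel Speech Enhancement Techniques Based on Microphone Arrays
Research on Speech Enhancement Techniques for Virtual Large Microphone Arrays
Research on Microphone-Array Speech Enhancement Based on Audio-Visual Sound Source Localization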