The main objective of speech enhancement is to suppress the noise component of noisy speech while keeping the speech component undistorted. It is widely used in applications such as speech recognition and telecommunication systems. In recent years, multi-channel speech enhancement, which uses two or more microphones, has attracted much attention: by exploiting spatial information, multi-channel methods can in principle outperform single-channel ones. However, several problems remain. Beamforming and the generalized sidelobe canceller (GSC) require prior knowledge of the speaker's direction of arrival (DOA), which in practice is usually unknown, and estimating the DOA is itself difficult, especially in noisy environments. The multichannel Wiener filter avoids DOA estimation, but it inherently introduces speech distortion, and its performance depends heavily on noise estimation. In this project, based on intelligent perception of the acoustic environment and combining signal-processing and statistical methods, we study optimal modeling of the multichannel speech enhancement problem and its applications. On the one hand, the resulting algorithms require no prior knowledge of the acoustic environment, such as the source direction; on the other hand, they reduce the speech distortion caused by noise reduction. The main research topics are: noise-robust estimation of acoustic environmental knowledge, confidence measures for that knowledge, optimal modeling of multichannel speech enhancement, post-fusion of multiple speech enhancement results based on time-frequency speech properties, and an experimental verification platform for speech enhancement and its applications. This research will help improve the practicality of speech enhancement techniques and has high social and economic value.
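As a minimal sketch of why the multichannel Wiener filter needs no DOA but does depend on noise estimation, the Python/NumPy code below computes a speech-distortion-weighted MWF (SDW-MWF) per frequency bin from STFT data. It is a standard textbook variant swapped in for illustration, not the project's method. The function names, the noise-only frame mask (assumed to come from some voice activity detector), and the trade-off parameter `mu` are illustrative assumptions; `mu = 1` gives the standard MWF, and a larger `mu` removes more noise at the cost of more speech distortion.

```python
import numpy as np


def spatial_covariance(frames):
    """Sample spatial covariance sum_t y_t y_t^H / T for frames of shape (T, M)."""
    return frames.T @ frames.conj() / frames.shape[0]


def sdw_mwf(noisy, noise_only, ref=0, mu=1.0, eps=1e-6):
    """Speech-distortion-weighted multichannel Wiener filter (illustrative sketch).

    noisy: (F, T, M) complex STFT of the M microphone signals.
    noise_only: (T,) boolean mask, True for frames assumed to contain no speech
                (hypothetically supplied by a voice activity detector).
    Returns the (F, T) speech estimate at the reference microphone `ref`.
    """
    F, T, M = noisy.shape
    e_ref = np.zeros(M)
    e_ref[ref] = 1.0
    out = np.empty((F, T), dtype=complex)
    for f in range(F):
        phi_nn = spatial_covariance(noisy[f][noise_only])    # noise statistics
        phi_yy = spatial_covariance(noisy[f][~noise_only])   # speech + noise statistics
        phi_xx = phi_yy - phi_nn                             # speech statistics, no DOA used
        # w = (phi_xx + mu * phi_nn)^{-1} phi_xx e_ref; mu = 1 is the standard MWF.
        w = np.linalg.solve(phi_xx + mu * phi_nn + eps * np.eye(M), phi_xx @ e_ref)
        out[f] = noisy[f] @ w.conj()                         # x_hat = w^H y for every frame
    return out
```

The filter is built only from second-order statistics of the microphone signals, so no steering vector or DOA appears anywhere; errors in the noise covariance `phi_nn`, however, propagate directly into `phi_xx` and hence into the filter.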
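Multi-channel spatial noise reduction is one of the core technologies of far-field speech recognition and is widely used in voice-controlled smart homes, human-robot dialogue, and far-field video conferencing systems; this research therefore has significant academic and practical value.
After four years of systematic investigation, the project made substantial progress in three main areas: perceptual modeling of complex acoustic environments, optimization methods for combined objectives, and the application of these methods to noise-robust speech recognition. All of the planned research objectives were completed.
In perceptual modeling of complex acoustic environments, we proposed an online dereverberation algorithm based on multi-channel linear prediction (MCLP) that effectively improves dereverberation performance. For the perception of point-source interference, we proposed constructing the spatial covariance matrix of the point interferers from distribution statistics; in scenarios where the number and directions of the interferers are hard to estimate accurately, the proposed algorithm outperforms other post-filtering algorithms. We also proposed a DNN-based target-speaker localization method and a direction-aware multi-channel acoustic modeling method; both localize the target source well, and the resulting target-speech enhancement outperforms the compared existing algorithms.
In optimization modeling of objectives under different criteria, we proposed a speech-separation objective optimization method based on a temporal deep stacking network, which effectively models the temporal correlations in speech signals and achieves better separation quality and generalization. We proposed a two-stage multi-target joint learning method that exploits the spatio-temporal structure, correlation, and complementarity of speech spectra to optimize the separation targets, improving separation performance. We also proposed a source separation method that jointly optimizes a speech autoregressive model and a recurrent network, achieving joint modeling and optimization of autoregression and separation, with highly competitive performance on singing-voice separation.
In applying speech enhancement to noise-robust acoustic modeling for speech recognition, we proposed a robust acoustic modeling method based on deep adversarial training, which reduces the distribution mismatch between speech recorded in noisy environments and the real training data and thereby improves the robustness of the acoustic model. We also proposed a robust end-to-end acoustic modeling method based on joint adversarial enhancement training, which unifies front-end speech enhancement and back-end speech recognition in a single framework and improves the noise robustness of the system.
The project produced a series of substantial research results and technical reserves: 6 papers in high-level SCI-indexed international journals in the field, such as IEEE Transactions on Audio, Speech, and Language Processing; 15 papers at international conferences in the field, such as the IEEE International Conference on Acoustics, Speech, and Signal Processing and the Annual Conference of the International Speech Communication Association; 5 doctoral students trained to graduation; and 5 invention patents applied for and granted.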
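The online MCLP dereverberation mentioned above can be illustrated with a recursive-least-squares (RLS) update of a delayed multi-channel linear predictor, in the spirit of online weighted prediction error (WPE) dereverberation. This is a hypothetical sketch, not the project's algorithm: the function name, the per-bin interface, the parameters (`taps`, `delay`, `alpha`, `beta`, `p_init`), and the causal, exponentially smoothed power estimate used for variance weighting are all assumptions chosen for illustration.

```python
import numpy as np


def online_mclp_dereverb(y, taps=3, delay=2, alpha=0.999, beta=0.9,
                         p_init=100.0, eps=1e-8):
    """Online MCLP dereverberation of one STFT frequency bin (illustrative sketch).

    y: (T, M) complex array, the M-channel observation at one frequency bin.
    Returns a (T, M) estimate of the desired (non-predictable) component.
    """
    T, M = y.shape
    N = M * taps
    G = np.zeros((N, M), dtype=complex)             # stacked prediction filters
    P = np.eye(N, dtype=complex) * p_init           # inverse weighted correlation matrix
    lam = np.mean(np.abs(y[0]) ** 2) + eps          # power estimate (assumed causal smoothing)
    x_hat = np.empty((T, M), dtype=complex)
    for t in range(T):
        # Stack the delayed past frames y[t-delay], ..., y[t-delay-taps+1];
        # frames before the start of the signal are treated as zeros.
        y_tilde = np.zeros(N, dtype=complex)
        for k in range(taps):
            tau = t - delay - k
            if tau >= 0:
                y_tilde[k * M:(k + 1) * M] = y[tau]
        # A priori prediction error: observation minus predicted late reverberation.
        x_hat[t] = y[t] - G.conj().T @ y_tilde
        # RLS update for the exponentially weighted cost sum ||x_hat||^2 / lam.
        Py = P @ y_tilde
        gain = Py / (alpha * lam + np.vdot(y_tilde, Py).real)
        P = (P - np.outer(gain, y_tilde.conj() @ P)) / alpha
        G = G + np.outer(gain, x_hat[t].conj())
        # Smooth the output power for use as the weight of the next frame.
        lam = beta * lam + (1 - beta) * np.mean(np.abs(x_hat[t]) ** 2) + eps
    return x_hat
```

The prediction delay keeps the filter from cancelling the direct path and early reflections, which are correlated with the most recent frames; only the late, predictable part of the reverberation is subtracted. The forgetting factor `alpha` lets the filter track slowly changing rooms at the cost of noisier filter estimates.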
Data last updated: 2023-05-31
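Perception-Based Speech Enhancement and Objective Quality Assessment
Single-Channel Speech Enhancement Based on Nonlinear Speech Spectral Analysis
Speech Enhancement Methods for Persistent, Extremely Strong Noise Environments
Speech Understanding Based on Structural Modeling and Its Applications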