Speech separation aims to extract target speech from background noise and is an important branch of speech signal processing, with wide practical applications such as mobile communication, hearing-aid design, and robust automatic speech recognition. Traditional methods treat speech separation as a signal processing problem (e.g., spectral subtraction and beamforming) built on an understanding of the sound signal; these algorithms rely on assumptions about the statistical properties of the noise and on parameters that must be tuned manually in practice. In realistic noisy environments, the assumptions are rarely satisfied and proper parameters are difficult to set. More recently, speech separation has been formulated as a supervised learning problem. In particular, deep-learning-based speech separation achieves remarkable performance and has gradually become a research hotspot. However, as supervised learning, these algorithms depend heavily on training data and usually require massive amounts of data to generalize well. The goal of this project is to combine traditional signal processing with deep learning: on the one hand, traditional signal processing methods are used to improve the training efficiency and generalization of neural networks; on the other hand, deep neural networks trained on large-scale data can remove the dependence on assumed noise statistics, improve the accuracy of parameter estimation, and enhance the noise reduction performance of traditional methods.
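As a concrete illustration of the traditional side mentioned above, the following is a minimal sketch of magnitude spectral subtraction. It is not an algorithm from this project: the frame length, hop size, over-subtraction factor alpha, and spectral floor beta are arbitrary illustrative defaults, and the leading frames of the signal are simply assumed to be noise-only.

```python
import numpy as np

def spectral_subtraction(noisy, noise_frames=10, frame_len=512, hop=256,
                         alpha=2.0, beta=0.01):
    """Basic magnitude spectral subtraction (illustrative sketch only).

    The first `noise_frames` STFT frames are assumed to be noise-only and
    are averaged to form the noise magnitude estimate.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(noisy) - frame_len) // hop
    # Windowed framing followed by a real FFT per frame.
    frames = np.stack([noisy[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)

    # Average the leading noise-only frames to estimate the noise spectrum.
    noise_mag = mag[:noise_frames].mean(axis=0)

    # Over-subtract the noise estimate and apply a spectral floor so that
    # magnitudes stay positive and musical noise is reduced.
    clean_mag = np.maximum(mag - alpha * noise_mag, beta * mag)

    # Resynthesize with the noisy phase and overlap-add the frames.
    clean = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)
    out = np.zeros(len(noisy))
    for i in range(n_frames):
        out[i * hop:i * hop + frame_len] += clean[i]
    return out
```

For a 16 kHz signal these defaults correspond to 32 ms frames with 50% overlap; the manually chosen alpha and beta are exactly the kind of hand-tuned parameters that the project proposes to replace with data-driven estimates.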
In recent years, deep-learning-based speech denoising has progressed rapidly, but poor generalization and high resource consumption severely limit its practical application. This project is built on combining deep learning with traditional signal processing, exploiting the strong modeling capacity of deep networks on the one hand and the prior knowledge embedded in traditional signal processing on the other. The project proposed a series of new methods for single- and multi-channel speech enhancement, echo cancellation, dereverberation, and personalized speech enhancement, which improved system generalization and, to a certain extent, reduced model size. Project members published 27 papers in well-known journals and at international conferences, including IEEE Transactions on Audio, Speech, and Language Processing, ICASSP, and INTERSPEECH. Two doctoral students and seven master's students who participated in the project obtained their degrees.
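One simple way such a combination can be realized, sketched below purely for illustration, is to feed a conventional estimate (for example, the spectral-subtraction output above) into a mask-estimating network alongside the noisy spectrum, so the network refines a signal-processing prior rather than learning noise statistics entirely from data. The class name HybridMaskNet, the GRU layers, and all sizes are hypothetical and do not describe the models developed in this project.

```python
import torch
import torch.nn as nn

class HybridMaskNet(nn.Module):
    """Toy mask estimator whose input concatenates the noisy log-magnitude
    spectrum with a log-magnitude estimate from a conventional method.
    All names and layer sizes are hypothetical."""

    def __init__(self, n_bins=257, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(input_size=2 * n_bins, hidden_size=hidden,
                          num_layers=2, batch_first=True)
        self.proj = nn.Linear(hidden, n_bins)

    def forward(self, noisy_logmag, prior_logmag):
        # Both inputs: (batch, frames, n_bins); concatenate on the feature axis.
        x = torch.cat([noisy_logmag, prior_logmag], dim=-1)
        h, _ = self.rnn(x)
        mask = torch.sigmoid(self.proj(h))     # time-frequency mask in [0, 1]
        return mask * torch.exp(noisy_logmag)  # enhanced magnitude spectrum

# Example call on random features (batch=1, 100 frames, 257 bins).
net = HybridMaskNet()
enhanced = net(torch.randn(1, 100, 257), torch.randn(1, 100, 257))
```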
Data last updated: 2023-05-31
Research on theme park evaluation based on public sentiment: the case of Volga Manor in Harbin
Application of collaborative-representation-based graph embedding discriminant analysis to face recognition
An improved multi-objective sine cosine optimization algorithm
An unsupervised domain adaptation method for workpiece surface defects
A deep-learning-based prediction model for milling cutter wear state
Sparse deep learning for speech recognition
Research on Chinese-Tibetan bilingual speech synthesis based on deep learning
Research on single-channel speech dereverberation based on deep learning
Research on new methods for passive digital speech forensics based on deep learning