Research on Structured Deep Learning for Speech Recognition in Heterogeneous Noise Scenarios

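Basic Information

Grant No.: 61603252

Project Category: Young Scientists Fund

Funding Amount: 22.00 (×10,000 CNY)

Principal Investigator: Yanmin Qian (钱彦旻)

Discipline Code:

Host Institution: Shanghai Jiao Tong University

Year Approved: 2016

Year Concluded: 2019

Project Period: 2017-01-01 to 2019-12-31

Project Status: Concluded

Participants: Tian Tan (谭天), Shuai Wang (王帅), Qi Liu (刘奇), Yongbin You (游永彬), Mengxiao Bi (毕梦霄), Yimeng Zhuang (庄毅萌), Xu Xiang (项煦)

Keywords:

heterogeneous data; deep neural networks; robust speech recognition; structured modeling; adaptation techniques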

Final Report Abstract

Noise-robust speech recognition remains one of the key unsolved problems in speech recognition. This project conducts innovative research on the "heterogeneous" noise data encountered in real-world scenarios. To address the "mismatch between training and testing" caused by heterogeneous data, the project approaches the problem from two directions, noise modeling and acoustic modeling, develops structured deep learning methods for both, and incorporates the mechanisms of human auditory processing and human environmental perception. Unlike previous methods, this strategy explores new theories and methods for making better use of heterogeneous noise data in robust speech recognition. The research comprises two main parts: (1) Structured deep learning for noise modeling under heterogeneous noise data, which studies the noise itself to explore the discriminability and correlations among different noise types, covering noise representation, classification, and parameter estimation. (2) Structured deep learning for acoustic modeling under heterogeneous data, which uses structured deep models to address the training/testing mismatch, covering structured deep speech denoising and robust feature extraction, structured environment-aware acoustic modeling and learning algorithms, and noise adaptation with joint optimization based on a prediction-feedback mechanism. This research is expected to substantially improve the recognition accuracy and robustness of speech recognition systems, and it has significant theoretical and practical value.
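The abstract names environment-aware acoustic modeling without specifying its form. One simple, widely used form is noise-aware training (NAT), in which an estimate of the noise is appended to every input frame so that the acoustic model can condition on the environment. The sketch below illustrates that general idea and is not the project's own method. It assumes the utterance begins with non-speech, so the noise is estimated as the mean of the first few frames; the function name `noise_aware_features` and the default of 6 noise frames are hypothetical choices.

```python
from typing import List

Frame = List[float]


def noise_aware_features(frames: List[Frame], num_noise_frames: int = 6) -> List[Frame]:
    """Append a per-utterance noise estimate to every feature frame.

    Hypothetical helper: the noise estimate is the per-dimension mean of the
    first `num_noise_frames` frames, assuming they contain no speech.
    """
    if num_noise_frames < 1:
        raise ValueError("num_noise_frames must be at least 1")
    if not frames:
        return []
    # Use as many leading frames as are available, up to num_noise_frames.
    n = min(num_noise_frames, len(frames))
    dim = len(frames[0])
    noise = [sum(f[d] for f in frames[:n]) / n for d in range(dim)]
    # Each output frame is [original features | noise estimate].
    return [list(f) + noise for f in frames]
```

For example, `noise_aware_features([[1.0, 2.0], [3.0, 4.0], [10.0, 10.0]], num_noise_frames=2)` estimates the noise as `[2.0, 3.0]` and appends it to each of the three frames. Real systems typically use log-filterbank features and may re-estimate the noise over time rather than once per utterance.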

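Project Abstract

Robust speech recognition in complex noise scenarios is one of the key unsolved problems in speech recognition. This project focuses on the "training/testing mismatch" caused by the "heterogeneity" of real-world noise data. It starts from both noise modeling and acoustic modeling, applies structured deep learning methods to each, and draws on human perception and the mechanisms of human hearing to make appropriate use of heterogeneous environmental noise data and to explore new modeling methods and theories for noise-robust speech recognition. Compared with previous work, this is a new line of research. The main research topics are: (1) Structured deep learning of noise models under heterogeneous data, which studies the environment and the noise itself to uncover the discriminability and correlations among noise types, including noise representation, classification, and parameter estimation. (2) Structured deep learning of acoustic models under heterogeneous data, which uses structured deep models to address the training/testing mismatch that arises with heterogeneous data, including structured spectral denoising and noise-robust feature representation, structured environment-aware acoustic modeling and learning algorithms, and noise adaptation with joint optimization combined with a prediction-feedback mechanism.

During the project, we proposed several structured, innovative methods, each of which effectively improved the performance of speech recognition systems in noisy scenarios. Specifically:
1) very deep convolutional neural networks and their adaptation methods;
2) neural-network-based environment factor analysis and representation;
3) noise-robust speech recognition based on multi-factor environment awareness;
4) language modeling based on future factors, with improved predictive capability;
5) robust voice activity (endpoint) detection for complex heterogeneous data;
6) multi-talker overlapped speech separation and recognition based on permutation invariant training;
7) data augmentation and noise-robust modeling based on deep generative adversarial networks;
8) multi-talker overlapped speech separation and recognition based on end-to-end models.
Using these methods, we achieved the best performance reported to date on Aurora4, the benchmark test set for noise-robust speech recognition. The research produced a series of high-quality publications, and the related algorithms have been deployed in production systems.

The research ideas and results of this project offer strong guidance and reference for building better deep learning models. The methods and ideas can be extended to other intelligent speech tasks, and the research has significant theoretical and practical value.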
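Item 6 of the method list refers to permutation invariant training (PIT) for overlapped multi-talker speech. The core idea is that the separation network's output streams have no fixed speaker order, so the training loss is evaluated under every assignment of outputs to reference speakers and the smallest value is used. The following is a minimal, framework-free sketch of an utterance-level PIT loss over flat feature vectors with mean squared error. It is an illustration of the general technique, not the project's implementation: in practice PIT is usually applied to time-frequency masks or spectra inside a deep learning framework, and the names `mse` and `pit_loss` are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(
    estimates: List[Sequence[float]], references: List[Sequence[float]]
) -> Tuple[float, Tuple[int, ...]]:
    """Utterance-level permutation invariant training (PIT) loss.

    Tries every assignment of estimated streams to reference speakers and
    returns the lowest average MSE together with the permutation achieving it.
    """
    if len(estimates) != len(references):
        raise ValueError("need exactly one estimate per reference speaker")
    best_loss: float = float("inf")
    best_perm: Tuple[int, ...] = ()
    for perm in permutations(range(len(references))):
        # perm[i] is the reference speaker assigned to estimated stream i.
        loss = sum(mse(estimates[i], references[p]) for i, p in enumerate(perm)) / len(perm)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

For example, `pit_loss([[0.0, 1.0], [1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])` returns `(0.0, (1, 0))`: the outputs match the references perfectly once their order is swapped, so the network is not penalized for the arbitrary output order. Because the number of permutations grows as S! for S speakers, exhaustive search like this is practical only for small S, such as two- or three-talker mixtures.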

Project Outcomes

No outcomes listed yet.

Data last updated: 2023-05-31

Other Related Literature

DOI: 10.7524/j.issn.0254-6108.2017122903

Year Published: 2018

DOI: 10.7606/j.issn.1000-7601.2021.04.29

Year Published: 2021

DOI: 10.12202/j.0476-0301.2022178

Year Published: 2022

DOI:

Year Published:

DOI: 10.6041/j.issn.1000-1298.2022.07.022

Year Published: 2022

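Similar NSFC Grants

Sparse Deep Learning for Speech Recognition

Grant No.: 61371136

Year Approved: 2013

Principal Investigator: Dong Wang (王东)

Discipline Code: F0117

Funding Amount: 74.00 (×10,000 CNY)

Project Category: General Program

Scene Understanding Based on Structured Deep Learning

Grant No.: 61872364

Year Approved: 2018

Principal Investigator: Hanqing Lu (卢汉清)

Discipline Code: F0210

Funding Amount: 63.00 (×10,000 CNY)

Project Category: General Program

Auditory Scene Analysis and Speech Recognition in Noisy Environments

Grant No.: 60272044

Year Approved: 2002

Principal Investigator: Zhenyang Wu (吴镇扬)

Discipline Code: F0111

Funding Amount: 20.00 (×10,000 CNY)

Project Category: General Program

Structured Deep Learning for Speech Representation and Separation

Grant No.: 61471394

Year Approved: 2014

Principal Investigator: Xiongwei Zhang (张雄伟)

Discipline Code: F0117

Funding Amount: 80.00 (×10,000 CNY)

Project Category: General Program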
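Other Related Literature (titles)

Levels and Distribution Characteristics of Polychlorinated Naphthalenes, Hexachlorobutadiene, and Pentachlorophenol in Biota from the Pearl River Estuary

Identification of Seedling-Stage Drought Resistance in Sunflower Germplasm Resources and Screening of Drought-Resistance Indicators

Advances in Complex Systems Science Research

Rapid Near-Infrared Spectroscopic Detection of Soluble Sugar in Fragrant Pears Based on LS-SVM

River Recognition Method for Remote Sensing Images of Cold and Arid Regions Based on an Improved LinkNet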