As the bit rate of speech coding decreases, traditional speech coding models and methods can no longer guarantee high-quality coded speech. The problem is especially acute on the battlefield, where channel bandwidth is very limited and background noise and interference are dense, varied, and constantly changing. Under the traditional source-filter speech coding paradigm, these conditions cause large errors in the estimation and quantization of coding parameters such as linear prediction coefficients and pitch, which severely degrades the coded speech. In recent years deep learning has produced many breakthroughs in speech recognition and speech enhancement, but it has rarely been applied to noise-robust speech coding. This project therefore addresses the problem of high-quality, noise-robust, low-bit-rate speech coding by exploiting the characteristics of human auditory perception within a deep learning framework. Using stacked denoising auto-encoders, we will build a new noise-robust low-bit-rate speech coding model, propose new methods for robust extraction and efficient quantization of speech coding parameters, construct a new noise-robust coding scheme suited to low-bit-rate speech transmission, and explore a new way of fusing speech denoising with speech coding. The research will deepen understanding of the mechanisms of human speech production and perception, and will help solve practical problems faced by speech communication systems such as military satellite and short-wave links. It is therefore of both theoretical significance and practical value.
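This project targets speech communication in noisy environments. Applying deep learning theory and methods, it moves beyond the traditional source-filter speech coding framework and studies low-bit-rate speech coding theory covering coding model construction, coding parameter extraction, coding parameter quantization, and coded speech reconstruction, together with speech enhancement techniques for dealing with background noise. First, a new data-driven speech coding model, the Deep Vocoder model, was proposed. Second, to quantize the coding parameters of the Deep Vocoder model efficiently, a quantization method based on an analysis-by-synthesis strategy, the AbS VQ method, was proposed, which effectively improved the quality of parameter quantization. Third, to meet the requirements of practical speech communication systems, a new low-bit-rate speech coding scheme matched to such systems was proposed on the basis of the Deep Vocoder model and AbS VQ, by suitably choosing the speech analysis frame length, the deep neural network structure, and the allocation of quantization bits among the coding parameters. The scheme covers coding rates from 600 to 2400 bit/s, and its coded speech quality compares favorably with that of the MELPe algorithm. Fourth, for compression coding of 16 kHz wideband speech, the MFCC vocoder was improved using the WaveGlow deep generative model, and low-bit-rate coding at 1000 to 2000 bit/s was achieved in simulation through efficient quantization of the MFCCs. Finally, to address noise, speech enhancement methods were proposed that combine deep learning with sparse low-rank modeling and non-negative matrix factorization, providing effective front-end processing for speech coding in noisy environments. Through this project, the research group obtained results in deep-learning-based speech coding model frameworks, efficient quantization of coding parameters, coded speech reconstruction, and new speech enhancement techniques. These results provide theoretical methods for applying deep neural networks to speech coding and offer a new route to low-bit-rate speech coding in noisy environments.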
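The summary does not give the details of AbS VQ, so the following is only a minimal sketch of the general analysis-by-synthesis idea it names: rather than picking the codeword closest to the parameter vector, each candidate codeword is decoded and the one whose synthesized output is closest to the reference is kept. The decoder `toy_decoder`, the codebook size, and all dimensions are hypothetical stand-ins chosen for illustration, not the project's Deep Vocoder.

```python
import numpy as np

def nearest_codeword(codebook, target):
    """Open-loop VQ: index of the codeword closest to the parameter vector."""
    return int(np.argmin(np.sum((codebook - target) ** 2, axis=1)))

def abs_codeword(codebook, decoder, reference):
    """Analysis-by-synthesis VQ: decode every candidate and return the index
    whose synthesized output is closest to the reference output."""
    synthesized = np.stack([decoder(c) for c in codebook])
    return int(np.argmin(np.sum((synthesized - reference) ** 2, axis=1)))

rng = np.random.default_rng(0)

# Hypothetical nonlinear decoder standing in for the synthesis half of a vocoder.
W = rng.standard_normal((16, 4))

def toy_decoder(code):
    return np.tanh(W @ code)

codebook = rng.standard_normal((64, 4))  # 64 entries, i.e. a 6-bit index (assumed size)
params = rng.standard_normal(4)          # unquantized coding parameters for one frame
reference = toy_decoder(params)          # output the decoder would give without quantization

i_open = nearest_codeword(codebook, params)
i_abs = abs_codeword(codebook, toy_decoder, reference)

# Compare both choices in the synthesized domain.
err_open = float(np.sum((toy_decoder(codebook[i_open]) - reference) ** 2))
err_abs = float(np.sum((toy_decoder(codebook[i_abs]) - reference) ** 2))
print(f"open-loop error: {err_open:.4f}, analysis-by-synthesis error: {err_abs:.4f}")
```

Because the closed-loop search minimizes the synthesized-domain error over the whole codebook, its error can never exceed that of the open-loop choice. The price is one decoder evaluation per candidate, which is why practical systems usually restrict the search to a shortlist.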
Data last updated: 2023-05-31
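High-resolution 3D imaging for wideband MIMO radar based on Kronecker compressive sensing
Forecasting urban domestic water demand based on the LASSO-SVMR model
SSVEP-based direct brain control of robot direction and speed
Theme park evaluation based on public sentiment: a case study of Volga Manor in Harbin
Series arc fault diagnosis method based on fractal dimension and support vector machines
Deep-learning-based automatic image annotation in noisy environments
High-quality very-low-bit-rate, variable-rate speech coding algorithms
Deep-learning-based single-channel speech dereverberation
Deep-learning-based Mongolian spoken question answering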