Article abstract
王玫, 李江和, 宋浠瑜, 刘小娟. Speech enhancement method based on lightweight convolution gated recurrent neural network*[J].,2023,42(3):652-658
Speech enhancement method based on lightweight convolution gated recurrent neural network*
Received: 2022-01-17  Revised: 2023-04-26
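Chinese abstract (translated):
      To address the performance degradation that causal network input causes in deep-learning-based speech enhancement, a speech enhancement method based on a lightweight convolutional gated recurrent neural network (LCGRU) is proposed. A gated recurrent neural network can model the temporal correlation of speech signals, but its fully connected structure destroys the time-frequency structure of the speech signal, and its large number of parameters hinders network training. This paper therefore replaces the fully connected structure in the gated recurrent neural network with convolution kernels, which preserves the time-frequency structure of the speech signal while modeling its temporal correlation and also reduces the number of network parameters. To make full use of the feature information in previous frames, the input of the network unit at the current time step fuses the input and output of the previous time step. To counter the overfitting that tends to arise during training, this paper adopts a linear gating mechanism to control the flow of information, which alleviates overfitting and improves speech enhancement performance. Experimental results show that the proposed network outperforms traditional network structures on perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and segmental signal-to-noise ratio (SSNR).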
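The parameter reduction claimed for convolutional gates can be illustrated with a small count. The sketch below is only an illustration under stated assumptions: the layer sizes (161 frequency bins, 16 hidden channels, kernel size 3) are hypothetical and not taken from the paper, the functions `gru_fc_params` and `gru_conv_params` are names introduced here, and each gate is assumed to carry a single bias vector.

```python
def gru_fc_params(input_size: int, hidden_size: int) -> int:
    """Parameter count of a GRU layer with fully connected gates.

    Three weight sets (update gate, reset gate, candidate state), each with
    an input matrix, a recurrent matrix, and one bias vector (assumption).
    """
    per_gate = hidden_size * input_size + hidden_size * hidden_size + hidden_size
    return 3 * per_gate


def gru_conv_params(in_channels: int, hidden_channels: int, kernel_size: int) -> int:
    """Parameter count of a GRU whose gates are 1-D convolutions over frequency.

    Each gate convolves the concatenated input and hidden feature maps, so
    one kernel is shared by all frequency bins.
    """
    weights = (in_channels + hidden_channels) * hidden_channels * kernel_size
    return 3 * (weights + hidden_channels)


# Hypothetical sizes chosen for illustration only, not the paper's settings.
fc = gru_fc_params(input_size=161, hidden_size=161)
conv = gru_conv_params(in_channels=1, hidden_channels=16, kernel_size=3)
print(f"fully connected gates: {fc} parameters")
print(f"convolutional gates:   {conv} parameters")
```

Because the convolution kernel is shared across frequency bins, its parameter count does not grow with the number of bins, whereas the fully connected count grows quadratically with it.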
English abstract:
      To address the degradation in speech enhancement performance caused by causal input, a method based on a lightweight convolutional gated recurrent neural network (LCGRU) is proposed. A gated recurrent neural network can model temporal correlation, but its fully connected structure destroys the time-frequency structure of speech, and its large number of parameters makes the network hard to train. In this paper, convolution kernels replace the fully connected structure, so the time-frequency structure is retained while the temporal correlation of speech is modeled, and the number of network parameters is reduced. To make full use of the features of previous frames, the input of the network at the current time step combines the input and output of the previous time step. A linear gating mechanism controls the transmission of information, which alleviates overfitting and improves speech enhancement performance. Experimental results show that the proposed network scores higher than traditional networks on PESQ, STOI, and SSNR.
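To make the idea of replacing fully connected gates with convolutions concrete, here is a minimal sketch of one time step of a generic single-channel convolutional GRU that slides small kernels along the frequency axis. It is not the authors' LCGRU: the abstract gives no equations, so the fusion of the previous input with the previous output and the linear gating mechanism are omitted. The function names, the weight keys (`zx`, `zh`, `rx`, `rh`, `cx`, `ch`), and all numeric values are hypothetical.

```python
import math


def conv1d_same(x, kernel):
    """Slide an odd-length kernel along frequency with zero padding."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def conv_gru_step(x, h_prev, w):
    """One time step of a single-channel convolutional GRU (biases omitted).

    x, h_prev: lists of floats, one value per frequency bin.
    w: dict of kernels for the update gate (z), reset gate (r), and
       candidate state (c), each with an input and a recurrent kernel.
    """
    zx, zh = conv1d_same(x, w["zx"]), conv1d_same(h_prev, w["zh"])
    rx, rh = conv1d_same(x, w["rx"]), conv1d_same(h_prev, w["rh"])
    z = [sigmoid(a + b) for a, b in zip(zx, zh)]  # update gate
    r = [sigmoid(a + b) for a, b in zip(rx, rh)]  # reset gate
    gated_h = [ri * hi for ri, hi in zip(r, h_prev)]
    cx, ch = conv1d_same(x, w["cx"]), conv1d_same(gated_h, w["ch"])
    cand = [math.tanh(a + b) for a, b in zip(cx, ch)]  # candidate state
    # Blend the previous state and the candidate, bin by bin.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, cand)]


# Hypothetical kernels and frames, for illustration only.
weights = {name: [0.1, 0.2, 0.1] for name in ("zx", "zh", "rx", "rh", "cx", "ch")}
h = [0.0] * 5
for frame in ([1.0, 0.5, 0.0, -0.5, -1.0], [0.2] * 5):
    h = conv_gru_step(frame, h, weights)
print(len(h))
```

Each output bin depends only on neighbouring bins of the current frame and the previous state, which is how a convolutional gate keeps the local time-frequency structure that a fully connected gate would mix across all bins.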
DOI:10.11684/j.issn.1000-310X.2023.03.025
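Chinese keywords (translated): convolutional gated recurrent neural network  fixed delay  causal speech enhancement  speech quality  speech intelligibility
English keywords: convolution gated recurrent neural network  fixed delay  causal speech enhancement  speech perceptual quality  speech objective intelligibility
Funding: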
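Author  Affiliation  E-mail
王玫  Guilin University of Technology  glwm@qq.com
李江和  Guilin University of Technology  898569751@qq.com
宋浠瑜*  Guilin University of Electronic Technology, Key Laboratory of Cognitive Radio and Information Processing, Ministry of Education  260626277@qq.com
刘小娟  Guilin University of Technology  1719885214@qq.com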