Page 214 - Journal of Applied Acoustics (《应用声学》), 2023, No. 3

Vol. 42, No. 3
Journal of Applied Acoustics                      May, 2023

⋄ Research Report ⋄


Speech enhancement method based on a lightweight convolutional gated recurrent neural network ∗


WANG Mei 1   LI Jianghe 1   SONG Xiyu 2†   LIU Xiaojuan 1


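(1 School of Information Science and Engineering, Guilin University of Technology, Guilin 541004)
(2 Key Laboratory of Cognitive Radio and Information Processing, Ministry of Education (jointly established by province and ministry), Guilin University of Electronic Technology, Guilin 541004)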
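Abstract: To address the performance degradation that deep-learning-based speech enhancement methods suffer when the network input is causal, a speech enhancement method based on a lightweight convolutional gated recurrent neural network is proposed. A gated recurrent neural network can model the temporal correlation of speech signals, but its fully connected structure ignores their time-frequency structure and carries a large number of parameters, which hinders training. This paper therefore replaces the fully connected structure in the gated recurrent neural network with convolution kernels, which preserves the time-frequency structure of the speech signal while modeling its temporal correlation and also reduces the number of network parameters. To make full use of the feature information of previous frames, the input of the network unit at the current time step fuses the input and output of the previous time step. Because the network is prone to overfitting during training, this paper adopts a linear gating mechanism to control information flow, which alleviates overfitting and improves speech enhancement performance. Experimental results show that the proposed network structure outperforms traditional network structures on metrics including the perceptual quality of the enhanced speech, short-time objective intelligibility, and segmental signal-to-noise ratio.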
Keywords: Convolutional gated recurrent neural network; Fixed delay; Causal speech enhancement; Speech quality; Speech intelligibility
CLC number: TN912           Document code: A          Article ID: 1000-310X(2023)03-0652-07
                DOI: 10.11684/j.issn.1000-310X.2023.03.025

Speech enhancement method based on lightweight convolution gated recurrent neural network

                                 WANG Mei  1   LI Jianghe 1  SONG Xiyu  2  LIU Xiaojuan 1

                     (1 School of Information Science and Engineering, Guilin University of Technology, Guilin 541004, China)
                         (2 Provincial Ministry of Education Key Laboratory of Cognitive Radio and Signal Processing,
                                    Guilin University of Electronic Technology, Guilin 541004, China)
Abstract: To address the degradation of speech enhancement performance caused by causal network input, a method based on a lightweight convolutional gated recurrent neural network (LCGRU) is proposed. A gated recurrent neural network can model temporal correlation, but its fully connected structure ignores the time-frequency structure of speech and requires a large number of parameters, which makes the network harder to train. In this paper, convolution kernels replace the fully connected structure, so the time-frequency structure is retained while the temporal correlation of speech is modeled, and the number of network parameters is reduced. To make full use of the features of previous frames, the input of the network at the current time step combines the input and output of the previous time step. A linear gating mechanism controls the transmission of information, which alleviates overfitting and improves speech enhancement performance. Experimental results show that the proposed network scores higher than traditional networks on PESQ, STOI and SSNR.
                 Keywords: Convolution gated recurrent neural network; Fixed delay; Causal speech enhancement; Speech
                 perceptual quality; Speech objective intelligibility
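
The abstract's claim that swapping fully connected gates for convolution kernels shrinks the network can be illustrated with a rough parameter count. The Python sketch below is not the paper's LCGRU: it compares a textbook GRU layer with a generic 1-D convolutional GRU layer, assuming three gates (update, reset, candidate) and one bias vector per gate. The function names and the sizes used (161 frequency bins, 16 channels, kernel width 3) are hypothetical values chosen only for illustration.

```python
def gru_fc_params(input_size: int, hidden_size: int) -> int:
    """Parameters of a textbook GRU layer with fully connected gates.

    Assumes 3 gates, each with an input weight matrix, a recurrent
    weight matrix, and a single bias vector.
    """
    per_gate = (input_size * hidden_size      # input-to-hidden weights
                + hidden_size * hidden_size   # hidden-to-hidden weights
                + hidden_size)                # bias
    return 3 * per_gate


def gru_conv_params(in_channels: int, hidden_channels: int, kernel_size: int) -> int:
    """Parameters of a generic GRU layer whose gates are 1-D convolutions
    along the frequency axis (weights shared across frequency bins).

    Assumes 3 gates, each with one convolution on the input, one on the
    previous hidden state, and one bias per output channel.
    """
    per_gate = (kernel_size * in_channels * hidden_channels        # input conv
                + kernel_size * hidden_channels * hidden_channels  # recurrent conv
                + hidden_channels)                                 # bias
    return 3 * per_gate


if __name__ == "__main__":
    # Hypothetical sizes: 161 frequency bins per frame, 16 feature channels.
    print(gru_fc_params(input_size=161, hidden_size=161))                    # 156009
    print(gru_conv_params(in_channels=1, hidden_channels=16, kernel_size=3)) # 2496
```

Because the convolutional gates share their weights across frequency bins, the count depends on the kernel width and channel numbers rather than on the number of bins, which is where the saving comes from in this sketch.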


Received 2022-01-17; finalized 2022-04-12
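∗ National Natural Science Foundation of China (62071135); Guangxi Natural Science Foundation (2019GXNSFBA245103); Fund of the Key Laboratory of Cognitive Radio and Information Processing, Ministry of Education (CRKL200111)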
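About the author: WANG Mei (1963– ), female, born in Shouyang, Shanxi; Ph.D., professor. Research interests: multimedia information perception and processing, location awareness and cooperative positioning.
† Corresponding author. E-mail: songxiyu@guet.edu.cn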