Page 152 - Journal of Applied Acoustics (应用声学), 2023, No. 4

             Vol. 42, No. 4                        Journal of Applied Acoustics                      July, 2023

             ⋄ Research Report ⋄



             Att-U-Net: bone-conducted speech enhancement using a U-Net fused with an attention mechanism ∗



                        BANG Jinyang 1,2   ZHANG Yue 1†   ZHANG Xiongwei 1   SUN Meng 1   LIU Wei 1   LUAN Heyu 2



                                                 (1 Army Engineering University of PLA, Nanjing 210007)
                                            (2 Unit 66389 of PLA, Zhengzhou 450009)
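                Abstract: In recent years, many encoder-decoder architectures, such as fully convolutional networks and
                U-Net, have been applied to speech enhancement. However, such architectures cannot fully exploit the
                correlations across time and between high and low frequencies, and they lose information when processing
                long sequences. To model time-frequency correlations more fully while maintaining computational
                efficiency, this paper proposes a bone-conducted speech enhancement method based on a U-Net fused with
                an attention mechanism (Att-U-Net). An attention mechanism introduced into the skip connections
                generates a weight matrix, and the global information in each encoding layer is fused into the
                corresponding decoding layer according to these weights. During encoding and decoding, the network can
                therefore attend to the information in the input that is most relevant to the enhancement target while
                suppressing irrelevant information. Experiments on a bone-conducted speech dataset show that the U-Net
                with attention effectively improves bone-conducted speech enhancement while keeping the model
                lightweight, and that the enhanced speech outperforms the baseline models on all objective evaluation
                metrics.
                Keywords: bone-conducted speech enhancement; deep learning; attention mechanism; U-Net
                CLC number: TN912.35           Document code: A          Article ID: 1000-310X(2023)04-0814-11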
                DOI: 10.11684/j.issn.1000-310X.2023.04.017


                    Att-U-Net: bone conducted speech enhancement based on U-Net with
                                                attention mechanism



               BANG Jinyang  1,2  ZHANG Yue  1   ZHANG Xiongwei  1   SUN Meng  1   LIU Wei 1  LUAN Heyu   2

                                     (1 Army Engineering University of PLA, Nanjing 210007, China)
                                          (2 Unit 66389 of PLA, Zhengzhou 450009, China)

                 Abstract: In recent years, many encoder-decoder networks, such as fully convolutional networks and
                 U-Net, have been applied to speech enhancement. However, encoder-decoder structures cannot make full
                 use of the correlations along the time axis and between high and low frequencies, and they suffer
                 from information loss on long input sequences. To maintain computational efficiency while modeling
                 time-frequency correlations more fully, this paper proposes a bone-conducted speech enhancement
                 method (Att-U-Net) that combines the U-Net with an attention mechanism. An attention mechanism is
                 introduced into the skip connections to generate a weight matrix, and the global information in each
                 encoding layer is passed to the corresponding decoding layer according to these weight coefficients
                 during encoding and decoding. The network can thus attend to the information in the input data that
                 is highly related to the enhancement target while suppressing irrelevant information. Experiments on
                 a bone-conducted speech dataset show that the U-Net with the attention mechanism effectively improves
                 bone-conducted speech enhancement while keeping the model lightweight. The enhanced speech
                 outperforms that of the baseline models on the objective evaluation metrics.
                 Keywords: Bone conducted speech enhancement; Deep learning; Attention mechanism; U-Net
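
                 The abstract describes the attention-weighted skip connection only in outline. As a rough
                 illustration, and not the authors' implementation, the NumPy sketch below assumes an additive
                 attention gate: encoder and decoder features are projected per time-frequency bin, combined, and
                 squashed into a weight matrix that scales the encoder features before they are concatenated with the
                 decoder features. The function name, the parameter names (w_enc, w_dec, w_psi), the tensor shapes,
                 and the random weights are all hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def attention_skip(enc, dec, w_enc, w_dec, w_psi):
    """Scale encoder features by an attention map, then concatenate with decoder features.

    enc, dec : (channels, freq, time) feature maps of equal shape (assumption).
    w_enc, w_dec : (hidden, channels) per-bin linear projections (1x1 convolutions).
    w_psi : (hidden,) vector that collapses the hidden channels to one score per bin.
    """
    # A 1x1 convolution is a linear map over the channel axis at every (freq, time) bin.
    q = np.einsum("hc,cft->hft", w_enc, enc) + np.einsum("hc,cft->hft", w_dec, dec)
    q = np.maximum(q, 0.0)  # ReLU
    # Weight matrix: one value between 0 and 1 per time-frequency bin.
    weights = sigmoid(np.einsum("h,hft->ft", w_psi, q))
    gated = enc * weights[None, :, :]
    # Skip connection: gated encoder features stacked on the decoder features.
    return np.concatenate([gated, dec], axis=0), weights


# Hypothetical sizes and random (untrained) parameters, for illustration only.
rng = np.random.default_rng(0)
C, F, T, H = 4, 8, 16, 3
enc = rng.standard_normal((C, F, T))
dec = rng.standard_normal((C, F, T))
w_enc = rng.standard_normal((H, C))
w_dec = rng.standard_normal((H, C))
w_psi = rng.standard_normal(H)

fused, weights = attention_skip(enc, dec, w_enc, w_dec, w_psi)
print(fused.shape, weights.shape)
```

                 In a trained network the projections would be learned, so bins relevant to the enhancement target
                 receive weights near 1 and irrelevant bins receive weights near 0.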
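             Received 2022-04-28; finalized 2022-07-21
             ∗ Supported by the National Natural Science Foundation of China (62071484)
             About the author: BANG Jinyang (born 1996), male, from Fengcheng, Jiangxi; master's degree; research interests: speech processing and network security.
             † Corresponding author. E-mail: zy1084476070@163.com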