Page 152 - Journal of Applied Acoustics (《应用声学》), 2023, No. 4


Vol. 42, No. 4
Journal of Applied Acoustics, July 2023

⋄ Research Report ⋄

Att-U-Net: bone-conducted speech enhancement using a U-Net fused with an attention mechanism ∗

邦锦阳 1,2 张玥 1† 张雄伟 1 孙蒙 1 刘伟 1 栾合禹 2

(1 陆军工程大学南京 210007)
(2 中国人民解放军 66389 部队郑州 450009)
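Abstract: In recent years, many encoder-decoder architectures, such as fully convolutional networks and U-Net, have been applied to speech enhancement. However, these architectures cannot fully exploit the correlations across time and between high and low frequencies, and they lose information when processing long sequences. To model time-frequency correlations more thoroughly while preserving computational efficiency, this paper proposes a bone-conducted speech enhancement method based on a U-Net fused with an attention mechanism (Att-U-Net). An attention mechanism introduced into the skip connections generates a weight matrix, and the global information in each encoding layer is merged into the corresponding decoding layer according to these weights. During encoding and decoding, the network can therefore attend to the information in the input that is most relevant to the enhancement target while suppressing irrelevant information. Experiments on a bone-conducted speech dataset show that the U-Net with attention effectively improves bone-conducted speech enhancement while keeping the model lightweight, and the enhanced speech outperforms the baseline models on every objective evaluation metric.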
Keywords: Bone-conducted speech enhancement; Deep learning; Attention mechanism; U-Net
CLC number: TN912.35 Document code: A Article ID: 1000-310X(2023)04-0814-11
DOI: 10.11684/j.issn.1000-310X.2023.04.017

Att-U-Net: bone conducted speech enhancement based on U-Net with
attention mechanism

BANG Jinyang 1,2 ZHANG Yue 1 ZHANG Xiongwei 1 SUN Meng 1 LIU Wei 1 LUAN Heyu 2

(1 Army Engineering University of PLA, Nanjing 210007, China)
(2 Unit 66389 of PLA, Zhengzhou 450009, China)

Abstract: In recent years, many encoder-decoder networks, such as fully convolutional networks and U-Net,
have been applied to speech enhancement. However, encoder-decoder structures cannot make full use of the
correlations along the time axis and between high and low frequencies, and they suffer from information
loss on long input sequences. To maintain computational efficiency while modeling time-frequency
correlations more thoroughly, this paper proposes a bone conducted speech enhancement method (Att-U-Net)
that combines the U-Net network with an attention mechanism. By introducing an attention mechanism into
the skip connections and generating a weight matrix, the global information in each encoding layer is
passed to the corresponding decoding layer according to the weight coefficients during encoding and
decoding. The network can thus attend to the information in the input data that is highly related to the
enhancement target while suppressing irrelevant information. Experiments on a bone conducted speech
dataset show that the U-Net with attention mechanism effectively improves bone conducted speech
enhancement while keeping the model lightweight. The enhanced speech outperforms that of the baseline
models on the objective evaluation metrics.
Keywords: Bone conducted speech enhancement; Deep learning; Attention mechanism; U-Net
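
The attention-weighted skip connection described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, assuming an additive attention gate of the kind commonly paired with U-Net; the abstract does not give the exact formulation. The function name `attention_gate`, the parameters `w_enc`, `w_dec` and `psi`, the tensor shapes, and the random inputs are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def attention_gate(enc, dec, w_enc, w_dec, psi, bias=0.0):
    """Weight encoder features with an attention map before the skip connection.

    Assumed (hypothetical) shapes:
      enc, dec : (channels, time, freq) feature maps of equal size
      w_enc    : (hidden, enc_channels) 1x1 projection of the encoder features
      w_dec    : (hidden, dec_channels) 1x1 projection of the decoder features
      psi      : (hidden,) vector mapping hidden activations to one score
    """
    # A 1x1 convolution is a linear map over channels at each time-frequency bin.
    mixed = (np.einsum("hc,ctf->htf", w_enc, enc)
             + np.einsum("hc,ctf->htf", w_dec, dec))
    hidden = np.maximum(mixed, 0.0)                      # ReLU
    scores = np.einsum("h,htf->tf", psi, hidden) + bias  # one score per bin
    weights = sigmoid(scores)                            # weight matrix in (0, 1)
    gated = enc * weights[None, :, :]                    # rescale encoder features
    return gated, weights


# Toy inputs: 8 channels, 16 frames, 32 frequency bins (illustrative only).
rng = np.random.default_rng(0)
enc = rng.standard_normal((8, 16, 32))
dec = rng.standard_normal((8, 16, 32))
w_enc = 0.1 * rng.standard_normal((4, 8))
w_dec = 0.1 * rng.standard_normal((4, 8))
psi = rng.standard_normal(4)

gated, weights = attention_gate(enc, dec, w_enc, w_dec, psi)

# The decoder layer would then consume the gated encoder features
# concatenated with its own features along the channel axis.
skip = np.concatenate([gated, dec], axis=0)
```

In this sketch the weight matrix has one entry per time-frequency bin, so bins judged relevant keep their encoder features nearly unchanged while the others are attenuated before they reach the decoder. In a trained network the projections would be learned rather than random.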
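Received 2022-04-28; accepted 2022-07-21
∗ Supported by the National Natural Science Foundation of China (62071484)
About the author: BANG Jinyang (1996– ), male, from Fengcheng, Jiangxi; master's degree; research interests: speech processing and network security.
† Corresponding author. E-mail: zy1084476070@163.com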
