Att-U-Net: Bone Conducted Speech Enhancement based on U-Net with Attention Mechanism
Submitted: 2022-04-28  Revised: 2022-11-18
In recent years, many encoder-decoder networks, such as fully convolutional networks and U-Net, have been applied to speech enhancement because of their low computational complexity and small number of parameters. However, compared with the Long Short-Term Memory (LSTM) model, encoder-decoder structures still cannot fully exploit the correlation along the time axis or the relationship between high and low frequencies. For long input sequences in particular, the encoder-decoder structure suffers from information loss. To maintain computational efficiency while modeling time-frequency correlation more thoroughly, this paper proposes a bone-conducted speech enhancement method, Att-U-Net, which combines the U-Net with an attention mechanism. The attention mechanism is introduced into the skip connections, where it generates a weight matrix; during encoding and decoding, the global information in each encoding layer is passed to the corresponding decoding layer according to these weight coefficients. The network can thus attend to the information in the input that is highly related to the enhancement target while suppressing irrelevant information. Experiments on a bone-conducted speech dataset show that the U-Net with the attention mechanism effectively improves bone-conducted speech enhancement while keeping the model lightweight. The enhanced speech outperforms that of the baseline model on objective evaluation metrics. Visual analysis of the intermediate layers of the encoder-decoder structure shows that the attention mechanism effectively retains speech-segment information during decoding and filters out the mid-frequency resonance caused by the transmission characteristics of bone conduction. The enhanced bone-conducted speech also has better perceptual quality.
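The skip-connection weighting described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact attention formulation, so the additive gate below (projection of encoder and decoder features, ReLU, then a sigmoid that yields one weight per position) is an assumption, and the names `attention_gate`, `w_enc`, `w_dec`, and `psi` are hypothetical. It uses plain Python lists rather than a deep learning framework and is not the authors' implementation.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def attention_gate(enc, dec, w_enc, w_dec, psi):
    """Weight encoder features before they cross a skip connection.

    Assumed additive-attention form (not confirmed by the source):
        alpha = sigmoid(psi . relu(enc @ w_enc + dec @ w_dec))

    enc, dec : lists of feature vectors, one per time-frequency
               position, each with C channels.
    w_enc, w_dec : C x H projection matrices (lists of rows).
    psi : length-H vector mapping the hidden vector to a scalar.
    Returns (gated encoder features, attention weights).
    """
    gated, weights = [], []
    for e, d in zip(enc, dec):
        hidden = []
        for j in range(len(psi)):
            s = sum(e[c] * w_enc[c][j] for c in range(len(e)))
            s += sum(d[c] * w_dec[c][j] for c in range(len(d)))
            hidden.append(max(0.0, s))  # ReLU
        # One coefficient in (0, 1) per position: the "weight matrix".
        alpha = sigmoid(sum(h * p for h, p in zip(hidden, psi)))
        weights.append(alpha)
        # Scale the encoder features that are passed to the decoder.
        gated.append([alpha * x for x in e])
    return gated, weights


if __name__ == "__main__":
    enc = [[1.0, -2.0], [0.5, 4.0]]   # two positions, two channels
    dec = [[0.0, 1.0], [2.0, -1.0]]
    w = [[0.1, 0.2], [0.3, -0.4]]     # toy projection, shared for brevity
    gated, weights = attention_gate(enc, dec, w, w, [1.0, -1.0])
    print(weights)
    print(gated)
```

In a trained network the projections would be learned, so positions relevant to the enhancement target receive weights near 1 and irrelevant positions receive weights near 0.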
Keywords: Bone conducted speech enhancement  Deep learning  Attention mechanism  U-Net
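Bang Jinyang (邦锦阳), Chinese People's Liberation Army unit, bangjinyang@163.com
Zhang Yue* (张玥), College of Command and Control Engineering, Army Engineering University, zy1084476070@163.com
Zhang Xiongwei (张雄伟), College of Command and Control Engineering, Army Engineering University, xwzhang9898@163.com
Sun Meng (孙蒙), College of Command and Control Engineering, Army Engineering University, sunmeng@aeu.edu.cn
Liu Wei (刘伟), College of Command and Control Engineering, Army Engineering University, weiliu_1997it@163.com
Luan Heyu (栾合禹), Chinese People's Liberation Army unit, luahy96@163.com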