《应用声学》 (Applied Acoustics), Vol. 42, No. 4, 2023
Bang Jinyang et al.: Att-U-Net: U-Net with Attention Mechanism for Bone-Conducted Speech Enhancement                                 823


…problem. The experimental results and the analysis of the visualizations demonstrate that the method is effective on the bone-conducted speech dataset. Its potential remains to be explored further, for example by reducing model redundancy and by making fuller use of the contextual information in the speech signal to improve Att-U-Net. At the same time, because the characteristics of bone conduction make bone-conducted speech enhancement different from ordinary speech denoising, the task is closely tied to speaker characteristics, and the quality and quantity of available bone-conducted speech datasets remain insufficient; speaker-adaptive bone-conducted speech enhancement therefore remains a highly challenging problem.
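To make the "attention fused into U-Net skip connections" idea concrete, the following is a minimal NumPy sketch of an additive attention gate re-weighting encoder (skip) features before they rejoin the decoder. It is an illustration only, not the authors' implementation: the function name `attention_gate` and the matrices `W_x`, `W_g`, `psi` are hypothetical names chosen for this sketch.

```python
import numpy as np

def relu(a):
    return np.maximum(a, 0.0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def attention_gate(x, g, W_x, W_g, psi):
    """Additive attention gate on a U-Net skip connection (illustrative).

    x        : (T, C) encoder (skip) features
    g        : (T, C) decoder gating features at the same resolution
    W_x, W_g : (C, C_int) learned projection matrices
    psi      : (C_int, 1) projection producing one attention score per frame
    """
    # Joint projection, then one sigmoid score per time step in (0, 1).
    scores = sigmoid(relu(x @ W_x + g @ W_g) @ psi)   # shape (T, 1)
    # Re-weight the skip features before concatenation with the decoder.
    return x * scores

# Tiny example with random features and weights.
rng = np.random.default_rng(0)
T, C, C_int = 8, 4, 2
x = rng.standard_normal((T, C))
g = rng.standard_normal((T, C))
att = attention_gate(x, g,
                     rng.standard_normal((C, C_int)),
                     rng.standard_normal((C, C_int)),
                     rng.standard_normal((C_int, 1)))
print(att.shape)  # → (8, 4)
```

Because each score lies in (0, 1), the gate can only attenuate skip features, letting the decoder suppress frames that carry little recoverable high-frequency content.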