《应用声学》 2023, No. 4
Vol. 42, No. 4    Bang Jinyang et al.: Att-U-Net: Bone-conducted speech enhancement with a fused attention mechanism    823
…problem. Experimental results and visualization analyses demonstrate that the method is effective on the bone-conducted speech dataset. The potential of the method remains to be explored further, for example by reducing model redundancy and by making fuller use of the contextual information in the speech signal to improve Att-U-Net. Moreover, because the characteristics of bone conduction make bone-conducted speech enhancement different from speech denoising, the task is closely tied to speaker characteristics, and the quality and quantity of available bone-conducted speech datasets are still insufficient, so speaker-adaptive bone-conducted speech enhancement remains a highly challenging problem.

References

[1] Zhang Xiongwei, Zheng Changyan, Cao Tieyong, et al. Blind enhancement of bone-conducted microphone speech: review and prospects[J]. Journal of Data Acquisition and Processing, 2018, 33(5): 769–778.
[2] Huang B, Gong Y, Sun J, et al. A wearable bone-conducted speech enhancement system for strong background noises[C]//2017 18th International Conference on Electronic Packaging Technology (ICEPT), 2017.
[3] Ikuta A, Orimoto H. Noise suppression method by jointly using bone- and air-conducted speech signals[C]//INTER-NOISE and NOISE-CON Congress and Conference Proceedings, 2017.
[4] Huang B, Xiao Y, Sun J, et al. Speech enhancement based on FLANN using both bone- and air-conducted measurements[C]//Signal and Information Processing Association Annual Summit and Conference (APSIPA), Siem Reap, Cambodia. IEEE, 2015.
[5] Zheng Changyan, Yang Jibin, Zhang Xiongwei, et al. Bone-conducted speech enhancement using WaveNet fused with phase information[J]. Acta Acustica, 2021, 46(2): 309–320.
[6] Zhou Y, Chen Y, Ma Y, et al. A real-time dual-microphone speech enhancement algorithm assisted by bone conduction sensor[J]. Sensors, 2020, 20(18): 5050.
[7] Yu C, Hung K H, Wang S S, et al. Time-domain multi-modal bone/air conducted speech enhancement[J]. IEEE Signal Processing Letters, 2020, 27: 1035–1039.
[8] Zheng C, Yang J, Zhang X, et al. Improving the spectra recovering of bone-conducted speech via structural similarity loss function[C]//2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Lanzhou, China, 2020.
[9] Bang Jinyang, Sun Meng, Zhang Xiongwei, et al. Lightweight model for bone-conducted speech enhancement based on convolution network and residual long short-time memory network[J]. Journal of Data Acquisition and Processing, 2021, 36(5): 921–931.
[10] Liu H P, Yu T, Chiou-Shann F. Bone-conducted speech enhancement using deep denoising autoencoder[J]. Speech Communication, 2018, 104: 106–112.
[11] Shifas M P, Claudio S, Stylianou Y, et al. A fully recurrent feature extraction for single channel speech enhancement[J]. IEEE Signal Processing Letters, 2020.
[12] Ashutosh P, Deliang W. A new framework for CNN-based speech enhancement in the time domain[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019, 27(7): 1179–1188.
[13] Zhao S, Nguyen T H, Ma B. Monaural speech enhancement with complex convolutional block attention module and joint time frequency losses[C]//ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Canada. IEEE, 2021.
[14] Fu S W, Wang T W, Tsao Y, et al. End-to-end waveform utterance enhancement for direct evaluation metrics optimization by fully convolutional neural networks[J]. IEEE/ACM Transactions on Audio, Speech, Language Processing, 2018, 26(9): 1570–1584.
[15] Macartney C, Weyde T. Improved speech enhancement with the Wave-U-Net[J]. arXiv Preprint, arXiv: 1811.11307, 2018.
[16] Pascual S, Bonafonte A, Serrà J. SEGAN: speech enhancement generative adversarial network[C]//Interspeech 2017, 2017: 3642–3646.
[17] Shi Wenhua, Zhang Xiongwei, Zou Xia, et al. Time frequency masking based speech enhancement using deep encoder-decoder neural network[J]. Acta Acustica, 2020, 45(3): 299–307.
[18] Shan D, Zhang X, Zhang C, et al. A novel encoder-decoder model via NS-LSTM used for bone-conducted speech enhancement[J]. IEEE Access, 2018, 6: 62638–62644.
[19] Tan K, Chen J, Wang D L. Gated residual networks with dilated convolutions for supervised speech separation[C]//IEEE International Conference on Acoustics, 2018.
[20] Tan K, Wang D L. Complex spectral mapping with a convolutional recurrent network for monaural speech enhancement[C]//IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019.
[21] Tan K, Wang D L. Learning complex spectral mapping with gated convolutional recurrent networks for monaural speech enhancement[J]. IEEE/ACM Transactions on Audio, Speech, Language Processing, 2019, 28(1): 380–390.
[22] Ronneberger O, Fischer P, Brox T. U-net: convolutional networks for biomedical image segmentation