Page 87 - 《应用声学》2023年第2期
Vol. 42, No. 2  卞金洪 (Bian Jinhong) et al.: Teacher-student learning speech enhancement method with a deep complex convolution recurrent network model  275
the gap remains relatively large. This is because the inherent gap between the teacher and student models widens compared with the small dataset, making teacher-student guidance more difficult. Compared with NSNet and RNNoise, real-time algorithms of similarly low complexity, the proposed model achieves better metric scores while keeping the parameter count low.

4 Conclusion

In this paper, to address the large parameter scale and high computational complexity of existing deep-learning-based speech enhancement models, a teacher-student learning framework was built on the DCCRN architecture. Frame-level feature losses are computed separately on the real-part and imaginary-part feature streams of the complex LSTM module to narrow the distance between the teacher and student models. In addition, the MRSTFT loss is used as the student model's base loss to improve its enhancement performance. Experimental results show that the proposed method outperforms the baseline student model on all metrics. The student model trained under teacher-student guidance attains performance close to that of the large-scale model at a low parameter count, achieving competitive results on public datasets. Future work will explore applying teacher-student learning to a wider range of network architectures.

References

[1] Benesty J, Makino S, Chen J. Speech enhancement[M]. Germany: Springer Science & Business Media, 2006.
[2] Loizou P C. Speech enhancement: theory and practice[M]. America: Chemical Rubber Company Press, 2017.
[3] Xu Y, Du J, Dai L R, et al. A regression approach to speech enhancement based on deep neural networks[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015, 23(1): 7–19.
[4] Han S, Mao H, Dally W J. Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding[J]. arXiv Preprint, arXiv: 1510.00149.
[5] LeCun Y, Denker J S, Solla S A. Optimal brain damage[C]// Advances in Neural Information Processing Systems, 1990: 598–605.
[6] Cheng Y, Wang D, Zhou P, et al. A survey of model compression and acceleration for deep neural networks[J]. arXiv Preprint, arXiv: 1710.09282.
[7] Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network[J]. arXiv Preprint, arXiv: 1503.02531.
[8] Yamamoto R, Song E, Kim J M. Parallel WaveGAN: a fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram[C]// ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020: 6199–6203.
[9] Hu Y, Liu Y, Lyu S, et al. DCCRN: deep complex convolution recurrent network for phase-aware speech enhancement[C]// Proc. Interspeech 2020: 2472–2476.
[10] Veaux C, Yamagishi J, King S. The voice bank corpus: design, collection and data analysis of a large regional accent speech database[C]// 2013 International Conference Oriental COCOSDA Held Jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013: 1–4.
[11] Reddy C K A, Gopal V, Cutler R, et al. The INTERSPEECH 2020 deep noise suppression challenge: datasets, subjective testing framework, and challenge results[C]// Proc. Interspeech 2020: 2492–2496.
[12] ITU-T. Recommendation P.862.2: wideband extension to recommendation P.862 for the assessment of wideband telephone networks and speech codecs[Z]. ITU Telecommunication Standardization Sector, 2007.
[13] Taal C H, Hendriks R C, Heusdens R, et al. An algorithm for intelligibility prediction of time–frequency weighted noisy speech[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2011, 19(7): 2125–2136.
[14] Hu Y, Loizou P C. Evaluation of objective quality measures for speech enhancement[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2008, 16(1): 229–238.
[15] Scalart P, Filho J V. Speech enhancement based on a priori signal to noise estimation[C]// 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings, 1996, 2: 629–632.
[16] Pascual S, Bonafonte A, Serrà J. SEGAN: speech enhancement generative adversarial network[C]// Proc. Interspeech 2017: 3642–3646.
[17] Germain F G, Chen Q, Koltun V. Speech denoising with deep feature losses[J]. arXiv Preprint, arXiv: 1806.10522.
[18] Macartney C, Weyde T. Improved speech enhancement with the Wave-U-Net[J]. arXiv Preprint, arXiv: 1811.11307.
[19] Geng C, Wang L. End-to-end speech enhancement based on discrete cosine transform[C]// 2020 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), 2020: 379–383.
[20] Fu S W, Liao C F, Tsao Y, et al. MetricGAN: generative adversarial networks based black-box metric scores optimization for speech enhancement[C]// International Conference on Machine Learning, 2019: 2031–2041.
[21] Xia Y, Braun S, Reddy C K A, et al. Weighted speech distortion losses for neural-network-based real-time speech enhancement[C]// ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020: 871–875.
[22] Valin J. A hybrid DSP/deep learning approach to real-time full-band speech enhancement[C]// 2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP), 2018: 1–5.
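As an illustration of the two training objectives summarized in the conclusion, the following is a minimal NumPy sketch of an MRSTFT loss (the common spectral-convergence plus log-magnitude formulation, cf. Parallel WaveGAN [8]) and a frame-level teacher-student feature loss. The window choice, resolutions, and loss weighting are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def stft_mag(x, n_fft, hop):
    """Magnitude STFT via Hann-windowed framing and a real FFT."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))

def mrstft_loss(enhanced, clean,
                resolutions=((512, 128), (1024, 256), (2048, 512))):
    """Multi-resolution STFT loss: spectral convergence plus log-magnitude
    L1 distance, averaged over several (n_fft, hop) resolutions."""
    total = 0.0
    for n_fft, hop in resolutions:
        s_e = stft_mag(enhanced, n_fft, hop)
        s_c = stft_mag(clean, n_fft, hop)
        # Spectral convergence: relative Frobenius-norm error.
        sc = np.linalg.norm(s_c - s_e) / (np.linalg.norm(s_c) + 1e-8)
        # Log-magnitude L1 distance.
        log_l1 = np.mean(np.abs(np.log(s_c + 1e-8) - np.log(s_e + 1e-8)))
        total += sc + log_l1
    return total / len(resolutions)

def frame_feature_loss(teacher_feat, student_feat):
    """Frame-level L1 distance between teacher and student feature maps;
    in the paper's setup this would be applied separately to the real-part
    and imaginary-part streams of the complex LSTM block."""
    return np.mean(np.abs(teacher_feat - student_feat))
```

During distillation, the student objective would combine the two, e.g. `mrstft_loss(y_student, y_clean) + alpha * (frame_feature_loss(h_t_real, h_s_real) + frame_feature_loss(h_t_imag, h_s_imag))`, where `alpha` and the feature tensors are hypothetical names for a weighting hyperparameter and the intermediate real/imaginary activations.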