

the gap remains relatively large. This is because the inherent gap between the teacher and student models widens on the large dataset relative to the small one, which makes teacher-student guidance more difficult. Compared with NSNet and RNNoise, real-time algorithms of similarly low complexity, the proposed model achieves better objective scores while maintaining a low parameter count.

4 Conclusion

In this paper, to address the large parameter counts and high computational complexity of existing deep-learning-based speech enhancement models, a teacher-student learning framework was built on the DCCRN architecture. Frame-level feature losses are computed separately on the real and imaginary feature streams of the complex LSTM module to narrow the gap between the teacher and student models, while the MRSTFT loss serves as the student model's base loss to improve its enhancement quality. Experimental results show that the proposed method outperforms the baseline student model on all metrics. The student model trained under teacher-student guidance attains performance close to that of the large-scale model at a low parameter count and achieves competitive results on public datasets. Future work will study the application of teacher-student learning to other network architectures.
References

[1] Benesty J, Makino S, Chen J. Speech enhancement[M]. Germany: Springer Science & Business Media, 2006.
[2] Loizou P C. Speech enhancement: theory and practice[M]. Boca Raton: CRC Press, 2017.
[3] Xu Y, Du J, Dai L R, et al. A regression approach to speech enhancement based on deep neural networks[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015, 23(1): 7–19.
[4] Han S, Mao H, Dally W J. Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding[J]. arXiv preprint, arXiv: 1510.00149.
[5] LeCun Y, Denker J S, Solla S A. Optimal brain damage[C]// Advances in Neural Information Processing Systems, 1990: 598–605.
[6] Cheng Y, Wang D, Zhou P, et al. A survey of model compression and acceleration for deep neural networks[J]. arXiv preprint, arXiv: 1710.09282.
[7] Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network[J]. arXiv preprint, arXiv: 1503.02531.
[8] Yamamoto R, Song E, Kim J M. Parallel WaveGAN: a fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram[C]// ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020: 6199–6203.
[9] Hu Y, Liu Y, Lv S, et al. DCCRN: deep complex convolution recurrent network for phase-aware speech enhancement[C]// Proc. Interspeech 2020: 2472–2476.
[10] Veaux C, Yamagishi J, King S. The voice bank corpus: design, collection and data analysis of a large regional accent speech database[C]// 2013 International Conference Oriental COCOSDA Held Jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013: 1–4.
[11] Reddy C K A, Gopal V, Cutler R, et al. The INTERSPEECH 2020 deep noise suppression challenge: datasets, subjective testing framework, and challenge results[C]// Proc. Interspeech 2020: 2492–2496.
[12] ITU-T. Recommendation P.862.2: wideband extension to Recommendation P.862 for the assessment of wideband telephone networks and speech codecs[Z]. ITU Telecommunication Standardization Sector, 2007.
[13] Taal C H, Hendriks R C, Heusdens R, et al. An algorithm for intelligibility prediction of time-frequency weighted noisy speech[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2011, 19(7): 2125–2136.
[14] Hu Y, Loizou P C. Evaluation of objective quality measures for speech enhancement[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2008, 16(1): 229–238.
[15] Scalart P, Filho J V. Speech enhancement based on a priori signal to noise estimation[C]// 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings, 1996, 2: 629–632.
[16] Pascual S, Bonafonte A, Serrà J. SEGAN: speech enhancement generative adversarial network[C]// Proc. Interspeech 2017: 3642–3646.
[17] Germain F G, Chen Q, Koltun V. Speech denoising with deep feature losses[J]. arXiv preprint, arXiv: 1806.10522.
[18] Macartney C, Weyde T. Improved speech enhancement with the Wave-U-Net[J]. arXiv preprint, arXiv: 1811.11307.
[19] Geng C, Wang L. End-to-end speech enhancement based on discrete cosine transform[C]// 2020 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), 2020: 379–383.
[20] Fu S W, Liao C F, Tsao Y, et al. MetricGAN: generative adversarial networks based black-box metric scores optimization for speech enhancement[C]// International Conference on Machine Learning, 2019: 2031–2041.
[21] Xia Y, Braun S, Reddy C K A, et al. Weighted speech distortion losses for neural-network-based real-time speech enhancement[C]// ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020: 871–875.
[22] Valin J M. A hybrid DSP/deep learning approach to real-time full-band speech enhancement[C]// 2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP), 2018: 1–5.