Page 220 - 《应用声学》 (Applied Acoustics), 2023, No. 3


[3] Boll S. Suppression of acoustic noise in speech using spectral subtraction[J]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1979, 27(2): 113–120.
[4] Weiss M R, Aschkenasy E, Parsons T W. Study and development of the INTEL technique for improving speech intelligibility[R]. Nicolet Scientific Corp, Northvale, NJ, 1975.
[5] Chen J, Benesty J, Huang Y, et al. New insights into the noise reduction Wiener filter[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2006, 14(4): 1218–1234.
[6] Lim J S, Oppenheim A V. Enhancement and bandwidth compression of noisy speech[J]. Proceedings of the IEEE, 1979, 67(12): 1586–1604.
[7] Hu Y, Loizou P. Incorporating a psychoacoustical model in frequency domain speech enhancement[J]. IEEE Signal Processing Letters, 2004, 11(2): 270–273.
[8] Dendrinos M, Bakamidis S, Carayannis G. Speech enhancement from noise: a regenerative approach[J]. Speech Communication, 1991, 10(1): 45–57.
[9] Virtanen T. Monaural sound source separation by non-negative matrix factorization with temporal continuity and sparseness criteria[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2007, 15(3): 1066–1074.
[10] Schmidt M N, Larsen J. Reduction of non-stationary noise using a non-negative latent variable decomposition[C]//2008 IEEE Workshop on Machine Learning for Signal Processing. IEEE, 2008: 486–491.
[11] Mohammadiha N, Smaragdis P, Leijon A. Supervised and unsupervised speech enhancement using nonnegative matrix factorization[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2013, 21(10): 2140–2151.
[12] Ou S, Song P, Gao Y. Laplacian speech model and soft decision based MMSE estimator for noise power spectral density in speech enhancement[J]. Chinese Journal of Electronics, 2018, 27(6): 1214–1220.
[13] Wang Z, Sha F. Discriminative non-negative matrix factorization for single-channel speech separation[C]//2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2014: 3749–3753.
[14] Kwon K, Shin J W, Sonowat S, et al. Speech enhancement combining statistical models and NMF with update of speech and noise bases[C]//2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2014: 7053–7057.
[15] Kwon K, Shin J W, Kim N S. NMF-based speech enhancement using bases update[J]. IEEE Signal Processing Letters, 2014, 22(4): 450–454.
[16] Wang D L, Chen J. Supervised speech separation based on deep learning: an overview[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018, 26(10): 1702–1726.
[17] Yang Y, Bao C. DNN-based AR-Wiener filtering for speech enhancement[C]//2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018: 2901–2905.
[18] Pandey A, Wang D L. TCNN: temporal convolutional neural network for real-time speech enhancement in the time domain[C]//ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019: 6875–6879.
[19] Qian K, Zhang Y, Chang S, et al. Speech enhancement using Bayesian wavenet[C]//Interspeech, 2017: 2013–2017.
[20] Kim S, Lee S, Song J, et al. FloWaveNet: a generative flow for raw audio[J]. arXiv preprint, arXiv: 1811.02155, 2018.
[21] Rethage D, Pons J, Serra X. A wavenet for speech denoising[C]//2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018: 5069–5073.
[22] Yuan W. Incorporating group update for speech enhancement based on convolutional gated recurrent network[J]. Speech Communication, 2021, 132: 32–39.
[23] Hu Y, Liu Y, Lyu S, et al. DCCRN: deep complex convolution recurrent network for phase-aware speech enhancement[J]. arXiv preprint, arXiv: 2008.00264, 2020.
[24] Tu Y H, Du J, Lee C H. Speech enhancement based on teacher–student deep learning using improved speech presence probability for noise-robust speech recognition[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019, 27(12): 2080–2091.
[25] Shi Yunlong, Yuan Wenhao, Hu Shaodong, et al. Convolutional quasi-recurrent network for real-time speech enhancement[J]. Journal of Xidian University, 2022, 49(3): 183–190. (in Chinese)
[26] Li Jianghe, Wang Mei. A gated recurrent neural network for causal speech enhancement[J]. Computer Engineering, 2022, 48(11): 77–82. (in Chinese)
[27] Dauphin Y N, Fan A, Auli M, et al. Language modeling with gated convolutional networks[C]//International Conference on Machine Learning. PMLR, 2017: 933–941.
[28] Garofolo J S, Lamel L F, Fisher W M, et al. TIMIT acoustic-phonetic continuous speech corpus[EB/OL]. [2018-09-10]. https://catalog.ldc.upenn.edu/LDC93S1.
[29] Hu G. 100 nonspeech environmental sounds[EB/OL]. [2018-09-03]. http://web.cse.ohio-state.edu/pnl/corpus/HuNonspeech/HuCorpus.html.
[30] Varga A, Steeneken H J M. Assessment for automatic speech recognition: II. NOISEX-92: a database and an experiment to study the effect of additive noise on speech recognition systems[J]. Speech Communication, 1993, 12(3): 247–251.