Page 168 - Applied Acoustics (《应用声学》), 2022, Issue 4



 [3] Hu X, Wang S, Zheng C, et al. A cepstrum-based preprocessing and postprocessing for speech enhancement in adverse environments[J]. Applied Acoustics, 2013, 74(12): 1458–1462.
 [4] Lim J, Oppenheim A. All-pole modeling of degraded speech[J]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1978, 26(3): 197–210.
 [5] Ephraim Y, Malah D. Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator[J]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1984, 32(6): 1109–1121.
 [6] Jensen S H, Hansen P C, Hansen S D, et al. Reduction of broad-band noise in speech by truncated QSVD[J]. IEEE Transactions on Speech and Audio Processing, 1995, 3(6): 439–448.
 [7] Wang D L, Chen J. Supervised speech separation based on deep learning: an overview[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018, 26(10): 1702–1726.
 [8] Shivakumar P G, Georgiou P G. Perception optimized deep denoising autoencoders for speech enhancement[C]. Interspeech, 2016: 3743–3747.
 [9] Xu Z, Elshamy S, Fingscheidt T. Using separate losses for speech and noise in mask-based speech enhancement[C]. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020: 7519–7523.
[10] Li A, Peng R, Zheng C, et al. A supervised speech enhancement approach with residual noise control for voice communication[J]. Applied Sciences, 2020, 10(8): 2894.
[11] Xia B, Bao C. Speech enhancement with weighted denoising auto-encoder[C]. Interspeech, 2013: 3444–3448.
[12] Kumar A, Florencio D. Speech enhancement in multiple-noise conditions using deep neural networks[C]. Interspeech, 2016: 3738–3742.
[13] Liu Q, Wang W, Jackson P J B, et al. A perceptually-weighted deep neural network for monaural speech enhancement in various background noise conditions[C]. European Signal Processing Conference (EUSIPCO), 2017: 1270–1274.
[14] Rix A W, Beerends J G, Hollier M P, et al. Perceptual evaluation of speech quality (PESQ) – a new method for speech quality assessment of telephone networks and codecs[C]. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2001: 749–752.
[15] Taal C H, Hendriks R C, Heusdens R, et al. An algorithm for intelligibility prediction of time–frequency weighted noisy speech[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2011, 19(7): 2125–2136.
[16] Kolbæk M, Tan Z H, Jensen J. On the relationship between short-time objective intelligibility and short-time spectral-amplitude mean-square error for speech enhancement[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019, 27(2): 283–295.
[17] Martin-Donas J M, Gomez A M, Gonzalez J A, et al. A deep learning loss function based on the perceptual evaluation of the speech quality[J]. IEEE Signal Processing Letters, 2018, 25(11): 1680–1684.
[18] Kolbæk M, Tan Z H, Jensen S H, et al. On loss functions for supervised monaural time-domain speech enhancement[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, 28: 825–838.
[19] Loizou P C. Speech enhancement based on perceptually motivated Bayesian estimators of the magnitude spectrum[J]. IEEE Transactions on Speech and Audio Processing, 2005, 13(5): 857–869.
[20] Weninger F, Erdogan H, Watanabe S, et al. Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR[C]. International Conference on Latent Variable Analysis and Signal Separation, 2015: 91–99.
[21] Tan K, Wang D L. A convolutional recurrent neural network for real-time speech enhancement[C]. Interspeech, 2018: 3229–3233.
[22] Pascanu R, Mikolov T, Bengio Y. On the difficulty of training recurrent neural networks[C]. International Conference on Machine Learning (ICML), 2013: 1310–1318.
[23] Weninger F, Hershey J R, Le Roux J, et al. Discriminatively trained recurrent neural networks for single-channel speech separation[C]. IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2014: 577–581.
[24] Itakura F, Saito S. Analysis synthesis telephony based on the maximum likelihood method[C]. International Congress on Acoustics, 1968: 280–292.
[25] Gray A, Markel J. Distance measures for speech processing[J]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1976, 24(5): 380–391.
[26] Shikano K, Sugiyama M. Evaluation of LPC spectral matching measures for spoken word recognition[J]. Transactions on IECE, 1982, 565(5): 535–541.
[27] Zue V, Seneff S, Glass J. Speech database development at MIT: TIMIT and beyond[J]. Speech Communication, 1990, 9(4): 351–356.
[28] Hu G, Wang D L. A tandem algorithm for pitch estimation and voiced speech segregation[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2010, 18(8): 2067–2079.
[29] Xu Y, Du J, Huang Z, et al. Multi-objective learning and mask-based post-processing for deep neural network based speech enhancement[C]. Interspeech, 2015: 1508–1512.
[30] Varga A, Steeneken H J M. Assessment for automatic speech recognition: II. NOISEX-92: a database and an experiment to study the effect of additive noise on speech recognition systems[J]. Speech Communication, 1993, 12(3): 247–251.
[31] Hu Y, Loizou P C. Evaluation of objective quality measures for speech enhancement[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2008, 16(1): 229–238.