《应用声学》 (Applied Acoustics), 2022, No. 4 (July 2022), p. 666
[3] Hu X, Wang S, Zheng C, et al. A cepstrum-based preprocessing and postprocessing for speech enhancement in adverse environments[J]. Applied Acoustics, 2013, 74(12): 1458–1462.
[4] Lim J, Oppenheim A. All-pole modeling of degraded speech[J]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1978, 26(3): 197–210.
[5] Ephraim Y, Malah D. Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator[J]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1984, 32(6): 1109–1121.
[6] Jensen S H, Hansen P C, Hansen S D, et al. Reduction of broad-band noise in speech by truncated QSVD[J]. IEEE Transactions on Speech and Audio Processing, 1995, 3(6): 439–448.
[7] Wang D L, Chen J. Supervised speech separation based on deep learning: an overview[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018, 26(10): 1702–1726.
[8] Shivakumar P G, Georgiou P G. Perception optimized deep denoising autoencoders for speech enhancement[C]. Interspeech, 2016: 3743–3747.
[9] Xu Z, Elshamy S, Fingscheidt T. Using separate losses for speech and noise in mask-based speech enhancement[C]. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020: 7519–7523.
[10] Li A, Peng R, Zheng C, et al. A supervised speech enhancement approach with residual noise control for voice communication[J]. Applied Sciences, 2020, 10(8): 2894.
[11] Xia B, Bao C. Speech enhancement with weighted denoising auto-encoder[C]. Interspeech, 2013: 3444–3448.
[12] Kumar A, Florencio D. Speech enhancement in multiple-noise conditions using deep neural networks[C]. Interspeech, 2016: 3738–3742.
[13] Liu Q, Wang W, Jackson P J B, et al. A perceptually-weighted deep neural network for monaural speech enhancement in various background noise conditions[C]. European Signal Processing Conference (EUSIPCO), 2017: 1270–1274.
[14] Rix A W, Beerends J G, Hollier M P, et al. Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs[C]. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2001: 749–752.
[15] Taal C H, Hendriks R C, Heusdens R, et al. An algorithm for intelligibility prediction of time-frequency weighted noisy speech[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2011, 19(7): 2125–2136.
[16] Kolbæk M, Tan Z H, Jensen J. On the relationship between short-time objective intelligibility and short-time spectral-amplitude mean-square error for speech enhancement[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019, 27(2): 283–295.
[17] Martin-Donas J M, Gomez A M, Gonzalez J A, et al. A deep learning loss function based on the perceptual evaluation of the speech quality[J]. IEEE Signal Processing Letters, 2018, 25(11): 1680–1684.
[18] Kolbæk M, Tan Z H, Jensen S H, et al. On loss functions for supervised monaural time-domain speech enhancement[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, 28: 825–838.
[19] Loizou P C. Speech enhancement based on perceptually motivated Bayesian estimators of the magnitude spectrum[J]. IEEE Transactions on Speech and Audio Processing, 2005, 13(5): 857–869.
[20] Weninger F, Erdogan H, Watanabe S, et al. Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR[C]. International Conference on Latent Variable Analysis and Signal Separation, 2015: 91–99.
[21] Tan K, Wang D L. A convolutional recurrent neural network for real-time speech enhancement[C]. Interspeech, 2018: 3229–3233.
[22] Pascanu R, Mikolov T, Bengio Y. On the difficulty of training recurrent neural networks[C]. International Conference on Machine Learning (ICML), 2013: 1310–1318.
[23] Weninger F, Hershey J R, Le Roux J, et al. Discriminatively trained recurrent neural networks for single-channel speech separation[C]. IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2014: 577–581.
[24] Itakura F, Saito S. Analysis synthesis telephony based on the maximum likelihood method[C]. International Congress on Acoustics, 1968: 280–292.
[25] Gray A, Markel J. Distance measures for speech processing[J]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1976, 24(5): 380–391.
[26] Shikano K, Sugiyama M. Evaluation of LPC spectral matching measures for spoken word recognition[J]. Transactions on IECE, 1982, 565(5): 535–541.
[27] Zue V, Seneff S, Glass J. Speech database development at MIT: TIMIT and beyond[J]. Speech Communication, 1990, 9(4): 351–356.
[28] Hu G, Wang D L. A tandem algorithm for pitch estimation and voiced speech segregation[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2010, 18(8): 2067–2079.
[29] Xu Y, Du J, Huang Z, et al. Multi-objective learning and mask-based post-processing for deep neural network based speech enhancement[C]. Interspeech, 2015: 1508–1512.
[30] Varga A, Steeneken H J M. Assessment for automatic speech recognition: II. NOISEX-92: a database and an experiment to study the effect of additive noise on speech recognition systems[J]. Speech Communication, 1993, 12(3): 247–251.
[31] Hu Y, Loizou P C. Evaluation of objective quality measures for speech enhancement[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2008, 16(1): 229–238.