Page 145 - Applied Acoustics (《应用声学》), 2024, No. 1
Vol. 43, No. 1  Zhou Junlin et al.: Acoustic analysis of synthetic speech and recognition feature algorithms  141
verification[J]. Computer Speech & Language, 2017, 45: 516–535.
[9] Yang J, Das R K. Long-term high frequency features for synthetic speech detection[J]. Digital Signal Processing, 2020, 97(C): 102622.
[10] Das R K, Yang J, Li H. Assessing the scope of generalized countermeasures for anti-spoofing[C]//ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2020: 6589–6593.
[11] Yang J, Das R K, Li H. Significance of subband features for synthetic speech detection[J]. IEEE Transactions on Information Forensics and Security, 2019, 15: 2160–2170.
[12] Laskowski K, Jin Q. Modeling instantaneous intonation for speaker identification using the fundamental frequency variation spectrum[C]//2009 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2009: 4541–4544.
[13] Monisankha P, Dipjyoti P, Goutam S. Synthetic speech detection using fundamental frequency variation and spectral features[J]. Computer Speech & Language, 2018, 48: 31–50.
[14] Dupuis K, Pichora-Fuller M K. Toronto emotional speech set (TESS)-Younger talker_Angry[EB/OL]. [2010-06-21]. https://tspace.library.utoronto.ca/handle/1807/24490.
[15] Jia Y, Zhang Y, Weiss R, et al. Transfer learning from speaker verification to multispeaker text-to-speech synthesis[C]. Advances in Neural Information Processing Systems, 2018, 31: 4485–4495.
[16] Zhou J, Hu X, Ma Q. A study of the emotional information acoustic characteristics of synthetic speech phoneme/ei[C]//International Conference on Electronic Information Engineering and Computer Communication. SPIE, 2022, 12172: 170–178.
[17] Reimao R, Tzerpos V. FoR: a dataset for synthetic speech detection[C]//2019 International Conference on Speech Technology and Human-Computer Dialogue. IEEE, 2019: 1–10.
[18] Liu S, Wu H, Lee H, et al. Adversarial attacks on spoofing countermeasures of automatic speaker verification[C]//2019 IEEE Automatic Speech Recognition and Understanding Workshop. IEEE, 2019: 312–319.
[19] Dua M, Jain C, Kumar S. LSTM and CNN based ensemble approach for spoof detection task in automatic speaker verification systems[J]. Journal of Ambient Intelligence and Humanized Computing, 2021: 1–16.
[20] Alzantot M, Wang Z, Srivastava M B. Deep residual neural networks for audio spoofing detection[J]. arXiv Preprint, arXiv: 1907.00501, 2019.
[21] Wu Z, Das R K, Yang J, et al. Light convolutional neural network with feature genuinization for detection of synthetic speech attacks[J]. arXiv Preprint, arXiv: 2009.09637, 2020.
[22] Dongre V, Reddy A T, Reddeddy N. Adaptive recalibration of channel-wise features for adversarial audio classification[J]. arXiv Preprint, arXiv: 2210.11722, 2022.