Page 145 - Journal of Applied Acoustics (《应用声学》), 2024, No. 1

Vol. 43, No. 1    Zhou Junlin et al.: Acoustic analysis of synthetic speech and a recognition feature algorithm

