《应用声学》(Applied Acoustics), 2023, No. 3, p. 626, May 2023

