《应用声学》 (Journal of Applied Acoustics), Issue 3, May 2023
Processing, ICASSP ’98 (Cat. No.98CH36181), 1998, 1: 285–288.
[2] Lal Srivastava B M, Vauquier N, Sahidullah M, et al. Evaluating voice conversion-based privacy protection against informed attackers[C]. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020: 2802–2806.
[3] Veaux C, Yamagishi J, King S. Towards personalised synthesised voices for individuals with vocal disabilities: voice banking and reconstruction[C]. Proceedings of the Fourth Workshop on Speech and Language Processing for Assistive Technologies. Grenoble, France: Association for Computational Linguistics, 2013: 107–111.
[4] Toda T, Black A W, Tokuda K. Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2007, 15(8): 2222–2235.
[5] Takamichi S, Toda T, Black A W, et al. Modulation spectrum-constrained trajectory training algorithm for GMM-based voice conversion[C]. Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on, 2015: 4859–4863.
[6] Helander E, Silén H, Virtanen T, et al. Voice conversion using dynamic kernel partial least squares regression[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2011, 20(3): 806–817.
[7] Wu Z, Virtanen T, Chng E S, et al. Exemplar-based sparse representation with residual compensation for voice conversion[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2014, 22(10): 1506–1521.
[8] Takashima R, Takiguchi T, Ariki Y. Exemplar-based voice conversion using sparse representation in noisy environments[J]. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 2013, 96(10): 1946–1953.
[9] Erro D, Moreno A, Bonafonte A. Voice conversion based on weighted frequency warping[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2010, 18(5): 922–931.
[10] Tian X, Wu Z, Lee S W, et al. Correlation-based frequency warping for voice conversion[C]. Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on, 2014: 211–215.
[11] Takamichi S, Toda T, Black A W, et al. Modulation spectrum-based post-filter for GMM-based voice conversion[C]. Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific, 2014: 1–4.
[12] Nakashika T, Takashima R, Takiguchi T, et al. Voice conversion in high-order eigen space using deep belief nets[C]. INTERSPEECH, 2013: 369–372.
[13] Mohammadi S H, Kain A. Voice conversion using deep neural networks with speaker-independent pre-training[C]. 2014 IEEE Spoken Language Technology Workshop (SLT), 2014: 19–23.
[14] Chen L, Raitio T, Valentini-Botinhao C, et al. DNN-based stochastic post filter for HMM-based speech synthesis[C]. INTERSPEECH, 2014: 1954–1958.
[15] Hsu C, Hwang H, Wu Y, et al. Voice conversion from unaligned corpora using variational autoencoding Wasserstein generative adversarial networks[C]. Interspeech 2017, 2017: 3364–3368.
[16] Nakashika T, Takiguchi T, Ariki Y. High-order sequence modeling using speaker-dependent recurrent temporal restricted Boltzmann machines for voice conversion[C]. Fifteenth Annual Conference of The International Speech Communication Association, 2014: 2278–2282.
[17] Nakashika T, Takiguchi T, Ariki Y. Voice conversion using RNN pre-trained by recurrent temporal restricted Boltzmann machines[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2015, 23(3): 580–587.
[18] Bengio Y, Simard P, Frasconi P. Learning long-term dependencies with gradient descent is difficult[J]. IEEE Transactions on Neural Networks, 1994, 5(2): 157–166.
[19] Sun L, Kang S, Li K, et al. Voice conversion using deep bidirectional long short-term memory based recurrent neural networks[C]. 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015: 4869–4873.
[20] Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735–1780.
[21] Wang Y, Skerry-Ryan R J, Stanton D, et al. Tacotron: towards end-to-end speech synthesis[C]. Interspeech 2017, 2017: 4006–4010.
[22] Kawahara H. STRAIGHT, exploitation of the other aspect of VOCODER: perceptually isomorphic decomposition of speech sounds[J]. Acoustical Science and Technology, 2006, 27(6): 349–353.
[23] Wu J, Huang D, Xie L, et al. Denoising recurrent neural network for deep bidirectional LSTM based voice conversion[C]. Interspeech 2017, 2017: 3379–3383.
[24] Kominek J, Black A W. CMU ARCTIC databases for speech synthesis[J]. Citeseer, 2003.
[25] Povey D, Ghoshal A, Boulianne G, et al. The Kaldi speech recognition toolkit[C]. IEEE 2011 Workshop on Automatic Speech Recognition and Understanding, 2011.