《应用声学》 (Journal of Applied Acoustics), 2025, No. 2

Vol. 44, No. 2    Cai Shan et al.: A Miao-language speech synthesis method based on the inverse short-time Fourier transform    349


Speech Communication Association, 2021: 141–145.
[14] Subramani K, Valin J M, Isik U, et al. End-to-end LPCNet: A neural vocoder with fully-differentiable LPC estimation[C]//Proceedings of the Interspeech, 2022: 818–822.
[15] Jang W, Lim D, Yoon J, et al. UnivNet: A neural vocoder with multi-resolution spectrogram discriminators for high-fidelity waveform generation[C]//Proceedings of the Interspeech, 2021: 2207–2211.
[16] Chevi R, Prasojo R E, Aji A F, et al. NIX-TTS: Lightweight and end-to-end text-to-speech via module-wise distillation[C]//Proceedings of the Spoken Language Technology Workshop. Doha, Qatar, 2023: 970–976.
[17] Nguyen H K, Jeong K, Um S, et al. LiteTTS: A lightweight mel-spectrogram-free text-to-wave synthesizer based on generative adversarial networks[C]//Proceedings of the Interspeech, 2021: 3595–3599.
[18] Cong J, Yang S, Xie L, et al. Glow-WaveGAN: Learning speech representations from GAN-Based variational auto encoder for high fidelity flow-based speech synthesis[C]//Proceedings of the Annual Conference of the International Speech Communication Association, 2021: 2182–2186.
[19] Ren Y, Hu C, Tan X, et al. Fastspeech 2: Fast and high-quality end-to-end text to speech[C]//Proceedings of the International Conference on Learning Representations. Virtual Event, Austria: IEEE Press, 2020: 1–15.
[20] Lim D, Jung S, Kim E. JETS: Jointly training FastSpeech2 and HiFi-GAN for end to end text to speech[J]. arXiv preprint, arXiv: 2203.16852, 2022.
[21] Ron J, Skerry-Ryan R, Battenberg E, et al. Wave-tacotron: Spectrogram-free end-to-end text-to-speech synthesis[C]//Proceedings of the International Conference on Acoustics, Speech and Signal Processing, 2021: 5679–5683.
[22] Nguyen B, Cardinaux F, Uhlich S. Autotts: End-to-end text-to-speech synthesis through differentiable duration modeling[C]//Proceedings of the International Conference on Acoustics, Speech and Signal Processing. Rhodes Island, Greece: IEEE Press, 2023: 1–5.
[23] Kim J, Kong J, Son J. Conditional variational autoencoder with adversarial learning for end-to-end text-to-speech[C]//Proceedings of the 38th International Conference on Machine Learning. Proceedings of Machine Learning Research, 2021: 5530–5540.
[24] Kaneko T, Tanaka K, Kameoka H, et al. iSTFTNet: Fast and lightweight mel-spectrogram vocoder incorporating inverse short-time Fourier transform[C]//Proceedings of the International Conference on Acoustics, Speech and Signal Processing, 2022: 6207–6211.
[25] Zhou X, Tian X, Lee G, et al. End-to-end code-switching TTS with cross-lingual language model[C]//Proceedings of the International Conference on Acoustics, Speech and Signal Processing. Barcelona, Spain: IEEE Press, 2020: 7614–7618.
[26] 李建文, 王咿卜. 函数拟合实现带声调的语音合成[J]. 计算机应用与软件, 2022, 39(9): 193–200.
    Li Jianwen, Wang Yibo. Speech synthesis with tone by function fitting[J]. Computer Applications and Software, 2022, 39(9): 193–200.
[27] Shen J, Pang R, Ron J, et al. Natural TTS synthesis by conditioning wavenet on Mel spectrogram predictions[C]//Proceedings of the International Conference on Acoustics, Speech and Signal Processing. Calgary, AB, Canada: IEEE Press, 2018: 4779–4783.
[28] 王志超, 吴浩, 李栋, 等. 基于非自回归模型中文语音合成系统研究与实现[J]. 计算机与数字工程, 2023, 51(2): 325–330, 335.
    Wang Zhichao, Wu Hao, Li Dong, et al. Research and implementation of Chinese speech synthesis system based on non-autoregressive model[J]. Computer & Digital Engineering, 2023, 51(2): 325–330, 335.
[29] Luo D, Sun S. On end-to-end Chinese speech synthesis based on world-tacotron[C]//Proceedings of the International Conference on Culture-oriented Science & Technology. IEEE, 2020: 538–542.
[30] He T, Zhao W, Xu L. DOP-Tacotron: A fast Chinese TTS system with local-based attention[C]//Proceedings of the Chinese Control and Decision Conference. IEEE, 2020: 4345–4350.