Applied Acoustics (《应用声学》), 2025, No. 2
Vol. 44, No. 2    Cai Shan et al.: Miao speech synthesis method based on the inverse short-time Fourier transform    349
Speech Communication Association, 2021: 141–145.
[14] Subramani K, Valin J M, Isik U, et al. End-to-end LPCNet: A neural vocoder with fully-differentiable LPC estimation[C]//Proceedings of Interspeech, 2022: 818–822.
[15] Jang W, Lim D, Yoon J, et al. UnivNet: A neural vocoder with multi-resolution spectrogram discriminators for high-fidelity waveform generation[C]//Proceedings of Interspeech, 2021: 2207–2211.
[16] Chevi R, Prasojo R E, Aji A F, et al. NIX-TTS: Lightweight and end-to-end text-to-speech via module-wise distillation[C]//Proceedings of the Spoken Language Technology Workshop. Doha, Qatar, 2023: 970–976.
[17] Nguyen H K, Jeong K, Um S, et al. LiteTTS: A lightweight mel-spectrogram-free text-to-wave synthesizer based on generative adversarial networks[C]//Proceedings of Interspeech, 2021: 3595–3599.
[18] Cong J, Yang S, Xie L, et al. Glow-WaveGAN: Learning speech representations from GAN-based variational auto-encoder for high fidelity flow-based speech synthesis[C]//Proceedings of the Annual Conference of the International Speech Communication Association, 2021: 2182–2186.
[19] Ren Y, Hu C, Tan X, et al. FastSpeech 2: Fast and high-quality end-to-end text to speech[C]//Proceedings of the International Conference on Learning Representations. Virtual Event, Austria: IEEE Press, 2020: 1–15.
[20] Lim D, Jung S, Kim E. JETS: Jointly training FastSpeech2 and HiFi-GAN for end to end text to speech[J]. arXiv preprint, arXiv: 2203.16852, 2022.
[21] Weiss R J, Skerry-Ryan R, Battenberg E, et al. Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis[C]//Proceedings of the International Conference on Acoustics, Speech and Signal Processing, 2021: 5679–5683.
[22] Nguyen B, Cardinaux F, Uhlich S. AutoTTS: End-to-end text-to-speech synthesis through differentiable duration modeling[C]//Proceedings of the International Conference on Acoustics, Speech and Signal Processing. Rhodes Island, Greece: IEEE Press, 2023: 1–5.
[23] Kim J, Kong J, Son J. Conditional variational autoencoder with adversarial learning for end-to-end text-to-speech[C]//Proceedings of the 38th International Conference on Machine Learning. Proceedings of Machine Learning Research, 2021: 5530–5540.
[24] Kaneko T, Tanaka K, Kameoka H, et al. iSTFTNet: Fast and lightweight mel-spectrogram vocoder incorporating inverse short-time Fourier transform[C]//Proceedings of the International Conference on Acoustics, Speech and Signal Processing, 2022: 6207–6211.
[25] Zhou X, Tian X, Lee G, et al. End-to-end code-switching TTS with cross-lingual language model[C]//Proceedings of the International Conference on Acoustics, Speech and Signal Processing. Barcelona, Spain: IEEE Press, 2020: 7614–7618.
[26] Li Jianwen, Wang Yibo. Speech synthesis with tone by function fitting[J]. Computer Applications and Software, 2022, 39(9): 193–200. (in Chinese)
[27] Shen J, Pang R, Weiss R J, et al. Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions[C]//Proceedings of the International Conference on Acoustics, Speech and Signal Processing. Calgary, AB, Canada: IEEE Press, 2018: 4779–4783.
[28] Wang Zhichao, Wu Hao, Li Dong, et al. Research and implementation of Chinese speech synthesis system based on non-autoregressive model[J]. Computer & Digital Engineering, 2023, 51(2): 325–330, 335. (in Chinese)
[29] Luo D, Sun S. On end-to-end Chinese speech synthesis based on World-Tacotron[C]//Proceedings of the International Conference on Culture-oriented Science & Technology. IEEE, 2020: 538–542.
[30] He T, Zhao W, Xu L. DOP-Tacotron: A fast Chinese TTS system with local-based attention[C]//Proceedings of the Chinese Control and Decision Conference. IEEE, 2020: 4345–4350.