Journal of Applied Acoustics, Vol. 42, No. 1, January 2023
⋄ Research Report ⋄
Affective speech synthesis of Chinese children ∗
HU Hangye WANG Wei †
(Machine Learning and Cognition Laboratory, School of Educational Science, Nanjing Normal University, Nanjing 210097, China)
CLC number: TP391 Document code: A Article ID: 1000-310X(2023)01-0076-08
DOI: 10.11684/j.issn.1000-310X.2023.01.010
Abstract: Emotional speech synthesis is of great significance for human-computer interaction.
To address the scarcity of Chinese speech data for children's emotional speech synthesis and the
long model training time, this paper proposes a transfer-learning method for Chinese children's
emotional speech synthesis. First, a deep learning model is trained on a Chinese speech database
to obtain an end-to-end Chinese speech synthesis model. Next, a large, high-quality Chinese
emotional corpus is used to build the emotional speech synthesis model. Finally, the model is
transferred using a small, self-collected corpus of Chinese children's emotional speech, achieving
low-resource speech synthesis. In the objective evaluation, the Mel cepstral distortion is 4.91;
in the subjective listening tests, the two indexes are 3.61 and 4.17, respectively. Experimental
comparison shows that the proposed method performs well for emotional speech synthesis and
outperforms existing advanced low-resource emotional speech synthesis methods.
Keywords: Children; Emotional speech synthesis; Transfer learning; Low resource
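For readers unfamiliar with the objective metric quoted in the abstract, the following is a minimal sketch of one common definition of Mel cepstral distortion (MCD), in dB, averaged over frames. It is an illustration only: the abstract does not state which MCD variant, how many cepstral coefficients, or which frame-alignment procedure the authors used, so the function name `mel_cepstral_distortion`, the exclusion of the 0th (energy) coefficient, and the assumption that the two sequences are already time-aligned are all assumptions made here.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Mean MCD in dB over time-aligned frames of mel-cepstral coefficients.

    Assumptions (not taken from the paper): frames are already aligned
    one-to-one, and coefficient 0 (energy) is excluded from the distance.
    """
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("frame sequences must be non-empty and time-aligned")

    # Constant factor of the common MCD definition: (10 / ln 10) * sqrt(2)
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)

    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        # Squared Euclidean distance over coefficients 1..D
        squared = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += scale * math.sqrt(squared)
    return total / len(ref_frames)
```

Lower values indicate that the synthesized cepstra are closer to the reference; identical sequences give 0. In practice the reference and synthesized utterances usually differ in length, so an alignment step (for example dynamic time warping) would be needed before calling a function like this.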
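Received 2021-10-10; finalized 2022-01-18
National Social Science Fund of China project (BCA150054)
About the author: HU Hangye (1996– ), female, from Dongyang, Zhejiang; master's student; research interest: signal and information processing.
† Corresponding author. E-mail: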