     111–119.
     Wang Guoliang, Chen Mengnan, Chen Lei. An end-to-end Chinese speech synthesis scheme based on Tacotron 2[J]. Journal of East China Normal University (Natural Science), 2019(4): 111–119.
[22] 张亚强. 基于迁移学习和自学习情感表征的情感语音合成[D]. 北京: 北京邮电大学, 2019.
     Zhang Yaqiang. Emotional speech synthesis based on transfer learning and self-learned emotional representation[D]. Beijing: Beijing University of Posts and Telecommunications, 2019.
[23] Skerry-Ryan R J, Battenberg E, Xiao Y, et al. Towards end-to-end prosody transfer for expressive speech synthesis with Tacotron[C]//International Conference on Machine Learning. PMLR, 2018: 4693–4702.
[24] Cho K, Van Merrienboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[J]. arXiv preprint arXiv: 1406.1078, 2014.
[25] Tits N, Haddad K E, Dutoit T. Exploring transfer learning for low resource emotional TTS[C]//Proceedings of SAI Intelligent Systems Conference. Springer, Cham, 2019.
[26] Zhou K, Sisman B, Liu R, et al. Emotional voice conversion: theory, databases and ESD[J]. Speech Communication, 2022, 137: 1–18.
[27] 应雨婷. 基于循环神经网络的中文语音合成研究与应用[D]. 南京: 东南大学, 2019.
     Ying Yuting. Research and application of Chinese speech synthesis based on recurrent neural networks[D]. Nanjing: Southeast University, 2019.
[28] 曹欣怡. 基于韵律参数优化的情感语音合成[D]. 南京: 南京师范大学, 2020.
     Cao Xinyi. Emotional speech synthesis based on prosodic parameter optimization[D]. Nanjing: Nanjing Normal University, 2020.
[29] Pan S J, Yang Q. A survey on transfer learning[J]. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(10): 1345–1359.
[30] 庄福振, 罗平, 何清, 等. 迁移学习研究进展[J]. 软件学报, 2015, 26(1): 26–39.
     Zhuang Fuzhen, Luo Ping, He Qing, et al. Survey on transfer learning research[J]. Journal of Software, 2015, 26(1): 26–39.
[31] Gibiansky A, Arik S Ö, Diamos G F, et al. Deep voice 2: multi-speaker neural text-to-speech[C]//31st Conference on Neural Information Processing Systems (NIPS). Long Beach, 2017.
[32] 都格草, 才让卓玛, 南措吉, 等. 基于神经网络的藏语语音合成[J]. 中文信息学报, 2019, 33(2): 75–80.
     Dou Gecao, Cai Rangzhuoma, Nan Cuoji, et al. Neural network based Tibetan speech synthesis[J]. Journal of Chinese Information Processing, 2019, 33(2): 75–80.
[33] Wu X, Cao Y, Wang M, et al. Rapid style adaptation using residual error embedding for expressive speech synthesis[C]//Interspeech 2018, 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, 2018: 3072–3076.
[34] Kubichek R. Mel-cepstral distance measure for objective speech quality assessment[C]//IEEE Pacific Rim Conference on Communications, Computers and Signal Processing. IEEE, 19–21 May 1993.
[35] Yan C, Zhang G, Ji X, et al. The feasibility of injecting inaudible voice commands to voice assistants[J]. IEEE Transactions on Dependable and Secure Computing, 2019, 18(3): 1108–1124.
[36] 赵力, 黄程韦. 实用语音情感识别中的若干关键技术[J]. 数据采集与处理, 2014, 29(2): 157–170.
     Zhao Li, Huang Chengwei. Key technologies in practical speech emotion recognition[J]. Journal of Data Acquisition and Processing, 2014, 29(2): 157–170.