Article abstract
拉巴顿珠, 珠杰, 欧珠, 尼玛. Research on Tibetan speech synthesis method based on end-to-end [J]. , 2023, 42(2): 324-332
Research on Tibetan speech synthesis method based on end-to-end (端到端的藏语语音合成方法)
Submitted: 2022-01-07  Revised: 2023-03-01
Abstract:
      In recent years, growing computing power and the steady accumulation of speech data have given rise to many new machine-learning-based speech processing technologies. Among them, Tacotron2, an end-to-end speech synthesis framework built on deep neural networks, has been widely adopted in speech engineering. Tacotron2 is an open-source program that is easy to run, and it has been applied successfully to speech synthesis in multiple languages and with different voice timbres. This paper studies the application of Tacotron2 to Tibetan and reports good experimental results. First, a medium-scale speech corpus (5500 sentences) of the Ü-Tsang dialect of Tibetan was built through natural speech collection, automatic annotation, and acoustic analysis; it includes Tibetan phoneme transcription, special-symbol processing, digital recordings, and Mel-spectrum data. Second, Tibetan speech synthesis experiments were carried out with the open-source Tacotron2 program and this corpus. Finally, an analysis of the deviation between synthesized and natural speech, together with a subjective evaluation of the naturalness of the synthesized speech, shows that the end-to-end method effectively reduces spectral degradation and improves the naturalness of the synthesized speech. The end-to-end Tacotron2 framework therefore has significant application value for Tibetan speech synthesis and merits further research and wider adoption.
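The abstract names two evaluation steps, a deviation analysis between synthesized and natural speech and a subjective naturalness rating, without stating the exact metrics. The following is a minimal sketch of one plausible reading, not the authors' procedure: it assumes the deviation is a mean absolute difference between log-Mel frames compared one-to-one, and that naturalness is summarized as a mean opinion score on a 1 to 5 scale. The function names `mel_deviation` and `mos_summary` and all numbers are hypothetical.

```python
import math
import statistics


def mel_deviation(natural, synthetic):
    """Mean absolute difference between two Mel spectrograms.

    Each spectrogram is a list of frames; each frame is a list of
    log-Mel energies. Frames are compared one-to-one up to the shorter
    length (an assumption; a real evaluation may align frames first).
    """
    n_frames = min(len(natural), len(synthetic))
    total, count = 0.0, 0
    for nat_frame, syn_frame in zip(natural[:n_frames], synthetic[:n_frames]):
        if len(nat_frame) != len(syn_frame):
            raise ValueError("frames must have the same number of Mel bands")
        for a, b in zip(nat_frame, syn_frame):
            total += abs(a - b)
            count += 1
    if count == 0:
        raise ValueError("both spectrograms need at least one non-empty frame")
    return total / count


def mos_summary(scores):
    """Return (mean score, half-width of a normal-approximation 95% interval)."""
    mean = statistics.mean(scores)
    if len(scores) < 2:
        return mean, 0.0
    half_width = 1.96 * statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, half_width


# Toy data invented for illustration only; not taken from the paper.
natural = [[0.0, 1.0, 2.0], [1.0, 1.0, 1.0]]
synthetic = [[0.5, 1.0, 1.5], [1.0, 2.0, 1.0], [0.0, 0.0, 0.0]]
print("Mel deviation:", mel_deviation(natural, synthetic))
print("MOS (mean, 95% half-width):", mos_summary([4, 5, 4, 3, 4]))
```

A lower deviation means the synthesized spectrum stays closer to the natural one. Truncating to the shorter utterance keeps the sketch simple, at the cost of ignoring timing differences between the two recordings.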
DOI:10.11684/j.issn.1000-310X.2023.02.015
Keywords: speech synthesis; Tibetan; grapheme-to-phoneme conversion; end-to-end; Tacotron2
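Author | Affiliation | E-mail
拉巴顿珠* | Tibet University | zangye@163.com
珠杰 | School of Information Science and Technology, Tibet University; Provincial-Ministerial Collaborative Innovation Center for Tibet Informatization |
欧珠 | School of Information Science and Technology, Tibet University; Provincial-Ministerial Collaborative Innovation Center for Tibet Informatization |
尼玛 | School of Information Science and Technology, Tibet University; Provincial-Ministerial Collaborative Innovation Center for Tibet Informatization |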