Page 145 - Journal of Applied Acoustics (《应用声学》), 2025, No. 2

Journal of Applied Acoustics                                                       Vol. 44, No. 2
                                                                                   March, 2025

             ⋄ Research Article ⋄



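                         Research on acoustic model modeling units for Tibetan speech recognition ∗

                                WANG Jiawen 1,2    GAO Dingguo 1,2†    SUOLANG Quzhen 1,2

                           (1 School of Information Science and Technology, Tibet University, Lhasa 850000)
             (2 Tibetan Information Technology Innovative Talent Cultivation Demonstration Base, Tibet University, Lhasa 850000)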

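                Abstract: The choice of modeling unit is a key issue in Tibetan speech recognition, because it determines
                the training quality and the recognition accuracy of the acoustic model. Previous studies of Tibetan speech
                recognition tested different modeling units on different datasets, which makes it hard to identify a suitable
                modeling unit and hard for related results to support one another. To address this problem, this paper
                proposes acoustic model modeling units for Tibetan speech recognition that are more widely applicable and
                give better recognition results. The paper summarizes and improves four modeling units, runs ablation
                experiments on data from three dialects, and trains five acoustic models separately. The experimental results
                show that the modeling unit based on Latin phonemes suits the Ü-Tsang and Kham dialects, while the modeling
                unit based on Latin syllables suits the Amdo dialect. The improved attention-based deep convolutional
                acoustic model achieves the best recognition result on the Amdo dialect, with a character error rate of
                14.67% on the test set.
                Keywords: Tibetan; Speech recognition; Acoustic model; Modeling unit
                CLC number: TN912.3           Document code: A          Article ID: 1000-310X(2025)02-0405-08
                DOI: 10.11684/j.issn.1000-310X.2025.02.015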



                 Research on acoustic model modeling unit for Tibetan speech recognition


                                WANG Jiawen 1,2, GAO Dingguo 1,2 and SUOLANG Quzhen 1,2

                           (1 School of Information Science and Technology, Tibet University, Lhasa 850000, China)
             (2 Tibetan Information Technology Innovative Talent Cultivation Demonstration Base, Tibet University, Lhasa 850000, China)

                 Abstract: The choice of modeling unit is a key issue in Tibetan speech recognition: it determines the
                 training quality and recognition accuracy of the acoustic model. Existing studies evaluate different
                 modeling units on different data sets, which makes it difficult to identify a suitable modeling unit for
                 Tibetan speech recognition and difficult for related research results to support each other. To address
                 this problem, this paper proposes acoustic model modeling units for Tibetan speech recognition that are
                 more widely applicable and achieve better recognition performance. The paper summarizes and improves
                 four modeling units, conducts ablation experiments on data from three dialects, and trains five acoustic
                 models separately. The experimental results show that the modeling unit based on Latin phonemes is
                 suitable for the Ü-Tsang (Lhasa) and Kham dialects, that the modeling unit based on Latin syllables is
                 suitable for the Amdo dialect, and that the improved attention-based deep convolutional acoustic model
                 achieves the best recognition performance on the Amdo dialect, with a character error rate of 14.67% on
                 the test set.
                 Keywords: Tibetan; Speech recognition; Acoustic model; Modeling unit
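
                 The abstract reports its headline result as a character error rate. As a reading aid, the sketch below
                 shows one common way such a rate is computed: the total edit distance between reference and hypothesis
                 unit sequences, divided by the total number of reference units. It is an illustration under assumptions,
                 not the authors' scoring script. The function names, the pre-tokenized list input, and the choice of
                 what counts as one "character" are all hypothetical; the page does not specify them.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two unit sequences (substitution, insertion, deletion cost 1)."""
    # prev[j] holds the distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1       # drop a reference unit
            insertion = curr[j - 1] + 1  # extra hypothesis unit
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def character_error_rate(refs: List[Sequence[str]], hyps: List[Sequence[str]]) -> float:
    """Corpus-level error rate: summed edit distance over summed reference length.

    Assumption: each utterance is already split into the units being scored
    (for example Tibetan syllables); the paper's exact unit is not given on this page.
    """
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must contain the same number of utterances")
    total_units = sum(len(r) for r in refs)
    if total_units == 0:
        raise ValueError("reference set is empty")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_errors / total_units


if __name__ == "__main__":
    # Placeholder tokens, not real transcripts.
    references = [["a", "b", "c", "d"], ["e", "f"]]
    hypotheses = [["a", "x", "c", "d"], ["e", "f"]]
    print(f"CER = {character_error_rate(references, hypotheses):.2%}")
```

                 Pooling errors over the whole test set, rather than averaging per-utterance rates, keeps short
                 utterances from dominating the score; whether the paper pools this way is likewise an assumption.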


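             Received 2023-10-27; final version 2023-12-06
             ∗ Supported by the National Natural Science Foundation of China (62166038), the Sichuan Science and Technology Program (2023YFQ0044), and the Tibet University Graduate "High-Level Talent Training Program" project (2021-GSP-S126)
             About the author: WANG Jiawen (1997– ), male, from Cangxi, Sichuan; master's student; research interests: speech recognition and speech synthesis.
             † Corresponding author. E-mail: gdg@utibet.edu.cn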