Article Abstract
王嘉文, 高定国, 索朗曲珍. Research on acoustic model modeling unit for Tibetan speech recognition*[J]. , 2025, 44(2): 405-412
Research on acoustic model modeling unit for Tibetan speech recognition*
Received: 2023-10-27  Revised: 2025-02-24
Abstract:
      The choice of modeling unit is a key issue in Tibetan speech recognition, because it determines the training quality and recognition accuracy of the acoustic model. Previous studies on Tibetan speech recognition evaluated different modeling units on different datasets, which makes it hard to identify a suitable modeling unit and hard for related results to support one another. To address this, the paper proposes acoustic model modeling units for Tibetan speech recognition that are more widely applicable and give better recognition results. It summarizes and improves four modeling units, runs ablation experiments on data from three dialects, and trains five acoustic models. The experimental results show that the modeling unit based on Latin phonemes suits the Ü-Tsang and Kham dialects, while the modeling unit based on Latin syllables suits the Amdo dialect. The improved attention-based deep convolutional acoustic model achieves the best recognition result on the Amdo dialect, with a character error rate of 14.67% on the test set.
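For readers unfamiliar with the metric, the character error rate quoted in the abstract is conventionally the edit distance between the recognized and reference transcripts divided by the reference length. The sketch below illustrates that general definition only. It is not the authors' evaluation code, and the function names and the choice to treat each transcript as a plain sequence of tokens are assumptions made for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions,
    insertions, and deletions each cost 1)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(references, hypotheses):
    """Corpus-level error rate: total edits divided by total reference length.

    Each item may be a string or a list of tokens; how Tibetan text is
    split into "characters" is an assumption left to the caller.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_ref = sum(len(r) for r in references)
    if total_ref == 0:
        raise ValueError("references must contain at least one token")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_ref
```

Under this definition, a hypothesis that gets one of four reference characters wrong has an error rate of 0.25.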
DOI:10.11684/j.issn.1000-310X.2025.02.015
Keywords: Tibetan  Speech recognition  Acoustic model  Modeling unit
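Funding: National Natural Science Foundation of China (62166038); Sichuan Science and Technology Program (2023YFQ0044); Tibet University Postgraduate "High-Level Talent Training Program" (2021-GSP-S126)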
Author  Affiliation  E-mail
王嘉文  Tibet University  jwwang21@163.com
高定国*  Tibet University  gdg@utibet.edu.cn
索朗曲珍  Tibet University  3199323958@qq.com