Page 144 - Journal of Applied Acoustics (《应用声学》), 2020, No. 3
Vol. 39, No. 3
Journal of Applied Acoustics, May 2020
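⋄ Research Report ⋄
Chinese speech recognition based on a bidirectional recurrent neural network ∗
LI Peng  YANG Yuanwei †  GAO Xianjun  DU Lihui  ZHOU Yi  JIANG Mengyue  ZHANG Jingbo
(College of Geosciences, Yangtze University, Wuhan 430100)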
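Abstract: In current deep neural network (DNN) models, the hidden part can be stacked to many layers and adapts well to complex problems, but the node connections between layers are independent of one another. Because of this structure, the model cannot exploit contextual information in a speech sequence to improve recognition. The conventional recurrent neural network (RNN) improves on this, but it can use only the preceding context. To address these problems, this paper combines a bidirectional RNN (Bi-RNN), which can use both the preceding and the following context in a speech sequence, with a DNN, and applies the combined model to speech recognition. A model with five hidden layers is built, in which the third layer has a Bi-RNN structure and the other layers have a DNN structure. The experimental results show that the model with the Bi-RNN structure achieves higher recognition accuracy than the other models; that noise has an important effect on Bi-RNN Chinese speech recognition, and in particular, when the noise types added to the training set and the test set differ, a model trained on speech with a single noise type cannot adapt to recognizing speech with other noise types; and that, when the number of hidden-layer neurons is adjusted, recognition accuracy does not keep rising with the number of neurons but shows a downward trend once that number grows beyond a certain point.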
Keywords: Speech recognition; Deep learning; Deep neural network; Recurrent neural network
CLC number: TN912.3  Document code: A  Article ID: 1000-310X(2020)03-0464-08
DOI: 10.11684/j.issn.1000-310X.2020.03.020
A study of Chinese speech recognition based on a bidirectional recurrent
neural network
LI Peng YANG Yuanwei GAO Xianjun DU Lihui ZHOU Yi
JIANG Mengyue ZHANG Jingbo
(College of Geosciences, Yangtze University, Wuhan 430100, China)
Abstract: In deep neural network (DNN) models, several hidden layers can be stacked, which makes the models
adaptable to complicated problems. However, the node connections between layers are independent of one another,
so the model cannot use contextual information in the speech sequence to improve recognition. A traditional
recurrent neural network (RNN) improves on this, but it can use only the preceding context. To solve these
problems, this paper combines a bidirectional RNN (Bi-RNN) model, which can use the preceding and following
context in a speech sequence simultaneously, with a DNN model, and applies the combination to speech
recognition. A model with five hidden layers was constructed, in which the third layer has a Bi-RNN structure
and the other layers have a DNN structure. The experimental results show that, compared with other models, the
model with the Bi-RNN structure improves the recognition accuracy. Noise has an important effect on Bi-RNN
Chinese speech recognition. In particular, when the training set and test set carry different types of added
noise, a model trained on speech with a single noise type cannot adapt to speech with other noise types. When
the number of neurons in the hidden layers is adjusted, the recognition accuracy does not always increase with
the number of neurons; it decreases once the number of neurons grows beyond a certain point.
Keywords: Speech recognition; Deep learning; Deep neural network; Recurrent neural network
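To make the layer arrangement described in the abstract concrete, the following is a minimal pure-Python sketch of a forward pass through five hidden layers in which only the third is bidirectional. It is not the authors' implementation. The function names (`build_model`, `forward`, `bi_rnn`, and so on), the tanh activations, the random initialisation, the simple Elman-style recurrent unit, and the choice to concatenate the forward and backward states are all assumptions made for illustration; the abstract specifies none of them.

```python
import math
import random

def rand_layer(n_out, n_in, rng, scale=0.3):
    """Random (n_out x n_in) weight matrix plus zero bias; illustrative init only."""
    W = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

def affine(W, b, x):
    """Compute W x + b for a single vector x."""
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]

def dense(frames, layer):
    """DNN-style layer: every frame is transformed independently of its neighbours."""
    W, b = layer
    return [[math.tanh(a) for a in affine(W, b, x)] for x in frames]

def recurrent_pass(frames, W, U, b):
    """Unidirectional recurrence: the state at step t depends on frames 0..t only."""
    h = [0.0] * len(b)
    states = []
    for x in frames:
        pre = affine(W, b, x)                   # contribution of the current frame
        rec = affine(U, [0.0] * len(b), h)      # contribution of the previous state
        h = [math.tanh(p + r) for p, r in zip(pre, rec)]
        states.append(h)
    return states

def bi_rnn(frames, fwd, bwd):
    """Bidirectional layer: one pass left-to-right, one right-to-left."""
    forward_states = recurrent_pass(frames, *fwd)
    backward_states = recurrent_pass(frames[::-1], *bwd)[::-1]
    # Assumption: the two directions are concatenated per frame.
    return [f + g for f, g in zip(forward_states, backward_states)]

def build_model(n_in, n_hidden, seed=0):
    """Five hidden layers: dense, dense, Bi-RNN, dense, dense (hypothetical sizes)."""
    rng = random.Random(seed)

    def rnn_params():
        W, b = rand_layer(n_hidden, n_hidden, rng)
        U, _ = rand_layer(n_hidden, n_hidden, rng)
        return W, U, b

    return {
        "dense12": [rand_layer(n_hidden, n_in, rng),
                    rand_layer(n_hidden, n_hidden, rng)],
        "fwd": rnn_params(),
        "bwd": rnn_params(),
        "dense45": [rand_layer(n_hidden, 2 * n_hidden, rng),
                    rand_layer(n_hidden, n_hidden, rng)],
    }

def forward(model, frames):
    h = frames
    for layer in model["dense12"]:               # hidden layers 1 and 2
        h = dense(h, layer)
    h = bi_rnn(h, model["fwd"], model["bwd"])    # hidden layer 3
    for layer in model["dense45"]:               # hidden layers 4 and 5
        h = dense(h, layer)
    return h

model = build_model(n_in=3, n_hidden=4)
utterance = [[0.1 * t, -0.2, 0.05 * t * t] for t in range(6)]  # 6 made-up feature frames
out = forward(model, utterance)
print(len(out), len(out[0]))  # 6 4
```

In this sketch, changing only the last frame of an utterance leaves the first state of `recurrent_pass` unchanged but alters the first output of `bi_rnn`. That is the property the abstract attributes to the bidirectional layer: access to the following context as well as the preceding context.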
Received 2019-03-19; finalized 2019-11-28
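∗ Supported by the Scientific Research Program of the Hubei Provincial Department of Education (Q20181317), the Yangtze University Undergraduate Innovation and Entrepreneurship Fund (2018012), and the Development Fund of the Key Laboratory for National Geographic Census and Monitoring, National Administration of Surveying, Mapping and Geoinformation (2017NGCM07)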
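About the author: LI Peng (1997– ), male, a native of Jincheng, Shanxi; undergraduate student; research interest: speech recognition.
† Corresponding author. E-mail: yyw_08@whu.edu.cn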