Editorial Office of Applied Acoustics (《应用声学》)

Article Abstract

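Zhang Wei, Zhai Minghao, Huang Zilong, Li Wei, Cao Yi. SE-MCNN-CTC acoustic model for Chinese speech recognition[J]. Applied Acoustics (应用声学), 2020, 39(2): 231-235.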

SE-MCNN-CTC Acoustic Model for Chinese Speech Recognition

Towards end-to-end speech recognition for Chinese Mandarin using SE-MCNN-CTC

Submitted: 2019-07-02  Revised: 2020-02-26

Chinese abstract:

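This work addresses the high prediction error rate and weak generalization of traditional convolutional neural networks in Chinese speech recognition. First, taking DCNN-CTC as the object of study, it analyzes in depth how different combinations of convolutional, pooling, and fully connected layers affect performance. Second, building on that model, it proposes MCNN-CTC and then, by combining it with SENet, the deep SE-MCNN-CTC acoustic model. SE-MCNN-CTC merges the strengths of MCNN and SENet: it strengthens the propagation of deep information through the convolutional network and avoids gradient problems, while also adaptively recalibrating the extracted feature maps. Experimental results show that SE-MCNN-CTC reduces the error rate by 13.51% relative to DCNN-CTC, reaching a final error rate of 22.21%, and that the improved acoustic model generalizes markedly better.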
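The "adaptive recalibration of feature maps" contributed by SENet is the squeeze-and-excitation (SE) operation. The NumPy sketch below shows one SE step in isolation, in the form introduced with SENet: global average pooling per channel, a two-layer bottleneck with ReLU and sigmoid, then channel-wise scaling. It is not the authors' SE-MCNN-CTC code. The (C, H, W) layout (read here as channels by time frames by frequency bins), the reduction ratio r, the name `se_block`, and the random weights are all illustrative assumptions, and where the SE step sits inside MCNN is not specified by this abstract.

```python
import numpy as np

def se_block(u, w1, w2):
    """Squeeze-and-excitation recalibration of one feature map (illustrative sketch).

    u:  feature map, shape (C, H, W); assumed channels x time frames x frequency bins
    w1: first FC weights, shape (C // r, C); r is an assumed reduction ratio
    w2: second FC weights, shape (C, C // r)
    Returns the recalibrated map (same shape as u) and the channel weights s.
    """
    # Squeeze: global average pooling over the two spatial axes -> one value per channel
    z = u.mean(axis=(1, 2))                      # shape (C,)
    # Excitation: bottleneck FC -> ReLU -> FC -> sigmoid, giving weights in (0, 1)
    hidden = np.maximum(0.0, w1 @ z)             # shape (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # shape (C,)
    # Scale: multiply every channel of u by its learned importance weight
    return u * s[:, None, None], s

# Usage with hypothetical sizes: 8 channels, 10 frames, 13 frequency bins, r = 4
rng = np.random.default_rng(0)
C, r = 8, 4
u = rng.standard_normal((C, 10, 13))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
out, s = se_block(u, w1, w2)
print(out.shape)  # (8, 10, 13)
```

Because each channel is only rescaled, the block keeps the feature-map shape, which is what allows an SE step to be dropped between existing convolutional layers.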

English abstract:

To address the high prediction error rate and poor generalization of traditional convolutional neural networks (CNNs) in Chinese speech recognition, this paper analyzes how different combinations of convolutional, pooling, and fully connected layers affect DCNN-CTC. Building on this model, two acoustic models, MCNN-CTC and SE-MCNN-CTC, are proposed. The latter combines the advantages of MCNN and SENet: it reinforces the transmission of deep information and effectively avoids gradient problems, while also adaptively recalibrating the extracted feature maps. The results show that, compared with DCNN-CTC, SE-MCNN-CTC yields a 13.51% relative PER reduction, reaching a final PER of 22.21%, and that the generalization performance of the improved acoustic model is effectively improved.
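A relative reduction is measured against the baseline, not in percentage points. Assuming the usual definition, relative reduction = (PER_DCNN - PER_SE) / PER_DCNN, and assuming both figures come from the same test set, the two reported numbers imply a DCNN-CTC PER of about 25.68%, i.e. roughly 3.47 points absolute. This back-calculation is illustrative only; the baseline value itself is not stated in the abstract, and `implied_baseline` is a hypothetical helper.

```python
def implied_baseline(final_rate, relative_reduction):
    """Back out a baseline error rate from a final rate and a relative reduction.

    Assumes relative_reduction = (baseline - final_rate) / baseline,
    so baseline = final_rate / (1 - relative_reduction).
    """
    return final_rate / (1.0 - relative_reduction)

baseline = implied_baseline(22.21, 0.1351)   # final PER 22.21%, relative reduction 13.51%
absolute_drop = baseline - 22.21
print(f"{baseline:.2f}")        # 25.68
print(f"{absolute_drop:.2f}")   # 3.47
```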

DOI: 10.11684/j.issn.1000-310X.2020.02.008

Chinese keywords: deep learning, speech recognition, acoustic model, SE-MCNN-CTC

English keywords: Deep Learning, Automatic Speech Recognition, Acoustic Model, SE-MCNN-CTC

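Funding: National Natural Science Foundation of China (51375209); Jiangsu Province "Six Talent Peaks" Program (ZBZZ-012); Jiangsu Province Graduate Research Innovation Program (KYCX18_0630, KYCX18_1846)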

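Author	Affiliation	E-mail
Zhang Wei	Jiangnan University	18261593885@163.com
Zhai Minghao	Jiangnan University	1355747741@qq.com
Huang Zilong	Jiangnan University	1936482824@qq.com
Li Wei	Suzhou Institute of Industrial Technology	414927240@qq.com
Cao Yi*	Jiangnan University	caoyi@jiangnan.edu.cn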
