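Article abstract
Zhang Wei, Zhai Minghao, Huang Zilong, Li Wei, Cao Yi. SE-MCNN-CTC acoustic model for Chinese speech recognition[J].,2020,39(2):231-235
SE-MCNN-CTC acoustic model for Chinese speech recognition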
Towards end-to-end speech recognition for Chinese Mandarin using SE-MCNN-CTC
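Received: 2019-07-02  Revised: 2020-02-26
Abstract (translated from Chinese):
      Traditional convolutional neural networks show a high prediction error rate and weak generalization when recognizing Chinese speech. To address this, the paper first takes DCNN-CTC as the object of study and analyzes in depth how different combinations of convolutional, pooling, and fully connected layers affect its performance. Building on that model, it then proposes MCNN-CTC and, by incorporating SENet, a deep SE-MCNN-CTC acoustic model. This model combines the strengths of MCNN and SENet: it strengthens the propagation of deep-layer information through the convolutional network and avoids gradient problems, and it adaptively recalibrates the extracted feature maps. Experimental results show that SE-MCNN-CTC lowers the error rate by 13.51% relative to DCNN-CTC, reaching a final error rate of 22.21%, and that the improved acoustic model generalizes better.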
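The "adaptive recalibration of feature maps" attributed to SENet can be illustrated with a minimal sketch of a generic squeeze-and-excitation step. This is an assumption-laden illustration, not the authors' implementation: the function name `se_recalibrate`, the toy weights `w1` and `w2`, the reduction ratio, and the omission of biases are all hypothetical, and the abstract does not say where or how the SE blocks are placed inside MCNN.

```python
import math

def se_recalibrate(feature_maps, w1, w2):
    """Squeeze-and-excitation over feature maps shaped [C][H][W].

    w1 is a (C/r) x C matrix and w2 a C x (C/r) matrix. Both are
    hypothetical stand-ins for learned weights (biases omitted).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
         for fmap in feature_maps]
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per channel.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[[g * v for v in row] for row in fmap]
              for fmap, g in zip(feature_maps, gates)]
    return scaled, gates

# Toy input: 2 channels of 2x2 features, reduction ratio r = 2 (assumed).
x = [[[1.0, 2.0], [3.0, 4.0]],
     [[0.0, -1.0], [1.0, 0.0]]]
w1 = [[0.5, -0.5]]      # 1 x 2 bottleneck weights (illustrative values)
w2 = [[1.0], [-1.0]]    # 2 x 1 expansion weights (illustrative values)
y, gates = se_recalibrate(x, w1, w2)
```

Each gate lies strictly between 0 and 1, so a channel is attenuated according to its learned importance while its spatial layout is left unchanged.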
English abstract:
      To address the high prediction error rate and poor generalization of traditional convolutional neural networks in Chinese speech recognition, this paper first analyzes how different combinations of convolutional, pooling, and fully connected layers affect DCNN-CTC. Building on that model, two acoustic models, MCNN-CTC and SE-MCNN-CTC, are proposed. By combining the advantages of MCNN and SENet, the latter reinforces the transmission of deep information, avoids gradient problems, and adaptively recalibrates the extracted feature maps. Experimental results show that, compared with DCNN-CTC, SE-MCNN-CTC yields a 13.51% relative PER reduction, reaching a final PER of 22.21%, and that the improved acoustic model generalizes better.
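A relative reduction is measured against the baseline error, not in absolute percentage points. The short sketch below, with hypothetical helper names, shows the arithmetic that links the two reported figures. The baseline value it computes is derived from those figures under the assumption that both refer to the same test set; it is not a number stated in the abstract.

```python
def relative_reduction(baseline, improved):
    """Relative error reduction, as a fraction of the baseline error."""
    return (baseline - improved) / baseline

def implied_baseline(improved, reduction):
    """Baseline error implied by a final error and a relative reduction."""
    return improved / (1.0 - reduction)

final_per = 22.21    # percent, SE-MCNN-CTC, as reported in the abstract
reduction = 0.1351   # 13.51% relative reduction, as reported in the abstract

# Derived value (not reported in the abstract): implied DCNN-CTC error rate.
baseline_per = implied_baseline(final_per, reduction)
print(f"implied DCNN-CTC PER: {baseline_per:.2f}%")
```

Under that assumption the absolute gap between the two models is about 3.5 percentage points, which is much smaller than the 13.51% relative figure might suggest at first glance.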
DOI:10.11684/j.issn.1000-310X.2020.02.008
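Keywords (Chinese): deep learning, speech recognition, acoustic model, SE-MCNN-CTC
Keywords (English): Deep Learning, Automatic Speech Recognition, Acoustic Model, SE-MCNN-CTC
Funding: National Natural Science Foundation of China (51375209), Jiangsu Province "Six Talent Peaks" Program (ZBZZ-012), Jiangsu Province Postgraduate Innovation Program (KYCX18_0630, KYCX18_1846)
Author  Affiliation  E-mail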
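Zhang Wei  Jiangnan University  18261593885@163.com
Zhai Minghao  Jiangnan University  1355747741@qq.com
Huang Zilong  Jiangnan University  1936482824@qq.com
Li Wei  Suzhou Vocational Institute of Industrial Technology  414927240@qq.com
Cao Yi*  Jiangnan University  caoyi@jiangnan.edu.cn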