
Journal of Applied Acoustics, Vol. 37, No. 6, November 2018


             ⋄ Research Report ⋄



                       Speech recognition based on an improved convolutional neural network algorithm





                                                   YANG Yang†   WANG Yuduo


                   (School of Information and Communication Engineering, Beijing Information Science and Technology University, Beijing 100101)

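                Abstract  To solve the problem that the traditional convolutional neural network (CNN) performs poorly
                when recognizing continuous speech data, an improved CNN algorithm is proposed. The method introduces
                the Fisher criterion and an L2 regularization constraint. In the backpropagation stage, where the
                parameters are adjusted, it minimizes the parameter error, keeps the classified samples widely
                scattered between classes and tightly concentrated within each class, and keeps the network weights
                at a suitable order of magnitude so that overfitting is effectively alleviated. A new log activation
                function, which better matches the activation characteristics of biological neurons, is adopted to
                optimize the CNN and further raise the recognition accuracy. Experimental results on the TIMIT and
                THCHS30 speech corpora show that, compared with the traditional CNN algorithm, the improved algorithm
                proposed in this paper raises the speech recognition rate and has stronger generalization ability.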
                Keywords  Speech recognition, Convolutional neural network, Fisher criterion, L2 regularization, log activation function
                CLC number: TN912.3           Document code: A          Article ID: 1000-310X(2018)06-0940-07
                DOI: 10.11684/j.issn.1000-310X.2018.06.016





                     Speech recognition based on improved convolutional neural network
                                                       algorithm



                                               YANG Yang     WANG Yuduo


                   (School of Information and Communication Engineering, Beijing Information Science and Technology University,
                                                    Beijing 100101, China)

                 Abstract  An improved convolutional neural network (CNN) algorithm is proposed to address the poor
                 recognition performance of the traditional CNN on continuous speech data. The method introduces the
                 Fisher criterion and an L2 regularization constraint. During the parameter-adjustment phase of
                 backpropagation, it not only minimizes the parameter error but also ensures that the classified
                 samples are more scattered between classes and more concentrated within each class. At the same
                 time, the network weights are kept at an appropriate order of magnitude, which effectively
                 alleviates overfitting. To further improve speech recognition accuracy, a new log activation
                 function that is more consistent with the activation characteristics of biological neurons is used
                 to optimize the CNN. Experiments on the TIMIT and THCHS30 speech corpora show that, compared with
                 the traditional CNN algorithm, the proposed algorithm achieves higher recognition accuracy and
                 generalizes better.
                 Key words  Speech recognition, Convolutional neural network, Fisher criterion, L2 regularization, log activation function
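
The abstract names three ingredients (a Fisher-criterion term, an L2 weight penalty, and a log activation function) but does not give their formulas. The following is only a minimal sketch of how such pieces might fit together, under loudly stated assumptions: the cross-entropy error term, the "within-class scatter minus between-class scatter" form of the Fisher term, the symmetric form sign(x)·ln(1 + |x|) of the log activation, and the names `alpha`, `beta`, `fisher_term`, `total_loss`, and `log_activation` are all hypothetical and are not taken from the paper.

```python
import math


def log_activation(x):
    # ASSUMED form of a "log activation": sign(x) * ln(1 + |x|).
    # The abstract does not define the function; this is an illustration only.
    return math.copysign(math.log1p(abs(x)), x)


def cross_entropy(logits, label):
    # Softmax cross-entropy for one sample, using a stable log-sum-exp.
    # ASSUMPTION: the abstract only says "parameter error", not which error.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def fisher_term(features, labels):
    # Within-class scatter minus between-class scatter, averaged over samples.
    # Minimizing it pulls each class together and pushes class means apart.
    n, dim = len(features), len(features[0])
    global_mean = [sum(f[d] for f in features) / n for d in range(dim)]
    within = 0.0
    between = 0.0
    for c in set(labels):
        members = [f for f, y in zip(features, labels) if y == c]
        mean = [sum(f[d] for f in members) / len(members) for d in range(dim)]
        within += sum((f[d] - mean[d]) ** 2 for f in members for d in range(dim))
        between += len(members) * sum(
            (mean[d] - global_mean[d]) ** 2 for d in range(dim)
        )
    return (within - between) / n


def total_loss(features, logits, labels, weights, alpha=0.01, beta=1e-4):
    # Error term + alpha * Fisher term + beta * L2 penalty on a flat weight list.
    # alpha and beta are hypothetical trade-off coefficients.
    n = len(labels)
    error = sum(cross_entropy(z, y) for z, y in zip(logits, labels)) / n
    l2_penalty = sum(w * w for w in weights)
    return error + alpha * fisher_term(features, labels) + beta * l2_penalty
```

In this sketch a larger `alpha` rewards features that cluster tightly per class and spread apart across classes, while a larger `beta` shrinks the weights; how the paper actually balances the terms is not stated in the abstract.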


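             Received 2018-01-25; final version 2018-05-01
             About the author: YANG Yang (born 1994), female, from Shangqiu, Henan; master's student; research interest: speech signal processing.
             † Corresponding author. E-mail: 18811536735@163.com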