
Vol. 38, No. 1                                                                       Vol. 38, No. 1
             January 2019                          Journal of Applied Acoustics                   January, 2019


             ⋄ Research Report ⋄



                      A recognition algorithm for continuous phonemes based on an improved deep belief network                                                          ∗





                                            YIN Faming     1†    ZHAO Yan    2    ZHAO Li     2


                                           (1 School of Communications, Nanjing College of Information Technology, Nanjing 210023)

                                             (2 School of Information Science and Engineering, Southeast University, Nanjing 210096)
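                Abstract    To improve the phoneme recognition rate in continuous speech recognition, a phoneme recognition
                algorithm based on restricted Boltzmann machines (RBMs) trained with an improved parallel tempering (PT)
                algorithm is proposed. First, the RBMs are trained with an improved PT algorithm that uses equal-energy
                partitioning. The RBMs are then stacked into a deep belief network (DBN), which serves as the base model
                for pre-training a deep neural network (DNN); adding a softmax output layer yields a DNN that estimates
                phoneme-state posterior probabilities. Next, the network weights are fine-tuned with the backpropagation
                algorithm using a small amount of labeled data. Finally, the resulting posterior probabilities are used as
                the emission probabilities of a hidden Markov model (HMM), and a Viterbi decoder performs phoneme
                recognition. Experiments on the TIMIT corpus show that the recognition rate is about 4.5% higher than that
                of traditional contrastive divergence (CD)-type algorithms and about 1% higher than that of the original PT
                algorithm, without increasing the amount of computation.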
                Keywords     Parallel tempering, Restricted Boltzmann machine, Deep belief network, Phoneme recognition
                CLC number: TP18           Document code: A          Article ID: 1000-310X(2019)01-0039-06
                DOI: 10.11684/j.issn.1000-310X.2019.01.006
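The training step above replaces contrastive divergence with parallel tempering (PT) for the RBM's negative phase. The following is a minimal pure-Python sketch of standard PT for a binary RBM, not the authors' improved variant: the equal-energy partitioning is not described in this section and is not reproduced. It assumes persistent chains, a hypothetical evenly spaced inverse-temperature ladder `betas` with beta_0 = 1, and one data vector per update; all function names are illustrative. Each chain at inverse temperature beta samples p_beta(v, h) proportional to exp(-beta E(v, h)), neighbouring chains swap states with Metropolis-Hastings probability min(1, exp((beta_k - beta_{k+1})(E_k - E_{k+1}))), and the beta = 1 chain supplies the negative-phase statistics.

```python
import math
import random

def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def energy(v, h, W, b, c):
    """Binary RBM energy E(v, h) = -v^T W h - b^T v - c^T h."""
    vwh = sum(v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h)))
    return -vwh - sum(bi * vi for bi, vi in zip(b, v)) - sum(cj * hj for cj, hj in zip(c, h))

def gibbs_sweep(v, W, b, c, beta, rng):
    """One block-Gibbs sweep v -> h -> v' of p_beta(v, h), proportional to exp(-beta * E(v, h))."""
    nv, nh = len(b), len(c)
    h = [1.0 if rng.random() < sigmoid(beta * (c[j] + sum(v[i] * W[i][j] for i in range(nv)))) else 0.0
         for j in range(nh)]
    v = [1.0 if rng.random() < sigmoid(beta * (b[i] + sum(W[i][j] * h[j] for j in range(nh)))) else 0.0
         for i in range(nv)]
    return v, h

def pt_sweep(chains, W, b, c, betas, rng):
    """Advance every tempered chain, then try Metropolis-Hastings swaps between neighbours."""
    states = [gibbs_sweep(v, W, b, c, beta, rng) for v, beta in zip(chains, betas)]
    E = [energy(v, h, W, b, c) for v, h in states]
    for k in range(len(betas) - 1):
        # Swap acceptance: min(1, exp((beta_k - beta_{k+1}) * (E_k - E_{k+1})))
        log_r = (betas[k] - betas[k + 1]) * (E[k] - E[k + 1])
        if log_r >= 0 or rng.random() < math.exp(log_r):
            states[k], states[k + 1] = states[k + 1], states[k]
            E[k], E[k + 1] = E[k + 1], E[k]
    return states

def pt_update(v_data, chains, W, b, c, betas, lr, rng):
    """One SGD step on a single data vector; W, b, c are updated in place."""
    nv, nh = len(b), len(c)
    # Positive phase: hidden probabilities given the data (beta = 1).
    ph = [sigmoid(c[j] + sum(v_data[i] * W[i][j] for i in range(nv))) for j in range(nh)]
    # Negative phase: after the swaps, the beta = 1 chain holds the model sample.
    states = pt_sweep(chains, W, b, c, betas, rng)
    vm, hm = states[0]
    for i in range(nv):
        for j in range(nh):
            W[i][j] += lr * (v_data[i] * ph[j] - vm[i] * hm[j])
        b[i] += lr * (v_data[i] - vm[i])
    for j in range(nh):
        c[j] += lr * (ph[j] - hm[j])
    return [v for v, _ in states]  # persistent chains for the next update

# Toy usage (hypothetical sizes, ladder, and data; not the paper's settings).
rng = random.Random(0)
nv, nh = 6, 4
W = [[rng.gauss(0.0, 0.01) for _ in range(nh)] for _ in range(nv)]
b, c = [0.0] * nv, [0.0] * nh
betas = [1.0, 0.8, 0.6, 0.4, 0.2]
chains = [[float(rng.randint(0, 1)) for _ in range(nv)] for _ in betas]
data = [[1.0, 1.0, 1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]]
for epoch in range(50):
    for v in data:
        chains = pt_update(v, chains, W, b, c, betas, 0.05, rng)
```

Unlike CD-k, which restarts its negative chain at the data, the high-temperature chains here mix freely and pass well-mixed states down to beta = 1 through the swaps, which is the usual motivation for PT over CD.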




                               Phoneme recognition based on a deep belief network



                                          YIN Faming 1   ZHAO Yan  2  ZHAO Li  2


                                  (1 Nanjing College of Information Technology, Nanjing 210023, China)
                         (2 School of Information Science and Engineering, Southeast University, Nanjing 210096, China)

                 Abstract  To improve the accuracy of phoneme recognition in continuous speech recognition, this paper
                 proposes a modified parallel tempering (PT) algorithm for training the restricted Boltzmann machine (RBM).
                 First, the RBM is trained with Metropolis-Hastings-based parallel tempering sampling; RBMs are then stacked
                 to form a deep belief network (DBN) that serves as the basis for deep neural network (DNN) pre-training.
                 Adding a softmax output layer to the network then yields a DNN that detects phoneme posterior probabilities.
                 Subsequently, the backpropagation algorithm is applied to fine-tune the weights discriminatively with a small
                 amount of labeled data. Finally, the sequence of predicted probability distributions is fed into a standard
                 Viterbi decoder. Experiments on the TIMIT dataset show that the proposed method outperforms traditional
                 approaches: its recognition rate is about 4.5% higher than that of contrastive divergence (CD) and about 1%
                 higher than that of the original PT, without additional computation.
                 Key words: Parallel tempering, Restricted Boltzmann machine, Deep belief network, Phoneme recognition
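The decoding stage can be illustrated with a small sketch, assuming a hypothetical DNN that emits one logit vector per frame over HMM states. A softmax turns the logits into state posteriors, and, following the description above, their logs are used directly as emission scores for Viterbi decoding. The function names, toy logits, and transition probabilities are illustrative assumptions, not values from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one frame of DNN outputs."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def viterbi(log_emit, log_trans, log_init):
    """Return the most likely state sequence.

    log_emit[t][s]:  log emission score of state s at frame t
    log_trans[r][s]: log P(state s | previous state r)
    log_init[s]:     log P(state s at frame 0)
    """
    n_states = len(log_init)
    delta = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backptrs = []
    for frame in log_emit[1:]:
        ptrs, new_delta = [], []
        for s in range(n_states):
            # Best predecessor r for state s at this frame.
            best = max(range(n_states), key=lambda r: delta[r] + log_trans[r][s])
            ptrs.append(best)
            new_delta.append(delta[best] + log_trans[best][s] + frame[s])
        backptrs.append(ptrs)
        delta = new_delta
    # Backtrack from the best final state.
    path = [max(range(n_states), key=lambda s: delta[s])]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    return path[::-1]

# Toy usage: 3 frames, 2 HMM states (all numbers are made up for illustration).
logits = [[2.0, 0.0], [0.0, 0.5], [0.0, 2.0]]
log_emit = [[math.log(p) for p in softmax(z)] for z in logits]
stay, switch = math.log(0.9), math.log(0.1)
log_trans = [[stay, switch], [switch, stay]]
log_init = [math.log(0.5), math.log(0.5)]
print(viterbi(log_emit, log_trans, log_init))  # [1, 1, 1]
```

Many hybrid DNN-HMM systems divide the posteriors by the state priors to obtain scaled likelihoods before decoding; the sketch omits that step to match the description above. In the toy run, the sticky transitions (0.9 to stay) make the decoder keep state 1 throughout, even though frame 0 on its own favours state 0.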


             Received 2018-04-25; finalized 2018-08-02
             ∗ Supported by the National Natural Science Foundation of China (61571106)
             Author profile: YIN Faming (born 1980), male, from Nanjing, Jiangsu; master's degree; research interests: electronics and information processing.
             † Corresponding author. E-mail: yinfm@njcit.cn