Journal of Applied Acoustics (《应用声学》), 2020, No. 2 (March 2020), p. 230
[10] Graves A, Fernández S, Gomez F. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks[C]// International Conference on Machine Learning. Pittsburgh, 2006: 369–376.
[11] Graves A. Sequence transduction with recurrent neural networks[J]. Computer Science, 2012, 58(3): 235–242.
[12] Kim S, Hori T, Watanabe S. Joint CTC-attention based end-to-end speech recognition using multi-task learning[J]. arXiv: 1609.06773, 2017.
[13] Yu Chongchong, Chen Yunbing, Sun Qinyao, et al. Research on endangered language speech recognition based on dynamic BLSTM and CTC[J]. Application Research of Computers, 2019, 36(11): 3334–3337.
[14] Yao Yu, Chellali R. End-to-end Chinese speech recognition system based on bidirectional long-term memory-join timing classification and weighted finite-state transducer[J]. Journal of Computer Applications, 2018, 38(9): 2495–2499.
[15] Zhou Feiyan, Jin Linpeng, Dong Jun. A review of convolutional neural networks[J]. Chinese Journal of Computers, 2017, 40(6): 1229–1251.
[16] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[J]. arXiv: 1409.1556, 2014.
[17] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]// Computer Vision and Pattern Recognition. Las Vegas, 2016: 770–778.
[18] Huang G, Liu Z, van der Maaten L, et al. Densely connected convolutional networks[C]// Computer Vision and Pattern Recognition. Hawaii, 2017: 2261–2269.
[19] Wang Ke, Wu Jun, Zhou Tianxiang, et al. A CNNs motion recognition method based on global spatiotemporal features[J]. Journal of Huazhong University of Science and Technology (Natural Science Edition), 2018, 46(12): 36–41.
[20] Abdel-Hamid O, Mohamed A R, Jiang H, et al. Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition[C]// IEEE International Conference on Acoustics, Speech and Signal Processing. Kyoto, 2012: 4277–4280.
[21] Sainath T N, Mohamed A R, Kingsbury B, et al. Deep convolutional neural networks for LVCSR[C]// IEEE International Conference on Acoustics, Speech and Signal Processing. Vancouver, 2013: 8614–8618.
[22] Zhang Y, Pezeshki M, Brakel P, et al. Towards end-to-end speech recognition with deep convolutional neural networks[J]. arXiv: 1701.02720, 2017.
[23] Hu J, Shen L, Albanie S, et al. Squeeze-and-excitation networks[J]. arXiv: 1709.01507, 2018.
[24] Zhang Shun, Gong Yihong, Wang Jinjun. Development of deep convolutional neural networks and its application in computer vision[J]. Chinese Journal of Computers, 2019, 42(3): 453–482.
[25] Wu Renbiao, Zhao Ting, Qu Jingyi. Flight delay prediction model based on deep SE-DenseNet[J]. Journal of Electronics and Information Technology, 2019, 41(6): 1510–1517.
[26] Qiu Like, Guo Zhongwen, Liu Qing, et al. Feature selection algorithm based on redundancy analysis[J]. Journal of Beijing University of Posts and Telecommunications, 2017, 40(1): 36–41.
[27] Wang D, Zhang X. THCHS-30: a free Chinese speech corpus[J]. arXiv: 1512.01882, 2015.
[28] Li J, Zhang H, Cai X, et al. Towards end-to-end speech recognition for Chinese Mandarin using long short-term memory recurrent neural networks[C]// Interspeech. Dresden, 2015: 3615–3619.
[29] Kingma D, Ba J. Adam: a method for stochastic optimization[J]. arXiv: 1412.6980, 2015.
[30] Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift[J]. arXiv: 1502.03167, 2015.
[31] Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: a simple way to prevent neural networks from overfitting[J]. Journal of Machine Learning Research, 2014, 15(1): 1929–1958.
[32] Tan T, Qian Y, Zhou Y, et al. Adaptive very deep convolutional residual network for noise robust speech recognition[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018, 26(8): 1393–1405.
[33] Yang Yang, Wang Yuduo. Speech recognition based on improved convolutional neural network[J]. Journal of Applied Acoustics, 2018, 37(6): 940–946.
[34] Zhang Limin, Wang Yanzhe, Zhang Bingqiang, et al. Mandarin recognition and improvement based on CTC criterion[J]. Computer Engineering, 2019, 45(6): 249–253, 266.