Page 206 - Applied Acoustics (《应用声学》), 2023, No. 5


[2] Cao W H, Xu J P, Liu Z T. Speaker-independent speech emotion recognition based on random forest feature selection algorithm[C]. 2017 36th Chinese Control Conference, 2017: 10995–10998.
[3] Cao Q, Hou M, Chen B, et al. Hierarchical network based on the fusion of static and dynamic features for speech emotion recognition[C]. ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing, 2021: 6334–6338.
[4] Zhang S, Chen A, Guo W, et al. Learning deep binaural representations with deep convolutional neural networks for spontaneous speech emotion recognition[J]. IEEE Access, 2020, 8: 23496–23505.
[5] Baltrušaitis T, Ahuja C, Morency L P. Multimodal machine learning: a survey and taxonomy[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 41(2): 423–443.
[6] Noh K, Lim J, Chung S, et al. Ensemble classifier based on decision-fusion of multiple models for speech emotion recognition[C]. 2018 International Conference on Information and Communication Technology Convergence, 2018: 1246–1248.
[7] Yao Z, Wang Z, Liu W, et al. Speech emotion recognition using fusion of three multi-task learning-based classifiers: HSF-DNN, MS-CNN and LLD-RNN[J]. Speech Communication, 2020, 120: 11–19.
[8] Bahdanau D, Chorowski J, Serdyuk D, et al. End-to-end attention-based large vocabulary speech recognition[C]. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016: 4945–4949.
[9] Mirsamadi S, Barsoum E, Zhang C. Automatic speech emotion recognition using recurrent neural networks with local attention[C]. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017: 2227–2231.
[10] Kwon S. Att-Net: enhanced emotion recognition system using lightweight self-attention module[J]. Applied Soft Computing, 2021, 102: 107101.
[11] Nwe T L, Foo S W, de Silva L C. Speech emotion recognition using hidden Markov models[J]. Speech Communication, 2003, 41(4): 603–623.
[12] Kishore K V K, Satish P K. Emotion recognition in speech using MFCC and wavelet features[C]. 2013 3rd IEEE International Advance Computing Conference, 2013: 842–847.
[13] Hu Songfeng, Zhang Xuan. Speaker recognition method based on Mel frequency cepstrum coefficient and inverted Mel frequency cepstrum coefficient[J]. Journal of Computer Applications, 2012, 32(9): 2542–2544. (in Chinese)
[14] Tang Y Y, Lu Y, Yuan H. Hyperspectral image classification based on three-dimensional scattering wavelet transform[J]. IEEE Transactions on Geoscience and Remote Sensing, 2014, 53(5): 2467–2480.
[15] Zhong Hao, Bao Hong, Zhang Jing. An improved extraction method of speech dynamic combination characteristic parameters[J]. Computer and Information Technology, 2017, 25(3): 4–7. (in Chinese)
[16] Stolar M N, Lech M, Bolia R S, et al. Real time speech emotion recognition using RGB image classification and transfer learning[C]. 2017 11th International Conference on Signal Processing and Communication Systems, 2017: 1–8.
[17] Lin T Y, Goyal P, Girshick R, et al. Focal loss for dense object detection[C]. Proceedings of the IEEE International Conference on Computer Vision, 2017: 2980–2988.
[18] Stevens S S, Volkmann J. The relation of pitch to frequency: a revised scale[J]. The American Journal of Psychology, 1940, 53(3): 329–353.
[19] Bruna J, Mallat S. Invariant scattering convolution networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(8): 1872–1886.
[20] Pascanu R, Mikolov T, Bengio Y. On the difficulty of training recurrent neural networks[C]. International Conference on Machine Learning, 2013: 1310–1318.
[21] Prabowo Y D, Warnars H L H S, Budiharto W, et al. LSTM and simple RNN comparison in the problem of sequence to sequence on conversation data using bahasa indonesia[C]. 2018 Indonesian Association for Pattern Recognition International Conference, 2018: 51–56.
[22] Lu Guanming, Yuan Liang, Yang Wenjuan, et al. Speech emotion recognition based on long short-term memory and convolutional neural networks[J]. Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2018, 38(5): 63–69. (in Chinese)
[23] Hu J, Shen L, Sun G. Squeeze-and-excitation networks[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 7132–7141.