Page 206 - 《应用声学)》2023年第5期
P. 206
1098 2023 年 9 月
[2] Cao W H, Xu J P, Liu Z T. Speaker-independent speech Hu Songfeng, Zhang Xuan. Speaker recognition method
emotion recognition based on random forest feature selec- based on Mel frequency cepstrum coefficient and inverted
tion algorithm[C]. 2017 36th Chinese Control Conference, Mel frequency cepstrum coefficient[J]. Journal of Com-
2017: 10995–10998. puter Application, 2012, 32(9): 2542–2544.
[3] Cao Q, Hou M, Chen B, et al. Hierarchical network based [14] Tang Y Y, Lu Y, Yuan H. Hyperspectral image classifica-
on the fusion of static and dynamic features for speech tion based on three-dimensional scattering wavelet trans-
emotion recognition[C]. ICASSP 2021-2021 IEEE Inter- form[J]. IEEE Transactions on Geoscience and Remote
national Conference on Acoustics, Speech and Signal Pro- sensing, 2014, 53(5): 2467–2480.
cessing, 2021: 6334–6338. [15] 钟浩, 鲍鸿, 张晶. 一种改进的语音动态组合特征参数提取方
[4] Zhang S, Chen A, Guo W, et al. Learning deep binaural 法 [J]. 电脑与信息技术, 2017, 25(3): 4–7.
representations with deep convolutional neural networks Zhong Hao, Bao Hong, Zhang Jing. An improved extrac-
for spontaneous speech emotion recognition[J]. IEEE Ac- tion method of speech dynamic combination characteris-
cess, 2020, 8: 23496–23505. tic parameters[J]. Computer and Information Technology,
[5] Baltrušaitis T, Ahuja C, Morency L P. Multimodal ma-
2017, 25(3): 4–7.
chine learning: a survey and taxonomy[J]. IEEE Transac-
[16] Stolar M N, Lech M, Bolia R S, et al. Real time speech
tions on Pattern Analysis and Machine Intelligence, 2018,
emotion recognition using RGB image classification and
41(2): 423–443.
transfer learning[C]. 2017 11th International Conference
[6] Noh K, Lim J, Chung S, et al. Ensemble classifier based
on Signal Processing and Communication Systems, 2017:
on decision-fusion of multiple models for speech emotion
1–8.
recognition[C]. 2018 International Conference on Informa-
[17] Lin T Y, Goyal P, Girshick R, et al. Focal loss for dense
tion and Communication Technology Convergence, 2018:
object detection[C]. Proceedings of the IEEE Interna-
1246–1248.
tional Conference on Computer Vision, 2017: 2980–2988.
[7] Yao Z, Wang Z, Liu W, et al. Speech emotion recognition
[18] Stevens S S, Volkmann J. The relation of pitch to fre-
using fusion of three multi-task learning-based classifiers:
quency: a revised scale[J]. The American Journal of Psy-
HSF-DNN, MS-CNN and LLD-RNN[J]. Speech Commu-
chology, 1940, 53(3): 329–353.
nication, 2020, 120: 11–19.
[19] Bruna J, Mallat S. Invariant scattering convolution net-
[8] Bahdanau D, Chorowski J, Serdyuk D, et al. End-to-end
works[J]. IEEE Transactions on Pattern Analysis and Ma-
attention-based large vocabulary speech recognition[C].
chine Intelligence, 2013, 35(8): 1872–1886.
2016 IEEE International Conference on Acoustics, Speech
[20] Pascanu R, Mikolov T, Bengio Y. On the difficulty of
and Signal Processing, 2016: 4945–4949.
training recurrent neural networks[C]. International Con-
[9] Mirsamadi S, Barsoum E, Zhang C. Automatic speech
ference on Machine Learning, 2013: 1310–1318.
emotion recognition using recurrent neural networks
[21] Prabowo Y D, Warnars H L H S, Budiharto W, et
with local attention[C]. 2017 IEEE International Confer-
al. LSTM and simple RNN comparison in the problem
ence on Acoustics, Speech and Signal Processing, 2017:
of sequence to sequence on conversation data using ba-
2227–2231.
[10] Kwon S. Att-Net: enhanced emotion recognition system hasa indonesia[C]. 2018 Indonesian Association for Pat-
using lightweight self-attention module[J]. Applied Soft tern Recognition International Conference, 2018: 51–56.
Computing, 2021, 102: 107101. [22] 卢官明, 袁亮, 杨文娟, 等. 基于长短期记忆和卷积神经网络
[11] Nwe T L, Foo S W, de Silva L C. Speech emotion recog- 的语音情感识别 [J]. 南京邮电大学学报 (自然科学版), 2018,
nition using hidden Markov models[J]. Speech Communi- 38(5): 63–69.
cation, 2003, 41(4): 603–623. Lu Guanming, Yuan Liang, Yang Wenjuan, et al. Speech
[12] Kishore K V K, Satish P K. Emotion recognition emotion recognition based on long short-term memory and
in speech using MFCC and wavelet features[C]. 2013 convolutional neural networks[J]. Journal of Nanjing Uni-
3rd IEEE International Advance Computing Conference, versity of Posts and Telecommunications (Natural Science
2013: 842–847. Edition), 2018, 38(5): 63–69.
[13] 胡峰松, 张璇. 基于梅尔频率倒谱系数与翻转梅尔频率 [23] Hu J, Shen L, Sun G. Squeeze-and-excitation networks[C].
倒谱系数的说话人识别方法 [J]. 计算机应用, 2012, 32(9): Proceedings of the IEEE Conference on Computer Vision
2542–2544. and Pattern Recognition, 2018: 7132–7141.