ZHANG Zhihao, WANG Kunxia. Speech emotion recognition based on STA-CRNN model*[J].,2022,41(5):843-850
Speech emotion recognition based on STA-CRNN model*
Received: 2022-03-15 Revised: 2022-09-02
Abstract:
Speech emotion recognition (SER) plays an important role in human-computer interaction and affective computing, and new methods continue to emerge. Recent studies have applied convolutional neural networks (CNNs) and long short-term memory (LSTM) networks to extract spatial and temporal features from log-Mel spectrograms, with some success. However, both CNNs and LSTMs produce redundant features during extraction, which degrades recognition performance. To address this problem, this paper proposes a convolutional recurrent neural network model based on a spatiotemporal attention mechanism (STA-CRNN). The log-Mel spectrogram and its first-order and second-order differences are used as the input features. Spatial attention and temporal attention mechanisms are added when the CNN extracts spatial features and the LSTM extracts temporal features, so that the networks better capture the spatial and temporal features of the log-Mel spectrogram that effectively represent emotion. On the Emo-DB and IEMOCAP speech datasets, the model achieves a weighted accuracy (WA) of 86.8% and 69.4%, and an unweighted accuracy (UA) of 84.7% and 65.5%, respectively, outperforming most current state-of-the-art methods.
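The abstract reports both weighted accuracy (WA) and unweighted accuracy (UA) without defining them. The sketch below assumes the convention common in SER work: WA is the overall fraction of correctly classified utterances, and UA is the mean of the per-class recalls. The function names and the toy labels are illustrative only and are not taken from the paper.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by all samples.

    Assumed definition of WA (not stated in the abstract); large classes
    dominate this score.
    """
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every emotion class counts equally.

    Assumed definition of UA (not stated in the abstract).
    """
    total = defaultdict(int)  # samples per true class
    hit = defaultdict(int)    # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hit[t] += int(t == p)
    return sum(hit[c] / total[c] for c in total) / len(total)


# Made-up labels for illustration: class 0 dominates the toy test set.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 0, 1, 0, 2, 1]

wa = weighted_accuracy(y_true, y_pred)
ua = unweighted_accuracy(y_true, y_pred)
print(f"WA = {wa:.3f}, UA = {ua:.3f}")
```

Under these definitions, a gap between WA and UA, such as the 69.4% versus 65.5% reported for IEMOCAP, usually indicates that the minority classes are recognized less reliably than the majority classes.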
DOI:10.11684/j.issn.1000-310X.2022.05.021
Keywords: Speech emotion recognition; Log-Mel spectrogram; Spatiotemporal attention; Temporal features; Spatial features
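Funding: National Natural Science Foundation of China (62001004); Academic Funding Project for Top Talents in Disciplines (Majors) of Anhui Provincial Universities (gxbjZD2021067); Anhui Jianzhu University Research Development Fund (JZ202118); Key Natural Science Research Project of Anhui Provincial Universities (KJ2020A0470); Open Project of the Anhui Province Key Laboratory of Building Acoustic Environment, Anhui Jianzhu University (AAE2021ZR02)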