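Ji Changpeng (冀常鹏), Tong Tingting (佟婷婷), Dai Wei (代巍). Speech emotion recognition with lightweight networks incorporating attention mechanisms[J]., 2024, 43(4): 892-899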
Speech Emotion Recognition with Lightweight Networks Incorporating Attention Mechanisms
Received: 2023-03-24 Revised: 2024-07-02
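Abstract (translated from the Chinese):
To address the lack of dialect databases and the low accuracy of recognition models in speech emotion recognition, a Liaoxi (Western Liaoning) dialect speech emotion database is built, and a speech emotion recognition model based on a lightweight network incorporating an attention mechanism is proposed. The model consists of four parts: a feature combination network, a CBAM attention mechanism, a deep convolutional network, and an output layer. Three parallel convolutions of different sizes extract shallow speech features, which are then concatenated; a CBAM attention module is introduced to fuse spatial features with channel features; the fused features are fed into the deep convolutional network, which extracts deep speech features and outputs a multi-dimensional feature vector; the output layer performs emotion classification. The model was validated on IEMOCAP, Emo-DB, and the self-built Liaoxi speech emotion database, achieving accuracies of 82.5%, 96.2%, and 90.8%, respectively. The experimental results show that, compared with other deep learning models, the proposed model achieves a higher recognition rate with fewer parameters.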
English abstract:
To address the lack of dialect databases and the low accuracy of recognition models in speech emotion recognition, a speech emotion database of the Liaoxi dialect was established, and a speech emotion recognition model based on a lightweight network incorporating an attention mechanism was proposed. The model consists of four parts: a feature combination network, a CBAM attention mechanism, a deep convolutional network, and an output layer. Three parallel convolutions of different sizes extract shallow speech features, which are then concatenated. The CBAM attention module is introduced to refine the input features. The fused features are fed into the deep convolutional network, which extracts deep speech features and outputs a multi-dimensional feature vector. The output layer classifies the speech by emotion. The model was verified on IEMOCAP, Emo-DB, and the Liaoxi dialect speech emotion database, with accuracy rates of 82.5%, 96.2%, and 90.8%, respectively. Experimental results show that, compared with other deep learning models, the proposed model has fewer parameters and a higher recognition rate.
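The CBAM step named in the abstract can be illustrated with a minimal pure-Python sketch. It follows the commonly described CBAM formulation (channel attention from average- and max-pooled descriptors passed through a shared two-layer MLP, followed by spatial attention from a convolution over the channel-wise average and max maps). It is not the authors' implementation: the function names, the nested-list tensor layout, the reduction ratio, the 3x3 spatial kernel, and the random weights are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def channel_attention(feat, w1, w2):
    """feat: C x H x W nested lists. w1: (C/r) x C and w2: C x (C/r) shared MLP weights."""
    def mlp(v):
        hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w1]  # ReLU
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]  # global average pool
    mx = [max(map(max, ch)) for ch in feat]                            # global max pool
    return [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]


def spatial_attention(feat, kernel):
    """kernel: 2 x k x k weights applied to the [average, max] maps with zero padding."""
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    avg = [[sum(feat[c][i][j] for c in range(C)) / C for j in range(W)] for i in range(H)]
    mx = [[max(feat[c][i][j] for c in range(C)) for j in range(W)] for i in range(H)]
    k = len(kernel[0])
    pad = k // 2
    out = []
    for i in range(H):
        row = []
        for j in range(W):
            s = 0.0
            for m, plane in enumerate((avg, mx)):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < H and 0 <= x < W:
                            s += kernel[m][di][dj] * plane[y][x]
            row.append(sigmoid(s))
        out.append(row)
    return out


def cbam(feat, w1, w2, kernel):
    """Apply channel attention first, then spatial attention, to a C x H x W feature map."""
    mc = channel_attention(feat, w1, w2)
    refined = [[[mc[c] * v for v in row] for row in ch] for c, ch in enumerate(feat)]
    ms = spatial_attention(refined, kernel)
    return [[[ms[i][j] * v for j, v in enumerate(row)] for i, row in enumerate(ch)]
            for ch in refined]


if __name__ == "__main__":
    random.seed(0)
    C, H, W, r, k = 4, 3, 5, 2, 3  # hypothetical sizes, not taken from the paper
    feat = [[[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
    w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // r)]
    w2 = [[random.uniform(-1, 1) for _ in range(C // r)] for _ in range(C)]
    kernel = [[[random.uniform(-1, 1) for _ in range(k)] for _ in range(k)] for _ in range(2)]
    out = cbam(feat, w1, w2, kernel)
    print(len(out), len(out[0]), len(out[0][0]))  # output keeps the input's C x H x W shape
```

Because both attention maps are sigmoid outputs in (0, 1), the module only rescales the input feature map and leaves its shape unchanged, which is why it can be placed between the feature combination network and the deep convolutional network without altering either.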
DOI: 10.11684/j.issn.1000-310X.2024.04.022
Keywords: speech emotion recognition; Liaoxi (Western Liaoning) dialect; deep learning; lightweight
Funding: Liaoning Provincial Department of Science and Technology project (2019-ZD-0038)
|