Article abstract
Fang Congcong, Jin Yun, Zhao Li, Ma Yong, Li Shidang, Gu Yu. Multimodal speech emotion recognition based on text feature energy encoding*[J]., 2024, 43(5): 997-1007
Multimodal speech emotion recognition based on text feature energy encoding*
Received: 2023-05-26  Revised: 2024-09-04
Abstract:
      Energy is one of the important features of emotional expression: when a person speaks, each word carries its own energy value, reflecting the speaker's emotional state. When speech is transcribed into text, however, the energy carried by each word is not retained, so energy information is lost when text features are extracted. For the text modality, this paper therefore proposes and designs an energy encoding that adds the energy value of each word and each pause in the speech signal to the transcribed text, so that the text features contain energy information; utterance-level text features are then obtained with the DC-BERT model. For the speech modality, shallow acoustic features are extracted with the OpenSMILE toolbox, and a Random Forest (RF) algorithm selects the 1000 feature dimensions ranked highest in emotional feature importance as a new feature set. A Transformer Encoder network extracts deep features from this new feature set, and the shallow and deep features are fused to form multi-level speech emotion features. Finally, a bidirectional long short-term memory network with a self-attention mechanism (BiLSTM-ATT) performs emotion classification. The results show that the proposed method reaches a weighted accuracy of 76.49% on four-class emotion classification on IEMOCAP.
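The abstract does not give the exact form of the energy encoding, so the following is only a minimal sketch of the general idea: attach an energy token to every word and every pause of a transcript. Everything specific here is an assumption made for illustration and not taken from the paper: the word timings are assumed to come from some forced aligner, energy is measured as RMS level in dB, the level is quantized into `n_levels` bins, and the `<PAUSE>` and `[E k]`-style token names are hypothetical.

```python
import math


def rms_energy_db(samples):
    """RMS energy of a list of samples in dB; returns -100.0 for empty or silent input."""
    if not samples:
        return -100.0
    mean_sq = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_sq) if mean_sq > 1e-10 else -100.0


def energy_encode(samples, sample_rate, words, n_levels=5, floor_db=-60.0):
    """Interleave each word and each pause with a quantized energy token.

    words: list of (text, start_sec, end_sec), assumed to come from a forced aligner.
    The token format is hypothetical, not the paper's actual encoding.
    """
    def level(db):
        # Clamp to [floor_db, 0] dB, then map linearly onto levels 0..n_levels-1.
        frac = (min(max(db, floor_db), 0.0) - floor_db) / -floor_db
        return min(int(frac * n_levels), n_levels - 1)

    def segment(start, end):
        # Slice the waveform between two time stamps given in seconds.
        return samples[int(round(start * sample_rate)):int(round(end * sample_rate))]

    tokens = []
    cursor = 0.0
    for text, start, end in words:
        if start > cursor:
            # The gap between the previous word and this one is treated as a pause.
            tokens += ["<PAUSE>", f"[E{level(rms_energy_db(segment(cursor, start)))}]"]
        tokens += [text, f"[E{level(rms_energy_db(segment(start, end)))}]"]
        cursor = end
    return " ".join(tokens)


if __name__ == "__main__":
    # Toy signal at 100 Hz: 0.5 s of silence, a loud word, then a quiet word.
    samples = [0.0] * 50 + [1.0] * 50 + [0.01] * 50
    words = [("hello", 0.5, 1.0), ("there", 1.0, 1.5)]
    print(energy_encode(samples, 100, words))
```

On the toy signal this prints `<PAUSE> [E0] hello [E4] there [E1]`. A string of this kind could then be passed to a text encoder in place of the plain transcript, which is the role the abstract assigns to DC-BERT.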
DOI:10.11684/j.issn.1000-310X.2024.05.009
Keywords: Multimodal emotion recognition  Energy encoding  Random forest  Feature fusion  Attention mechanism
Funding: Natural Science Foundation of Jiangsu Higher Education Institutions
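Author  Affiliation  E-mail
Fang Congcong  Jiangsu Normal University  1466605626@qq.com
Jin Yun*  Jiangsu Normal University  jiny@jsnu.edu.cn
Zhao Li  Southeast University  zhaoli@seu.edu.cn
Ma Yong  Jiangsu Normal University  may@jsnu.edu.cn
Li Shidang  Jiangsu Normal University  shidangli@jsnu.edu.cn
Gu Yu  Jiangsu Normal University  guyuluck666@163.com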