Journal of Applied Acoustics (应用声学), 2019, Issue 2

Vol. 38, No. 2
             Journal of Applied Acoustics                    March 2019


             ⋄ Research Report ⋄



                      A speech emotion recognition method based on variational mode decomposition ∗





                                                 WANG Weiwei 1   ZHANG Xiuzai 1,2†


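                 (1 School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044)
                 (2 Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology, Nanjing 210044)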
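                Abstract    To address the poor performance of traditional speech emotion feature parameters in emotion
                classification, this paper proposes a speech emotion recognition method based on variational mode
                decomposition. Intrinsic mode functions are first extracted from the emotional speech signal by variational
                mode decomposition; the selected dominant intrinsic mode functions are then re-aggregated, and the Mel
                cepstral coefficients and the Hilbert marginal spectrum of each intrinsic mode function are extracted. To
                verify the performance of the proposed features, experiments are carried out on two speech databases
                (EMODB, RAVDESS): features are extracted by the proposed method and an extreme learning machine is
                then used for speech emotion classification. The experimental results show that, compared with speech
                emotion features based on empirical mode decomposition and ensemble empirical mode decomposition, the
                proposed features achieve better recognition performance, which verifies the practicability of the method.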
                Keywords    Variational mode decomposition, Mel cepstral coefficients, Hilbert spectrum, Extreme learning machine
                CLC number: TN912.34           Document code: A          Article ID: 1000-310X(2019)02-0237-08
                DOI: 10.11684/j.issn.1000-310X.2019.02.013
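
The Hilbert marginal spectrum named in the abstract can be illustrated with a minimal sketch. It assumes the intrinsic mode functions are already available as arrays (the decomposition step itself is not shown), and it uses a generic FFT-based analytic signal with simple frequency binning. The function names, the bin count `n_bins`, and the binning scheme are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal of a real sequence, built with the FFT."""
    n = len(x)
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * h)

def hilbert_marginal_spectrum(imfs, fs, n_bins=64):
    """Integrate instantaneous amplitude over time within each frequency bin.

    imfs: iterable of 1-D arrays (assumed to be intrinsic mode functions)
    fs: sampling rate in Hz
    Returns (bin_edges, marginal), with marginal of length n_bins.
    """
    edges = np.linspace(0.0, fs / 2.0, n_bins + 1)
    marginal = np.zeros(n_bins)
    for imf in imfs:
        z = analytic_signal(np.asarray(imf, dtype=float))
        amplitude = np.abs(z)[:-1]
        phase = np.unwrap(np.angle(z))
        # Instantaneous frequency in Hz from the phase derivative.
        inst_freq = np.diff(phase) * fs / (2.0 * np.pi)
        valid = (inst_freq >= 0.0) & (inst_freq < fs / 2.0)
        idx = np.clip(np.digitize(inst_freq[valid], edges) - 1, 0, n_bins - 1)
        # Each sample contributes amplitude * dt, with dt = 1 / fs.
        np.add.at(marginal, idx, amplitude[valid] / fs)
    return edges, marginal

# Synthetic check signal (not speech data): a single 1030 Hz tone.
fs = 8000
t = np.arange(fs) / fs
edges, marginal = hilbert_marginal_spectrum([np.sin(2 * np.pi * 1030 * t)], fs)
```

For a pure tone, the energy concentrates in the single bin that contains the tone frequency. For real intrinsic mode functions, the spectrum spreads according to how the instantaneous frequency varies over time.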

                    Speech emotion recognition based on variational mode decomposition



                                            WANG Weiwei   1  ZHANG Xiuzai   1,2

                             (1 Nanjing University of Information Science and Technology, Nanjing 210044, China)
                   (2 Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology (CICAEET),
                                                    Nanjing 210044, China)

                 Abstract  To address the poor performance of traditional speech emotion feature parameters in
                 emotion classification, this paper proposes a speech emotion recognition method based on variational mode
                 decomposition (VMD). The emotional speech signal is first decomposed by VMD into intrinsic mode
                 functions (IMFs); the selected dominant IMFs are then re-aggregated, and the Mel frequency cepstral
                 coefficients (MFCC) and the Hilbert marginal spectrum of each IMF are extracted. To verify the
                 performance of the proposed features, two speech databases (EMODB, RAVDESS) are selected
                 for the experiments. After features are extracted by the proposed method, an extreme learning
                 machine (ELM) is used for speech emotion classification. The experimental results show that,
                 compared with emotion features based on empirical mode decomposition (EMD) and ensemble empirical
                 mode decomposition (EEMD), the proposed features achieve better recognition performance, which
                 verifies the practicability of the method.
                 Key words Variational mode decomposition, Mel frequency cepstral coefficients, Hilbert marginal spectrum,
                 Extreme learning machine
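
The extreme learning machine used as the classifier can be sketched in its generic textbook form: a single hidden layer with fixed random weights, and output weights obtained by a least-squares solve. The class name, the sigmoid activation, the hidden-layer size, and the one-hot target encoding below are assumptions made for illustration; the configuration used in the paper is not stated in this section.

```python
import numpy as np

class ExtremeLearningMachine:
    """Generic single-hidden-layer ELM classifier (illustrative sketch)."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activations of the fixed random hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # One-hot targets, one column per class.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Input weights and biases are drawn once and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights from the Moore-Penrose pseudo-inverse.
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        H = self._hidden(np.asarray(X, dtype=float))
        return self.classes_[np.argmax(H @ self.beta, axis=1)]
```

In use, each row of `X` would be one utterance's feature vector and `y` the emotion labels, for example `ExtremeLearningMachine(n_hidden=100).fit(X_train, y_train).predict(X_test)`, where `X_train`, `y_train`, and `X_test` are hypothetical arrays. Only the output weights are fitted, so training reduces to one linear solve.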


             Received 2018-07-26; finalized 2018-10-15
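             ∗ Supported by the Natural Science Foundation of Jiangsu Province for Young Scholars (BK20141004), the National Natural Science
             Foundation of China for Young Scholars (11504176, 61601230), and the Priority Academic Program Development of Jiangsu Higher
             Education Institutions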
             About the author: WANG Weiwei (1993- ), male, from Yangzhou, Jiangsu; master's student; research interest: speech emotion analysis.
             † Corresponding author. E-mail: xz_zhang@nuist.edu.cn