Journal of Applied Acoustics (《应用声学》), 2020, No. 2

Vol. 39, No. 2                    Journal of Applied Acoustics                    March 2020

             ⋄ Research Report ⋄



Research on feature selection method in speech emotion recognition ∗

CHU Yu (褚钰) 1†   LI Tiangang (李田港) 1   YE Shuo (叶硕) 1   YE Guangming (叶光明) 2

(1  Wuhan Research Institute of Posts and Telecommunications, Wuhan 430000, China)
(2  Wuhan Fiberhome Wisdom Digital Technology Co. Ltd., Wuhan 430000, China)

Abstract: Speech emotion recognition has important research value in many fields. Different acoustic emotion features yield markedly different recognition results when they are classified with different classifiers. Acoustic features related to speech emotion include spectral features, prosodic features, and voice-quality features. This paper proposes a feature fusion method that combines the features with the best recognition ability from these three types: all of the spectral features, which were stable in the experiments and achieved high recognition rates, are retained, and relevant statistics of the prosodic and voice-quality features are extracted as auxiliary features and fused into the spectral features. Experiments show that, with the same classifier, the proposed fused feature achieves a higher recognition rate than single features; with different classifiers, the fused feature still recognizes emotions well and performs stably. It achieves good recognition rates on all three datasets, which basically realizes cross-dataset recognition.

Keywords: Speech recognition; Emotion recognition; Feature selection; Feature fusion

CLC number: TP183      Document code: A      Article ID: 1000-310X(2020)02-0216-07
DOI: 10.11684/j.issn.1000-310X.2020.02.007
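
The fusion idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say which statistics are extracted or how the features are combined, so everything specific here is an assumption made for illustration: the statistics (mean, standard deviation, minimum, maximum), the example contours (pitch, energy, jitter), simple concatenation as the fusion step, and the hypothetical helper names `contour_stats` and `fuse_features`. All numeric values are made up.

```python
from statistics import mean, pstdev


def contour_stats(contour):
    """Summarize a per-frame contour by four simple statistics (assumed choice)."""
    return [mean(contour), pstdev(contour), min(contour), max(contour)]


def fuse_features(spectral, prosodic_contours, quality_contours):
    """Keep the whole spectral vector and append statistics of the auxiliary contours."""
    fused = list(spectral)  # every spectral feature is retained unchanged
    for contour in prosodic_contours + quality_contours:
        fused.extend(contour_stats(contour))  # auxiliary features enter only as statistics
    return fused


# Toy utterance with made-up values.
spectral = [12.1, -3.4, 0.8]       # stands in for a spectral feature vector
pitch = [180.0, 200.0, 220.0]      # prosodic contour (assumed example)
energy = [0.2, 0.4, 0.6]           # prosodic contour (assumed example)
jitter = [0.01, 0.03]              # voice-quality contour (assumed example)

fused = fuse_features(spectral, [pitch, energy], [jitter])
print(len(fused))  # 3 spectral values + 3 contours * 4 statistics
```

In this sketch the fused vector would be passed to whichever classifier is being evaluated, so the same input can be compared across classifiers as the abstract describes.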





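             Received 2019-05-06; finalized 2019-09-25
             ∗ Supported by the Hubei Provincial Department of Science and Technology, 2018 Hubei Province Major Technology Innovation Special Project (2018AAA063)
             About the author: CHU Yu (褚钰, born 1995), male, from Zhangjiakou, Hebei; master's student; research interests: machine learning and speech recognition.
             † Corresponding author. E-mail: 18811309895@163.com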