Journal of Applied Acoustics (《应用声学》), 2022, No. 5

Vol. 41, No. 5
             Journal of Applied Acoustics                 September 2022

             ⋄ Research Report ⋄



                     Speech depression recognition based on deep learning∗






                              WU Qing   HU Weiping†   CHEN Dandan   XIAO Ting


                  (College of Electronic Engineering, Guangxi Normal University, Guilin 541000)

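                Abstract: The number of people with depression is rising worldwide, and diagnosis and treatment
                are constrained by a shortage of doctors. To address this problem, a feature fusion model is
                proposed that combines a convolutional neural network with a bidirectional long short-term memory
                network with an attention mechanism. The study covers both feature selection and network
                architecture: several classical speech features are compared, and Mel-frequency cepstral
                coefficients are found to give the best depression classification results. The Mel-frequency
                cepstral coefficients are then fed separately into the convolutional neural network and the
                attention-based bidirectional long short-term memory network to perform depression classification.
                In experiments on the DAIC-WOZ dataset, the proposed method reaches a classification accuracy of
                78.06% and an F1 score of 74.68%.
                Keywords: Depression recognition; Speech analysis; Classification
                CLC number: TN912.3           Document code: A          Article ID: 1000-310X(2022)05-0837-06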
                DOI: 10.11684/j.issn.1000-310X.2022.05.020
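
The abstract reports a classification accuracy and an F1 score. As a reminder of how these two figures relate to a binary confusion matrix, the following minimal sketch computes both from raw counts. The function name `binary_metrics` and the counts are hypothetical and chosen only for illustration; they are not taken from the paper's experiments.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from binary confusion counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives, with "depressed" treated as the positive class
    (an assumption; the abstract does not say which class is positive).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for illustration only; not the paper's data.
m = binary_metrics(tp=30, fp=10, fn=10, tn=50)
print(m)
```

Accuracy counts every correct decision, whereas F1 ignores true negatives, so the two can diverge when the classes are imbalanced, which is why both are reported.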






                             Speech depression recognition based on deep learning



                                   WU Qing HU Weiping CHEN Dandan XIAO Ting

                             (College of Electronic Engineering, Guangxi Normal University, Guilin 541000, China)

                 Abstract: The number of people with depression is increasing around the world, and there is a
                 shortage of doctors to diagnose and treat it. In response to this problem, a feature fusion model
                 is proposed that combines a convolutional neural network (CNN) with a bidirectional long short-term
                 memory (BLSTM) network with an attention mechanism. The research addresses both feature selection
                 and network architecture. A comparison of several classical speech features shows that the
                 Mel-frequency cepstral coefficients (MFCC) give the best depression classification results. The MFCC
                 are then fed separately into the CNN and the attention-based BLSTM network to perform depression
                 classification. Experiments on the DAIC-WOZ dataset show that the proposed method achieves a
                 classification accuracy of 78.06% and an F1 score of 74.68%.
                 Keywords: Depression recognition; Speech analysis; Classification
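
The abstract describes fusing CNN features with the output of an attention-based BLSTM. The sketch below illustrates one common way such a fusion can work, in plain Python. It rests on assumptions the abstract does not confirm: attention is taken to be a softmax over dot-product scores with a scoring vector `w`, and fusion is taken to be simple concatenation. All names (`attention_pool`, `fuse`) and all numbers are hypothetical stand-ins, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, w):
    """Collapse T BLSTM output vectors into one context vector.

    hidden_states: list of T vectors (one per time step).
    w: scoring vector of the same dimension (assumed learned elsewhere).
    Returns (context, alphas), where alphas are the attention weights.
    """
    scores = [sum(h_d * w_d for h_d, w_d in zip(h, w)) for h in hidden_states]
    alphas = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(a * h[d] for a, h in zip(alphas, hidden_states))
               for d in range(dim)]
    return context, alphas


def fuse(cnn_features, blstm_context):
    """Feature fusion by concatenation (an assumption, see lead-in)."""
    return list(cnn_features) + list(blstm_context)


# Toy stand-ins for BLSTM outputs (3 time steps, 2 dims) and CNN features.
hidden = [[0.1, 0.3], [0.5, -0.2], [0.0, 0.9]]
w = [1.0, 0.5]
context, alphas = attention_pool(hidden, w)
fused = fuse([0.7, 0.2, 0.4], context)
print(alphas, fused)
```

In a full model the fused vector would be passed to a classifier layer; here it only shows how a variable-length sequence and a fixed-length CNN feature vector can end up in one representation.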










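             Received: 2021-07-25; accepted: 2021-10-11
             ∗ National Natural Science Foundation of China project (NSFC 61861005)
             About the author: WU Qing (1996– ), female, from Anqing, Anhui; master's student; research interest: speech signal processing.
             † Corresponding author. E-mail: huwp@gxnu.edu.cn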