Journal of Applied Acoustics (《应用声学》), 2022, Issue 5, page 166


p. 842, September 2022

Both experiments also confirm that adding the attention mechanism improves the network's recognition performance.

3 Conclusion

This study finds that different speech features differ in how well they identify depression. This paper compares several commonly used features and finds that MFCC identifies the presence of depression well and stably. Building on an LSTM model with an attention mechanism, this paper proposes a speech depression recognition model that fuses features from a CNN and an attention-based BLSTM, which yields some improvement in performance.
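
The fusion idea summarized above can be illustrated with a minimal sketch. It is not the model from this paper: the function names `attention_pool` and `fuse`, the toy vectors, and the choice of concatenation as the fusion step are all assumptions made for illustration. The sketch applies softmax attention weights to frame-level vectors (standing in for BLSTM outputs), pools them into one utterance-level vector, and concatenates that with a second vector (standing in for CNN features).

```python
import math


def attention_pool(frames, scores):
    """Softmax the per-frame scores and return (weighted sum of frames, weights)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frames[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


def fuse(cnn_features, blstm_features):
    """Assumed fusion step: plain concatenation of the two feature vectors."""
    return list(cnn_features) + list(blstm_features)


# Hypothetical toy data: three frames of two-dimensional features.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.0, 0.0, 0.0]  # equal scores give uniform attention weights
pooled, weights = attention_pool(frames, scores)
fused = fuse([0.5, 0.5, 0.5], pooled)
```

In a trained network the scores would be produced by learned parameters rather than fixed by hand; the sketch only shows how the weights turn a variable-length sequence into a single vector.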
Speech-based depression recognition remains difficult. Because patient privacy is involved, few depression speech datasets are publicly available, so data augmentation on the existing datasets needs further study. The datasets are also strongly imbalanced: depressed subjects are far fewer than non-depressed subjects, and how to balance the data also needs investigation. In addition, human emotions have fuzzy boundaries, and a single utterance may carry several emotions. For example, depression and sadness share most of their speech features, which causes confusion in recognition. Recognizing complex emotions in long-duration speech is therefore another direction for future work.

Depression detection is a complex research problem, and speech parameters alone are not sufficient to characterize patients with depression. Future work could draw on clinicians' experience and combine image features such as facial expression and gaze, using multimodal methods to improve detection accuracy.
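
One common way to handle the class imbalance described above is to weight each class inversely to its frequency. The sketch below is one such option under stated assumptions, not a method evaluated in this paper; the function name `inverse_frequency_weights` and the 70/30 label split are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {class: weight} with weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical labels: 1 = depressed (minority), 0 = non-depressed (majority).
labels = [0] * 70 + [1] * 30
weights = inverse_frequency_weights(labels)
```

Such weights would typically be passed to a weighted loss so that errors on the minority class cost more. Resampling or data augmentation are alternatives with the same goal.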
