Journal of Applied Acoustics (《应用声学》), 2025, No. 1, p. 238
Vol. 44, No. 1
Journal of Applied Acoustics, January 2025
⋄ Research Article ⋄
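Class-imbalanced pathological voice detection with data augmentation and complex feature optimization ∗
WU Yaqin 1†, ZHANG Jiaqing 1, ZHANG Tao 2
(1 Software College, Shanxi Agricultural University, Jinzhong 030800)
(2 School of Electrical Automation and Information Engineering, Tianjin University, Tianjin 300072)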
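Abstract: To improve the accuracy of multi-class pathological voice classification, this paper builds a class-imbalanced pathological voice detection system based on data augmentation and complex feature optimization. First, 32 acoustic features are analyzed and grouped into time-domain and frequency-domain features. Second, an improved synthetic minority over-sampling technique is used to augment and balance the dataset. Third, an efficient correlation-based feature selection algorithm is combined with box plots to fuse and optimize the multidimensional acoustic features and to assess the discriminative ability of each feature. Finally, the classification performance of different feature combinations is analyzed and verified in detail with a random forest classifier. The results show that the proposed optimized feature set (To, Fatr, Jita, sAPQ, vAm, NHR), used with the random forest classifier, classifies four pathological voice types (vocal fold nodules, polyps, edema, and paralysis) with an accuracy of 88.6%, a recall of 88.4%, an F1 score of 88.4%, and an AUC of 99.7%.
Keywords: Pathological voice; Data augmentation; Complex features; Efficient correlation-based feature selection; Box plot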
CLC number: TN912.3    Document code: A    Article ID: 1000-310X(2025)01-0234-11
DOI: 10.11684/j.issn.1000-310X.2025.01.025
Class-imbalanced pathological voice detection with data augmentation and
complex feature optimization
WU Yaqin 1, ZHANG Jiaqing 1 and ZHANG Tao 2
(1 Software College, Shanxi Agricultural University, Jinzhong 030800, China)
(2 School of Electrical Automation and Information Engineering, Tianjin University, Tianjin 300072, China)
Abstract: This paper aims to improve the accuracy of pathological voice classification by developing a
class-imbalanced pathological voice detection system based on data augmentation and complex feature
optimization. First, thirty-two speech features are analyzed and grouped into two categories: time-domain
features and frequency-domain features. Second, an improved synthetic minority over-sampling technique
is employed to augment and balance the dataset. Next, the efficient correlation-based feature selection
algorithm and the box plot method are combined to optimize and integrate the multidimensional speech
features, providing a comprehensive evaluation of the discriminative ability of each feature. Finally, the
classification performance of different feature combinations is analyzed and verified in detail using the
Random Forest classifier. Experimental results show that the optimized feature set (To, Fatr, Jita, sAPQ,
vAm, NHR) classifies four voice disorders (vocal nodules, polyps, edema, and paralysis) with an accuracy
of 88.6%, a recall of 88.4%, an F1 score of 88.4%, and an AUC of 99.7%.
Keywords: Pathological voice; Data augmentation; Complex features; Efficient correlation-based feature
selection; Box plot
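
The abstract names an improved synthetic minority over-sampling technique (SMOTE) as the balancing step but does not describe the improvement in this section. The sketch below therefore shows only the basic SMOTE idea, under stated assumptions: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest same-class neighbours. The function name `smote_oversample`, the parameter `k`, the class labels, and the two-dimensional toy data are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def smote_oversample(X, y, k=3, seed=0):
    """Basic SMOTE sketch: grow every class to the size of the largest one.

    This is NOT the paper's improved variant, only the plain interpolation idea.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())            # size of the largest class
    X_out, y_out = list(X), list(y)          # originals are kept unchanged

    for label, n in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        if n == target or len(members) < 2:
            continue                          # already balanced, or no neighbour available
        for _ in range(target - n):
            base = rng.choice(members)
            # Same-class samples ordered by squared Euclidean distance to `base`.
            others = [m for m in members if m is not base]
            others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
            neighbour = rng.choice(others[:k])
            gap = rng.random()                # position on the segment base -> neighbour
            X_out.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: two made-up feature values per recording.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3],   # majority class, 4 samples
     [1.0, 1.0], [1.2, 0.9]]                            # minority class, 2 samples
y = ["nodule"] * 4 + ["edema"] * 2

X_bal, y_bal = smote_oversample(X, y, k=1)
print(Counter(y_bal))
```

After the call, both toy classes contain four samples, and the two synthetic "edema" points lie between the two original ones. In practice, oversampling of this kind is usually applied to the training split only, so that synthetic points do not leak into the evaluation data.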
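Received 2024-08-14; accepted 2024-11-19
∗ Supported by the National Natural Science Foundation of China (6227010455)
About the author: WU Yaqin (1994– ), female, from Taigu, Shanxi; master's degree; research interests: speech signal processing, deep learning.
† Corresponding author. E-mail: wyq0902@sxau.edu.cn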