Page 238 - Journal of Applied Acoustics (《应用声学》), 2025, Issue 1

Journal of Applied Acoustics, Vol. 44, No. 1, January 2025

⋄ Research Article ⋄



Class-imbalanced pathological voice detection with data augmentation and complex feature optimization ∗

WU Yaqin (武雅琴) 1†, ZHANG Jiaqing (张佳庆) 1 and ZHANG Tao (张涛) 2

(1 Software College, Shanxi Agricultural University, Jinzhong 030800, China)
(2 School of Electrical Automation and Information Engineering, Tianjin University, Tianjin 300072, China)

Abstract: This paper aims to improve the accuracy of multi-class pathological voice classification by developing a class-imbalanced pathological voice detection system based on data augmentation and complex feature optimization. First, 32 acoustic features are analyzed and grouped into two categories: time-domain features and frequency-domain features. Second, an improved synthetic minority over-sampling technique (SMOTE) is used to augment and balance the dataset. Third, the efficient correlation-based feature selection algorithm and box plots are combined to fuse and optimize the multidimensional acoustic features, providing a comprehensive evaluation of the discriminative ability of each feature. Finally, the classification performance of different feature combinations is analyzed and verified in detail using a random forest classifier. The results show that, with the random forest classifier, the proposed optimized feature set (To, Fatr, Jita, sAPQ, vAm, NHR) performs well on four voice disorders (vocal nodules, polyps, edema, and paralysis), achieving a classification accuracy of 88.6%, a recall of 88.4%, an F1 score of 88.4%, and an AUC of 99.7%.
Keywords: Pathological voice; Data augmentation; Complex features; Efficient correlation-based feature selection; Box plot
CLC number: TN912.3    Document code: A    Article ID: 1000-310X(2025)01-0234-11
DOI: 10.11684/j.issn.1000-310X.2025.01.025
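The data-balancing step mentioned in the abstract can be illustrated with a minimal sketch. The paper uses an improved synthetic minority over-sampling technique whose details are not given on this page, so the code below shows only plain SMOTE-style interpolation between nearest neighbours within a class. The function names `smote_oversample` and `balance_classes`, the parameter `k`, and the toy two-dimensional feature values are assumptions made for illustration, not the authors' implementation or data.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples by interpolating between minority samples.

    Plain SMOTE-style sketch (not the paper's improved variant): pick a sample,
    pick one of its k nearest same-class neighbours, and draw a point on the
    line segment joining the two.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other samples of the class by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance_classes(samples_by_class, k=3, seed=0):
    """Oversample every class up to the size of the largest class."""
    target = max(len(rows) for rows in samples_by_class.values())
    balanced = {}
    for label, rows in samples_by_class.items():
        extra = []
        if len(rows) < target:
            extra = smote_oversample(rows, target - len(rows), k=k, seed=seed)
        balanced[label] = [list(r) for r in rows] + extra
    return balanced


if __name__ == "__main__":
    # Hypothetical toy feature vectors; real inputs would be acoustic features.
    toy = {
        "nodules": [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
        "polyps": [[5.0, 5.0], [6.0, 5.0]],
    }
    for label, rows in balance_classes(toy, k=1, seed=1).items():
        print(label, len(rows))
```

Each synthetic point lies on a segment between two real samples of the same class, so the minority class grows without duplicating rows verbatim. In a study like this one, such balancing would normally be applied to the training split only, so that synthetic samples do not leak into the evaluation data.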
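Received 2024-08-14; finalized 2024-11-19
∗ National Natural Science Foundation of China project (6227010455)
About the author: WU Yaqin (武雅琴) (1994– ), female, from Taigu, Shanxi; master's degree; research interests: speech signal processing and deep learning.
† Corresponding author. E-mail: wyq0902@sxau.edu.cn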