《应用声学》 (Applied Acoustics), 2025, No. 3, May 2025, p. 594


2.5 Performance comparison between the proposed method and other methods

    Table 5 compares the proposed method with several classic methods, namely classification methods built on typical neural network architectures: VGG [23], CRNN [24], ResNet [16], and Transformer [25]. Compared with these classic models, DADHPN performs strongly in classification accuracy, reaching 89.5%, and applying EIL raises the accuracy further to 90.1%. DADHPN combines an attention mechanism, dual-level parallel classification, and a teacher-student knowledge-transfer strategy: guided by high-level scene feature knowledge, it dynamically adjusts the channel weights of low-level scene features, improving classification performance on low-level scenes. In addition, EIL combines the high-level and low-level scene predictions, strengthening the network's ability to discriminate low-level scenes and further raising the classification accuracy.

        Table 5  Model comparison classification indicators

        Model                        Accuracy/%
        VGG-like CNN [23]               81.6
        CNN-GRU [24]                    88.1
        ResNet [16]                     88.4
        Transformer Encoder [25]        88.9
        DADHPN                          89.5
        DADHPN + EIL                    90.1

3 Conclusion

    This paper proposed an attention-based dual-level parallel network (DADHPN) for ASC that integrates an attention mechanism, dual-level parallel classification, and a teacher-student knowledge-transfer strategy. DADHPN dynamically adjusts the student model's learning focus, using the teacher model's knowledge to attend selectively to the student model's features; this improves the student model's scene classification ability, yielding a classification accuracy of 89.5%. Furthermore, fusing the outputs of the high-level and low-level classification models through EIL compensates for the limitation of relying on the student model's single prediction and strengthens the student model's low-level scene classification. The final accuracy of 90.1% surpasses several typical ASC networks, verifying the effectiveness of the proposed method.

References

 [1] Li W, Li S. Understanding digital audio: a review of general audio/ambient sound based computer audition[J]. Journal of Fudan University (Natural Science), 2019, 58(3): 269–313.
 [2] Chen A W. Research on key technologies of home audio scene recognition[D]. Guangzhou: South China University of Technology, 2020: 2.
 [3] Chu S, Narayanan S, Kuo C C J. Environmental sound recognition with time–frequency audio features[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2009, 17(6): 1142–1158.
 [4] Phaye S S R, Benetos E, Wang Y. SubSpectralNet: using sub-spectrogram based convolutional neural networks for acoustic scene classification[C]// 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019: 825–829.
 [5] McDonnell M D, Gao W. Acoustic scene classification using deep residual networks with late fusion of separated high and low frequency paths[C]// 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020: 141–145.
 [6] Chen H, Liu Z, Liu Z, et al. Integrating the data augmentation scheme with various classifiers for acoustic scene modeling[J]. arXiv preprint, arXiv: 1907.06639, 2019.
 [7] Singh A, Rajan P, Bhavsar A. Deep multi-view features from raw audio for acoustic scene classification[C]// IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE), 2019: 229–233.
 [8] Kim T, Lee J, Nam J. Comparison and analysis of SampleCNN architectures for audio classification[J]. IEEE Journal of Selected Topics in Signal Processing, 2019, 13(2): 285–297.
 [9] Kong Q, Cao Y, Iqbal T, et al. PANNs: large-scale pre-trained audio neural networks for audio pattern recognition[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, 28: 2880–2894.
[10] Nigro M, Rueda A, Krishnan S. Acoustic scene classification using time–frequency energy emphasis and convolutional recurrent neural networks[C]// Artificial Intelligence and Evolutionary Computations in Engineering Systems: Computational Algorithm for AI Technology, Proceedings of ICAIECES 2020. Springer Singapore, 2022: 267–276.
[11] Hou Y, Kang B, Van Hauwermeiren W, et al. Relation-guided acoustic scene classification aided with event embeddings[C]// 2022 International Joint Conference on Neural Networks (IJCNN). IEEE, 2022: 1–8.
[12] Huang J, Lu H, Lopez Meyer P, et al. Acoustic scene classification using deep learning-based ensemble averaging[C]// IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE), 2019: 94–98.
[13] Ren Z, Kong Q, Han J, et al. Attention-based atrous convolutional neural networks: visualisation and understanding perspectives of acoustic scenes[C]// 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).