《应用声学》 (Applied Acoustics), 2025, No. 3, May 2025, p. 594


2.5 Performance comparison between the proposed method and other methods

    Table 5 compares the proposed method with several classic methods, namely classification methods built on typical neural network architectures: VGG [23], CRNN [24], ResNet [16], and Transformer [25]. Compared with these classic models, DADHPN performs strongly in classification accuracy, reaching 89.5%, and applying EIL raises the accuracy further to 90.1%. DADHPN combines an attention mechanism, dual-level parallel classification, and a teacher-student knowledge-transfer strategy: guided by high-level scene feature knowledge, it dynamically adjusts the channel weights of low-level scene features, improving classification performance on low-level scenes. In addition, EIL combines the high-level and low-level scene predictions, strengthening the network's ability to discriminate low-level scenes and further raising the classification accuracy.

        Table 5  Model comparison classification indicators

        Model                        Accuracy/%
        VGG-like CNN [23]               81.6
        CNN-GRU [24]                    88.1
        ResNet [16]                     88.4
        Transformer Encoder [25]        88.9
        DADHPN                          89.5
        DADHPN + EIL                    90.1

3 Conclusion

    This paper proposed an attention-based dual-level parallel network (DADHPN) for ASC that integrates an attention mechanism, dual-level parallel classification, and a teacher-student knowledge-transfer strategy. DADHPN dynamically adjusts the student model's learning focus, using the teacher model's knowledge to attend selectively to the student model's features; this improves the student model's scene classification ability, yielding a classification accuracy of 89.5%. Furthermore, fusing the outputs of the high-level and low-level classification models through EIL compensates for the limitation of relying on the student model's single prediction and strengthens the student model's low-level scene classification. The final accuracy of 90.1% surpasses several typical ASC networks, verifying the effectiveness of the proposed method.

References

 [1] Li W, Li S. Understanding digital audio: a review of general audio/ambient sound based computer audition[J]. Journal of Fudan University (Natural Science), 2019, 58(3): 269–313.
 [2] Chen A W. Research on key technologies of home audio scene recognition[D]. Guangzhou: South China University of Technology, 2020: 2.
 [3] Chu S, Narayanan S, Kuo C C J. Environmental sound recognition with time–frequency audio features[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2009, 17(6): 1142–1158.
 [4] Phaye S S R, Benetos E, Wang Y. SubSpectralNet: using sub-spectrogram based convolutional neural networks for acoustic scene classification[C]// 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019: 825–829.
 [5] McDonnell M D, Gao W. Acoustic scene classification using deep residual networks with late fusion of separated high and low frequency paths[C]// 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020: 141–145.
 [6] Chen H, Liu Z, Liu Z, et al. Integrating the data augmentation scheme with various classifiers for acoustic scene modeling[J]. arXiv preprint, arXiv: 1907.06639, 2019.
 [7] Singh A, Rajan P, Bhavsar A. Deep multi-view features from raw audio for acoustic scene classification[C]// IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE), 2019: 229–233.
 [8] Kim T, Lee J, Nam J. Comparison and analysis of SampleCNN architectures for audio classification[J]. IEEE Journal of Selected Topics in Signal Processing, 2019, 13(2): 285–297.
 [9] Kong Q, Cao Y, Iqbal T, et al. PANNs: large-scale pre-trained audio neural networks for audio pattern recognition[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, 28: 2880–2894.
[10] Nigro M, Rueda A, Krishnan S. Acoustic scene classification using time–frequency energy emphasis and convolutional recurrent neural networks[C]// Artificial Intelligence and Evolutionary Computations in Engineering Systems: Computational Algorithm for AI Technology, Proceedings of ICAIECES 2020. Springer Singapore, 2022: 267–276.
[11] Hou Y, Kang B, Van Hauwermeiren W, et al. Relation-guided acoustic scene classification aided with event embeddings[C]// 2022 International Joint Conference on Neural Networks (IJCNN). IEEE, 2022: 1–8.
[12] Huang J, Lu H, Lopez Meyer P, et al. Acoustic scene classification using deep learning-based ensemble averaging[C]// IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE), 2019: 94–98.
[13] Ren Z, Kong Q, Han J, et al. Attention-based atrous convolutional neural networks: visualisation and understanding perspectives of acoustic scenes[C]// 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).