Page 81 - Journal of Applied Acoustics, 2025, No. 3

Vol. 44, No. 3
Journal of Applied Acoustics, May 2025

⋄ Research Article ⋄

Music style recognition method using a self-attention mechanism and data augmentation strategies

LIN Yi†, XU Chaolan, LONG Guiling

(Yichun University, Yichun 336000, China)

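Abstract: Music style recognition is a key branch of music information retrieval. Existing techniques,
including convolutional neural networks and Transformer models, often suffer from coarse feature
extraction and insufficient information fusion. To address these problems, this study designs a
time-domain patch partition and a local-global attention mechanism. The time-domain patch partition
splits the input along the time axis, so that the frequency-domain information of each whole time
point forms one patch, which is then fed to the encoder. The local-global attention mechanism combines
the global modeling capability of self-attention with the local feature extraction capability of
convolutional neural networks, so that global and local information are modeled simultaneously. These
methods are better suited to audio features and significantly improve music style classification
performance. The model reaches an accuracy of 94.80% on the GTZAN dataset and 95.14% on the
UrbanSound8K dataset, showing good robustness and applicability to a variety of audio classification
tasks.
Keywords: self-attention mechanism; music style recognition; data augmentation; audio feature extraction
CLC number: TP312 Document code: A Article ID: 1000-310X(2025)03-0615-12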
DOI: 10.11684/j.issn.1000-310X.2025.03.010

Music style recognition using self-attention mechanism and data
enhancement strategy

LIN Yi, XU Chaolan and LONG Guiling

(Yichun University, Yichun 336000, China)

Abstract: Music style recognition is a key branch in the field of music information retrieval. Existing
technologies, including convolutional neural networks and Transformer models, often face problems such as
imprecise feature extraction and insufficient information fusion. To solve these problems, a time-domain
patch partition and a local-global attention mechanism are designed in this study. Along the time-domain
direction, the frequency-domain information of each entire time point is grouped into one patch and then
input into the encoder. The local-global attention mechanism combines the global modeling capability of
self-attention with the local feature extraction capability of convolutional neural networks to model both
global and local information at the same time. These methods adapt better to audio characteristics and
significantly improve the performance of music style classification. The accuracy of the model is 94.80% on
the GTZAN dataset and 95.14% on the UrbanSound8K dataset, which indicates good robustness and suitability
for a variety of audio classification tasks.
Keywords: Self-attention mechanism; Music style recognition; Data enhancement; Audio feature extraction
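
The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fixed smoothing kernel, the unparameterised attention (queries, keys and values all equal to the input tokens) and the choice to sum the two branches are assumptions made purely for illustration, since the abstract does not specify how the branches are built or fused.

```python
import math

def time_domain_patches(spec):
    """Turn a spectrogram (n_freq rows x n_time columns) into n_time patches.

    Each patch holds the full frequency column of one time point.
    """
    n_time = len(spec[0])
    if any(len(row) != n_time for row in spec):
        raise ValueError("all frequency rows must have the same number of frames")
    return [[row[t] for row in spec] for t in range(n_time)]

def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def global_attention(tokens):
    """Scaled dot-product self-attention without learned weights (assumption)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        out.append([sum(w[j] * tokens[j][i] for j in range(len(tokens)))
                    for i in range(d)])
    return out

def local_conv(tokens, kernel=(0.25, 0.5, 0.25)):
    """1-D convolution along time with zero padding.

    The fixed symmetric kernel stands in for learned convolution weights.
    """
    n, d = len(tokens), len(tokens[0])
    half = len(kernel) // 2
    out = []
    for t in range(n):
        acc = [0.0] * d
        for k, w in enumerate(kernel):
            j = t + k - half
            if 0 <= j < n:
                for i in range(d):
                    acc[i] += w * tokens[j][i]
        out.append(acc)
    return out

def local_global(tokens):
    """Fuse the global and local branches by element-wise sum (assumption)."""
    glob = global_attention(tokens)
    loc = local_conv(tokens)
    return [[a + b for a, b in zip(g_row, l_row)] for g_row, l_row in zip(glob, loc)]

# Toy spectrogram: 3 frequency bins x 4 time frames.
spec = [[1.0, 2.0, 3.0, 4.0],
        [0.0, 1.0, 0.0, 1.0],
        [0.5, 0.5, 0.5, 0.5]]
patches = time_domain_patches(spec)   # 4 patches, each of length 3
mixed = local_global(patches)         # same shape as patches
```

In this sketch every token sees all other time points through the attention branch, while the convolution branch only mixes immediate neighbours, which is the sense in which global and local information are modeled together.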

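Received: 2024-04-09; finalized: 2024-07-25
About the author: LIN Yi (1989– ), female, born in Fenyi, Jiangxi; master's degree; lecturer; research interests: musicology and music technology.
† Corresponding author. E-mail: Lysally123@163.com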
