Vol. 44, No. 3                          Journal of Applied Acoustics                          May, 2025

             ⋄ Research Paper ⋄



                       Music style recognition using a self-attention mechanism
                                  and a data augmentation strategy



                              LIN Yi†, XU Chaolan and LONG Guiling

                            (Yichun University, Yichun 336000, China)

                 Abstract: Music style recognition is a key branch of music information retrieval. Existing approaches,
                 including convolutional neural networks (CNNs) and Transformer models, often suffer from coarse feature
                 extraction and insufficient information fusion. To address these problems, this study designs a time-domain
                 patch partition and a local-global attention mechanism. The time-domain patch partition splits the input along
                 the time axis so that the complete frequency-domain information at each time point forms one patch, and the
                 resulting patches are then fed into the encoder. The local-global attention mechanism combines the global
                 modeling capability of self-attention with the local feature extraction capability of CNNs, so that global and
                 local information are modeled simultaneously. These methods are better adapted to the characteristics of audio
                 features and significantly improve music style classification performance. The model achieves an accuracy of
                 94.80% on the GTZAN dataset and 95.14% on the UrbanSound8K dataset, showing good robustness and
                 applicability to a variety of audio classification tasks.
                 Keywords: Self-attention mechanism; Music style recognition; Data augmentation; Audio feature extraction
                 CLC number: TP312           Document code: A          Article ID: 1000-310X(2025)03-0615-12
                 DOI: 10.11684/j.issn.1000-310X.2025.03.010
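As a rough illustration of the time-domain patch partition described in the abstract, the NumPy sketch below treats the input as a (frequency × time) spectrogram and turns each time frame, with all of its frequency bins, into one token. The spectrogram shape, the optional `frames_per_patch` grouping, the randomly initialized linear projection, and the function names `time_domain_patches` and `embed_patches` are assumptions made for illustration; the abstract does not give the paper's actual feature settings or embedding layer.

```python
import numpy as np


def time_domain_patches(spec: np.ndarray, frames_per_patch: int = 1) -> np.ndarray:
    """Split a (n_freq, n_frames) spectrogram along the time axis only.

    Each patch keeps *all* frequency bins of `frames_per_patch` consecutive
    frames (default 1 = one patch per time point). Trailing frames that do
    not fill a whole patch are dropped, an assumption of this sketch.
    """
    n_freq, n_frames = spec.shape
    n_patches = n_frames // frames_per_patch
    trimmed = spec[:, : n_patches * frames_per_patch]
    # (n_freq, n_patches, frames_per_patch) -> (n_patches, frames_per_patch, n_freq)
    patches = trimmed.reshape(n_freq, n_patches, frames_per_patch).transpose(1, 2, 0)
    # Flatten each patch into one token: the frames' spectra, concatenated in time order.
    return patches.reshape(n_patches, frames_per_patch * n_freq)


def embed_patches(tokens: np.ndarray, d_model: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical linear patch embedding (random weights stand in for learned ones)."""
    in_dim = tokens.shape[1]
    w = rng.standard_normal((in_dim, d_model)) / np.sqrt(in_dim)
    b = np.zeros(d_model)
    return tokens @ w + b


rng = np.random.default_rng(0)
spec = rng.standard_normal((128, 431))           # placeholder data: 128 bins x 431 frames
tokens = time_domain_patches(spec)               # one token per time frame
x = embed_patches(tokens, d_model=256, rng=rng)  # encoder input sequence
print(tokens.shape, x.shape)                     # (431, 128) (431, 256)
```

Unlike the square 2-D patches used by ViT-style audio models, this partition never cuts the frequency axis, so every token carries a whole spectrum and the encoder's attention runs purely along time.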

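             Received 2024-04-09; final version 2024-07-25.
             About the author: LIN Yi (1989– ), female, born in Fenyi, Jiangxi; master's degree; lecturer; research interests: musicology and music technology.
             † Corresponding author. E-mail: Lysally123@163.com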
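The local-global attention mechanism described in the abstract pairs the global modeling of self-attention with CNN-style local feature extraction. The NumPy sketch below shows one possible realization under stated assumptions: single-head scaled dot-product attention as the global branch, a depthwise 1-D convolution along the time axis as the local branch, and a plain residual sum as the fusion rule. The abstract does not specify the paper's branch design or how the branches are fused, so these choices and all function and parameter names (`local_global_block`, `kernels`, etc.) are hypothetical.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def global_self_attention(x, wq, wk, wv):
    """Global branch: single-head scaled dot-product self-attention.

    x: (T, d) token sequence; every token attends to all T tokens.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (T, T) attention logits
    return softmax(scores, axis=-1) @ v


def local_depthwise_conv(x, kernels):
    """Local branch: depthwise 1-D convolution along the time axis.

    x: (T, d); kernels: (ksize, d), one kernel per channel, odd ksize.
    Zero 'same' padding keeps the output length T. As in most deep-learning
    libraries, this is computed as cross-correlation (no kernel flip).
    """
    ksize, _ = kernels.shape
    assert ksize % 2 == 1, "odd kernel size keeps the window centred"
    pad = ksize // 2
    n_tokens = x.shape[0]
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros(x.shape, dtype=float)
    for j in range(ksize):
        # Shifted copy of the sequence, scaled channel-wise by kernel tap j.
        out += xp[j : j + n_tokens] * kernels[j]
    return out


def local_global_block(x, params):
    """Hypothetical fusion: residual sum of the global and local branches."""
    g = global_self_attention(x, params["wq"], params["wk"], params["wv"])
    loc = local_depthwise_conv(x, params["kernels"])
    return x + g + loc


rng = np.random.default_rng(0)
n_tokens, d = 431, 64                   # placeholder sizes
x = rng.standard_normal((n_tokens, d))  # random stand-in for embedded patches
params = {
    "wq": rng.standard_normal((d, d)) / np.sqrt(d),
    "wk": rng.standard_normal((d, d)) / np.sqrt(d),
    "wv": rng.standard_normal((d, d)) / np.sqrt(d),
    "kernels": rng.standard_normal((3, d)) / 3,
}
print(local_global_block(x, params).shape)  # (431, 64)
```

Attention gives each output token a receptive field spanning the whole sequence, whereas the depthwise convolution mixes a token only with its `ksize - 1` immediate neighbours; summing the two lets every output carry both long-range and local context.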