Applied Acoustics (《应用声学》) Editorial Office

Article Abstract

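LIN Yi, XU Chaolan, LONG Guiling. Music style recognition using self-attention mechanism and data enhancement strategy[J]. Applied Acoustics (应用声学), 2025, 43(3): 615-626.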

Music style recognition using self-attention mechanism and data enhancement strategy

Received: 2024-04-09    Revised: 2025-04-30

Abstract:

Music style recognition is a key branch of music information retrieval. Existing approaches, including convolutional neural networks (CNNs) and Transformer models, often suffer from coarse feature extraction and insufficient information fusion. To address these problems, this study designs a time-domain patch partitioning scheme and a local-global attention mechanism. Time-domain patch partitioning splits the input along the time axis: the complete frequency-domain information at each time point forms one patch, which is then fed to the encoder. The local-global attention mechanism combines the global modeling capability of self-attention with the local feature extraction capability of CNNs, so that global and local information are modeled simultaneously. These methods are better adapted to the characteristics of audio features and significantly improve music style classification performance. The model reaches an accuracy of 94.80% on the GTZAN dataset and 95.14% on the UrbanSound8K dataset, demonstrating good robustness and applicability to a variety of audio classification tasks.
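The abstract describes both components only at a high level. The sketch below illustrates one plausible reading in NumPy, under stated assumptions: time-domain patching is modeled as "one token per spectrogram frame" (the full frequency vector at that time point), and the local-global block is modeled as a single-head self-attention branch (global) added to a depthwise 1-D convolution over neighboring time tokens (local), plus a residual connection. The function names, the single head, the additive fusion, the kernel size and all shapes are hypothetical choices for illustration, not the authors' published architecture.

```python
import numpy as np


def time_domain_patches(spec: np.ndarray) -> np.ndarray:
    """Turn a (freq_bins, time_frames) spectrogram into one token per time frame.

    Each token is the full frequency vector at one time point, so the result
    has shape (time_frames, freq_bins).
    """
    return np.ascontiguousarray(spec.T)


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def global_self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention: every token attends to all tokens."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (T, T) pairwise similarities
    return softmax(scores, axis=-1) @ v


def local_depthwise_conv(x, kernel):
    """Depthwise 1-D convolution along time with zero 'same' padding.

    x: (T, D); kernel: (K, D) with odd K. Each channel is filtered on its own,
    so output token t only depends on tokens t - K//2 .. t + K//2.
    (Implemented as cross-correlation, as in common deep-learning frameworks.)
    """
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad the time axis only
    out = np.zeros_like(x, dtype=float)
    for t in range(x.shape[0]):
        out[t] = (xp[t:t + k] * kernel).sum(axis=0)
    return out


def local_global_block(x, params):
    """Hypothetical fusion: global attention branch + local conv branch + residual."""
    glob = global_self_attention(x, params["wq"], params["wk"], params["wv"])
    local = local_depthwise_conv(x, params["kernel"])
    return x + glob + local


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spec = rng.standard_normal((128, 40))  # made-up size: 128 frequency bins x 40 frames
    tokens = time_domain_patches(spec)     # (40, 128): one token per time frame
    d = 64
    w_embed = rng.standard_normal((128, d)) / np.sqrt(128)  # hypothetical token embedding
    params = {
        "wq": rng.standard_normal((d, d)) / np.sqrt(d),
        "wk": rng.standard_normal((d, d)) / np.sqrt(d),
        "wv": rng.standard_normal((d, d)) / np.sqrt(d),
        "kernel": rng.standard_normal((3, d)) / np.sqrt(3),
    }
    out = local_global_block(tokens @ w_embed, params)
    print(out.shape)  # (40, 64)
```

Summing the two branches is only one possible fusion. Concatenation followed by a projection, or a learned gate, would be equally reasonable readings of "combines", and a real model would also use multiple heads, normalization and a feed-forward sublayer.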

DOI: 10.11684/j.issn.1000-310X.2025.03.010

Keywords: self-attention mechanism; music style recognition; data augmentation; audio feature extraction


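Author	Affiliation	E-mail
LIN Yi (林怡)*	Yichun University, Yichun, Jiangxi	Lysally123@163.com
XU Chaolan (徐超兰)	Yichun University, Yichun, Jiangxi	Lysally123@163.com
LONG Guiling (龙桂铃)	Yichun University, Yichun, Jiangxi	Lysally123@163.com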
