Article abstract
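Yang Junjie, Ding Jiahui, Yang Liu, Feng Li, Yang Chao. Environmental sound classification method using MGCC feature and multi-scale channel attention based deep neural network collaboration*[J].,2024,43(3):513-524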
Deep learning classification method for environmental sound combining MGCC features and multi-scale channel attention*
Environmental sound classification method using MGCC feature and multi-scale channel attention based deep neural network collaboration
Received: 2023-11-30  Revised: 2024-04-30
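Chinese abstract (translated):
      Environmental sound classification plays a key role in areas such as home security monitoring and human-machine voice interaction. However, the diversity and mixed nature of sound sources pose major challenges to the design of environmental sound classification methods. To improve classification accuracy and save computational resources, this paper proposes a deep learning classification model based on a multi-scale channel attention mechanism. The proposed model consists of four parts: a feature extraction module, a multi-scale convolution module, an efficient channel attention module, and an output layer. First, weighted Mel-Gammatone frequency cepstral coefficients are introduced to capture the spectral magnitude and phase structure information of environmental sounds. Second, multi-scale convolution kernels are combined with an efficient channel attention mechanism to select the key local details and channel features of the audio. Finally, a softmax function in the fully connected layer maps the features and outputs the probability of each environmental sound type. The proposed model was tested on the iFLYTEK dataset (6 environmental sound classes) and the Urbansound8k dataset (10 environmental sound classes), achieving classification accuracies of 94% (iFLYTEK), 76.52% (Urbansound8k), and 79.24% (iFLYTEK+Urbansound8k). Ablation experiments further show that the multi-scale convolution module and the channel attention module contribute approximately 3.77% and 1.89%, respectively, to the improvement in classification accuracy. The experiments also give a detailed comparison with 7 existing deep learning classification methods, among which the proposed algorithm ranks second in classification accuracy; in addition, compared with algorithms of the same class such as ResNet18 and GoogLeNet, the proposed algorithm further reduces the number of model parameters and the computational complexity.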
English abstract:
      Environmental sound classification (ESC) plays an important role in various areas such as home security monitoring and human-machine voice interaction. However, the diversity and complexity of sound sources pose significant challenges to the design of ESC methods. To enhance classification accuracy and conserve computational resources, this paper proposes a deep classification approach based on convolutional neural networks (CNN) combined with a multi-scale channel attention mechanism. The model comprises four parts: a feature extraction module, a multi-scale convolution network module, an efficient channel attention module, and an output layer for final classification. First, it incorporates a weighted mel-generalized cepstral coefficients (MGCC) feature, designed to extract both the spectral magnitude and phase structure information of environmental sound. Second, the model combines multi-scale kernel convolution with an efficient channel attention mechanism to extract and selectively focus on key local structures and channels of environmental sounds. Finally, the softmax function is used in the fully connected layer to map features and output the probability of each environmental sound type. Experimental results on the public iFLYTEK and Urbansound8k datasets show that the proposed model achieves ESC accuracies of 94% (iFLYTEK), 76.52% (Urbansound8k), and 79.24% (iFLYTEK+Urbansound8k). Further ablation experiments indicate that the multi-scale convolution module and the channel attention module contribute improvements in classification accuracy of approximately 3.77% and 1.89%, respectively. The experiments also compare the proposed algorithm with seven existing deep learning classification methods, among which it ranks second in classification accuracy. Additionally, compared with methods of a similar class such as ResNet18 and GoogLeNet, the proposed algorithm further reduces the number of model parameters and the computational complexity.
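The pipeline described in the abstract (multi-scale convolution, efficient channel attention, softmax output) can be illustrated with a rough NumPy sketch. This is not the authors' implementation. The kernel sizes (3, 5, 7), the 13 x 40 feature matrix, and all weights are hypothetical placeholders. The attention step assumes the commonly described efficient-channel-attention design (global average pooling, a small 1-D convolution across channels, a sigmoid gate), which the abstract does not spell out. MGCC extraction is not reproduced; a random matrix stands in for the features.

```python
import numpy as np

def conv1d_same(x, kernel):
    """Cross-correlate a 1-D signal with an odd-length kernel, zero-padded to keep its length."""
    pad = len(kernel) // 2
    xp = np.pad(x, pad)
    return np.array([np.dot(xp[i:i + len(kernel)], kernel) for i in range(len(x))])

def multi_scale_conv(feat, kernels):
    """Filter every feature row along time with each kernel; one output channel per kernel."""
    # feat: (n_coeff, n_frames) -> result: (n_kernels, n_coeff, n_frames)
    return np.stack([
        np.stack([conv1d_same(row, k) for row in feat]) for k in kernels
    ])

def eca(x, k=3):
    """Channel gating: global average pool, 1-D conv across channels, sigmoid, rescale."""
    pooled = x.mean(axis=(1, 2))            # one descriptor per channel
    weights = np.full(k, 1.0 / k)           # placeholder for a learned 1-D kernel (assumption)
    gate = 1.0 / (1.0 + np.exp(-conv1d_same(pooled, weights)))
    return x * gate[:, None, None]

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
feat = rng.standard_normal((13, 40))                   # stand-in for an MGCC feature matrix
kernels = [rng.standard_normal(s) for s in (3, 5, 7)]  # hypothetical multi-scale kernels
maps = multi_scale_conv(feat, kernels)                 # shape (3, 13, 40)
attended = eca(maps)                                   # same shape, channels rescaled
descriptor = attended.mean(axis=(1, 2))                # pooled vector fed to the output layer
W = rng.standard_normal((6, 3))                        # untrained weights for 6 classes
probs = softmax(W @ descriptor)                        # class probabilities summing to 1
```

Because the gate lies strictly between 0 and 1, the attention step only rescales whole channels; it never changes which time-frequency positions are active inside a channel.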
DOI:10.11684/j.issn.1000-310X.2024.03.006
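Chinese keywords (translated): environmental sound classification  Mel-Gammatone frequency cepstrum  multi-scale kernel convolution  efficient channel attention  convolutional neural network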
English keywords: Environmental sound classification  mel-generalized cepstral coefficients  multi-scale kernel convolution  
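Funding: National Natural Science Foundation of China, Young Scientists Fund (62003101); Natural Science Foundation of Guangdong Province, General Program (2022A1515010181, 2023A1515011290)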
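Author  Affiliation  E-mail
Yang Junjie (杨俊杰)  Guangdong University of Technology  junjieyang@gdut.edu.cn
Ding Jiahui (丁家辉)  Guangdong University of Technology  547677887@qq.com
Yang Liu (杨柳)  Guangzhou University  willow_gao@126.com
Feng Li (冯丽)  Macau University of Science and Technology  lfeng@must.edu.mo
Yang Chao* (杨超*)  Guangdong University of Technology  yangchaoscut@aliyun.com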