Article abstract
Liang Teng, Jiang Wenzong, Wang Li, Liu Baodi, Wang Yanjiang. Automatic classification of acoustic scene based on neural network*[J]., 2022, 41(3): 373-380
Automatic classification method for acoustic scenes based on neural networks*
Automatic classification of acoustic scene based on neural network
Submitted: 2021-04-12  Revised: 2022-05-05
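Chinese abstract (translated):
      Sound is one of the means by which humans perceive the external world and is closely tied to daily work and life. Acoustic scene detection and automatic classification help people choose the right strategy for a specific environment and therefore have significant research value. With the development of convolutional neural networks (CNNs), many CNN-based acoustic scene classification methods have appeared. Among them, the temporal-spectral convolutional neural network (TS-CNN), which uses a temporal-spectral attention module, is one of the best-performing networks for acoustic scene classification at present; however, its complex structure and low computational efficiency make it difficult to reach the best classification performance. To address this, this paper builds on TS-CNN and proposes a convolutional neural network model based on collaborative learning (TSCNN-CL) to improve the computational efficiency of the acoustic scene classification model. The model trains multiple classifier heads simultaneously on a single network without incurring additional cost at test time. In addition, TSCNN-CL reduces the occurrence of exploding and vanishing gradients. Comprehensive experiments on the ESC-10, ESC-50, and UrbanSound8k datasets show that the model classifies better than TS-CNN and most current mainstream methods.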
English abstract:
      Sound is one of the elements of human perception of the external world and is closely related to daily work and life. Acoustic scene detection and automatic classification help people formulate correct strategies for specific environments and therefore have important research value. With the development of convolutional neural networks (CNNs), many CNN-based acoustic scene classification methods have emerged. Among them, the Temporal-Spectral CNN (TS-CNN), which adopts a temporal-spectral attention module, is currently one of the best methods for acoustic scene classification; however, its complex structure and low computational efficiency make it difficult to achieve the best classification performance. To this end, this paper proposes a convolutional neural network model based on collaborative learning (TSCNN-CL) to improve the computational efficiency of the acoustic scene classification model. TSCNN-CL trains multiple classifier heads simultaneously on one network without additional cost at test time. Furthermore, TSCNN-CL reduces the occurrence of exploding and vanishing gradients. Comprehensive experiments on the ESC-10, ESC-50, and UrbanSound8k datasets show that TSCNN-CL outperforms the TS-CNN model and compares favorably with some other state-of-the-art models.
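The abstract does not state the training objective, so the following is only a minimal sketch of how several classifier heads on one network might be trained together. It assumes a commonly used form of collaborative learning in which each head is penalized both against the true label and against the averaged predictions of the other heads. The names `collaborative_loss`, `predict`, and the weight `beta` are hypothetical and not taken from the paper, and at least two heads are assumed.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one head's logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, probs, eps=1e-12):
    # H(target, probs) = -sum_k target_k * log(probs_k)
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))

def collaborative_loss(head_logits, label, beta=0.5):
    # Assumed objective (not from the paper): for each head, mix the
    # cross-entropy against the one-hot label with the cross-entropy
    # against the mean prediction of the other heads, then average.
    n_classes = len(head_logits[0])
    one_hot = [1.0 if k == label else 0.0 for k in range(n_classes)]
    probs = [softmax(z) for z in head_logits]
    total = 0.0
    for i, p in enumerate(probs):
        others = [q for j, q in enumerate(probs) if j != i]
        consensus = [sum(col) / len(others) for col in zip(*others)]
        total += (1 - beta) * cross_entropy(one_hot, p)
        total += beta * cross_entropy(consensus, p)
    return total / len(probs)

def predict(head_logits):
    # At test time only one head is evaluated, so the extra heads
    # add no inference cost.
    p = softmax(head_logits[0])
    return max(range(len(p)), key=p.__getitem__)

# Hypothetical logits from two heads for a 3-class example.
heads = [[2.0, 0.5, -1.0], [1.5, 0.2, -0.3]]
loss = collaborative_loss(heads, label=0)
pred = predict(heads)
```

In this sketch the extra heads act only as a training-time regularizer: heads that agree with each other and with the label yield a lower loss than heads that disagree, while prediction reads a single head.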
DOI:10.11684/j.issn.1000-310X.2022.03.006
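Chinese keywords (translated): environmental sound classification, temporal-spectral convolutional neural network, collaborative learning, sound signal processing
English keywords: environmental sound classification, temporal-spectral convolutional neural network, collaborative learning, sound signal processing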
Funding: National Natural Science Foundation of China (General Program, Key Program, Major Program)
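Author  Affiliation  E-mail
Liang Teng (梁腾)  College of Oceanography and Space Informatics, China University of Petroleum (East China)  liangteng1996@foxmail.com
Jiang Wenzong (姜文宗)  China University of Petroleum (East China)  jiangwenzong1@gmail.com
Wang Li (王立)  China University of Petroleum (East China)  li.wang.upc@foxmail.com
Liu Baodi (刘宝弟)  China University of Petroleum (East China)  thu.liubaodi@gmail.com
Wang Yanjiang (王延江)*  China University of Petroleum (East China)  yjwang@upc.edu.cn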