Editorial Office, Journal of Applied Acoustics (《应用声学》)

Article Abstract

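Wang Jihao, Zhou Xiaoyan, Han Zhichao, Wang Lili. Small-sample optimized bird sound recognition network based on a bridging Transformer[J]. Journal of Applied Acoustics, 2024, 43(3): 542-551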

Small-sample optimized bird sound recognition network based on a bridging Transformer

Original title: 基于桥接Transformer的小样本优化鸟声识别网络

Received: 2023-01-05 Revised: 2024-04-27

Abstract:

In real-world bird monitoring, the collected bird sound recordings are unevenly distributed across classes, which leaves neural networks insufficiently trained and lowers classification accuracy at test time. To address this problem, a bridging Transformer neural network model is designed. The network first generates short-time Fourier transform (STFT) spectrograms from the raw bird sound audio signals and uses them as input features. The spectrograms are then fed into a Transformer network in which attention modules and convolution modules are bridged, so that global and local features of the spectrogram can exchange information. Finally, a single-layer Transformer encoder is used to optimize the loss over each batch of samples and to produce the final classification result. Small-sample experiments on the Birdsdata and xeno-canto bird sound datasets achieved average accuracies of 91.34% and 82.63%, respectively. Comparative experiments against other bird sound recognition networks verified the effectiveness of the proposed network.
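
The first step named in the abstract, turning a raw audio signal into an STFT spectrogram, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name stft_spectrogram, the Hann window, the frame length of 512, the hop of 128, the log scaling, and the synthetic 2 kHz test tone are all assumptions chosen for illustration, since the abstract does not state these settings.

```python
import numpy as np

def stft_spectrogram(signal, frame_len=512, hop=128):
    """Return a log-magnitude STFT spectrogram of shape (freq_bins, frames).

    frame_len and hop are illustrative values, not taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One-sided FFT of each frame, then compress magnitudes with log(1 + x).
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude).T

# Synthetic 1-second, 16 kHz recording of a 2 kHz tone (a stand-in for a bird call).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 2000 * t)

spec = stft_spectrogram(tone)
# Frequency bin with the highest average energy, converted back to Hz.
peak_bin = int(spec.mean(axis=1).argmax())
peak_hz = peak_bin * sample_rate / 512
```

The resulting two-dimensional array (frequency by time) is the kind of image-like input that attention and convolution modules can process. A real pipeline would load recorded audio rather than a synthetic tone.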

DOI: 10.11684/j.issn.1000-310X.2024.03.009

Keywords: bird sound recognition; attention mechanism; convolution module; Transformer network

Funding:

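Author	Affiliation	E-mail
Wang Jihao (王基豪)	Nanjing University of Information Science and Technology	1090947435@qq.com
Zhou Xiaoyan (周晓彦)^*	School of Electronic and Information Engineering, Nanjing University of Information Science and Technology	xiaoyan_zhou@nuist.edu.cn
Han Zhichao (韩智超)	School of Electronic and Information Engineering, Nanjing University of Information Science and Technology	xiaoyan_zhou@nuist.edu.cn
Wang Lili (王丽丽)	School of Electronic and Information Engineering, Nanjing University of Information Science and Technology	xiaoyan_zhou@nuist.edu.cn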

Abstract views: 613

Full-text downloads: 494
