Editorial Office of 《应用声学》 (Journal of Applied Acoustics)

Article Abstract

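He Chunhao, Chang Tieyuan, Pan Lidong. Dense multi-branch time delay neural network for speaker recognition[J]. 应用声学 (Journal of Applied Acoustics), 2024, 43(5): 949-955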

Dense multi-branch time delay neural network for speaker recognition

Submitted: 2023-05-15 Revised: 2024-09-04

Abstract:

Time delay neural networks (TDNNs) were among the earliest neural networks applied to speaker recognition. To improve recognition performance, recent work has focused on deepening or widening the network structure. Building on a study of densely connected convolutional networks and multi-branch network structures, this paper proposes a dense multi-branch time delay neural network that further improves the ability of small models to extract speaker features. Dense connections provide feature reuse, and on top of them a parallel multi-branch structure extracts features from the same input at several resolutions simultaneously. Tests on the VoxCeleb1 test set, VoxCeleb1-H, and VoxCeleb1-E show that the network achieves accurate speaker recognition with a small number of parameters, making it suitable for local speaker recognition scenarios where storage space is limited.
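The two ideas named in the abstract, dense connections and parallel branches over the same input, can be illustrated with a minimal pure-Python sketch. It is not the authors' architecture: the abstract does not say how the branches obtain different resolutions, so the sketch assumes each branch is a TDNN layer (a 1-D convolution over time) with its own dilation. The function names, kernel size, channel counts, and random weights are all hypothetical.

```python
import random


def tdnn_layer(x, weights, dilation):
    """Dilated 1-D convolution over time with zero padding, followed by ReLU.

    x:       list of channels, each a list of T frame values.
    weights: weights[o][c][k] for output channel o, input channel c, tap k.
    """
    n_frames = len(x[0])
    kernel = len(weights[0][0])
    half = (kernel - 1) // 2
    out = []
    for w_out in weights:
        row = []
        for t in range(n_frames):
            acc = 0.0
            for c, w_in in enumerate(w_out):
                for k, w in enumerate(w_in):
                    src = t + (k - half) * dilation  # frame read by this tap
                    if 0 <= src < n_frames:          # frames outside the signal count as zero
                        acc += w * x[c][src]
            row.append(max(acc, 0.0))                # ReLU
        out.append(row)
    return out


def dense_multi_branch_block(x, branch_weights, dilations):
    """Feed the same input to every branch, then concatenate along channels.

    Keeping the input channels in the output is the dense connection
    (feature reuse); each branch adds channels computed with its own
    dilation (assumed here to stand in for "different resolutions").
    """
    out = list(x)
    for weights, dilation in zip(branch_weights, dilations):
        out.extend(tdnn_layer(x, weights, dilation))
    return out


def random_weights(rng, n_out, n_in, kernel):
    """Small random weights; a trained model would learn these."""
    return [[[rng.uniform(-0.1, 0.1) for _ in range(kernel)]
             for _ in range(n_in)]
            for _ in range(n_out)]


def build_and_run(n_blocks=2, in_channels=4, growth=2,
                  dilations=(1, 2, 3), n_frames=20, seed=0):
    """Stack blocks on random input and record the channel count after each."""
    rng = random.Random(seed)
    x = [[rng.uniform(-1.0, 1.0) for _ in range(n_frames)]
         for _ in range(in_channels)]
    channel_counts = [len(x)]
    for _ in range(n_blocks):
        branch_weights = [random_weights(rng, growth, len(x), 3)
                          for _ in dilations]
        x = dense_multi_branch_block(x, branch_weights, dilations)
        channel_counts.append(len(x))
    return x, channel_counts
```

With these illustrative defaults, each block adds `growth * len(dilations)` channels to whatever it receives, so later blocks see all earlier features while each branch stays narrow. This is one way a densely connected design can keep the parameter count low.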

DOI: 10.11684/j.issn.1000-310X.2024.05.003

Keywords: speaker recognition; time delay neural networks; multi-branch neural networks; dense connectivity; deep learning

Funding:

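Author	Affiliation	E-mail
He Chunhao	College of Electronic Information Engineering, Hebei University	hchofficial@outlook.com
Chang Tieyuan	College of Electronic Information Engineering, Hebei University	tieyuan_chang@hbu.edu.cn
Pan Lidong*	College of Electronic Information Engineering, Hebei University	sjqtzbd@163.com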

Abstract views: 353

Full-text downloads: 684
