Article abstract
He Chunhao, Chang Tieyuan, Pan Lidong. Dense multi-branch time delay neural network for speaker recognition[J].,2024,43(5):949-955
Dense multi-branch time delay neural network for speaker recognition
Received: 2023-05-15  Revised: 2024-09-04
Abstract:
      Time delay neural networks (TDNNs) were among the earlier neural networks applied to speaker recognition. To improve recognition performance, recent work has focused on deepening or widening the network structure. Building on a study of densely connected convolutional networks and multi-branch network structures, this paper proposes a dense multi-branch time delay neural network that further improves the ability of small models to extract speaker features. Dense connections provide feature reuse, and on top of them a parallel multi-branch structure extracts features from the same input at different resolutions simultaneously. Tests on the VoxCeleb1 test set, VoxCeleb1-H, and VoxCeleb1-E show that the network achieves accurate speaker recognition with a small number of model parameters, which makes it suitable for local speaker recognition scenarios where storage space is limited.
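The data flow described in the abstract can be illustrated with a small framework-free sketch. It is not the authors' architecture: every function name, the layer sizes, the random weights, the ReLU activation, and the choice to concatenate branch outputs are assumptions made for illustration. In particular, "different resolutions" is modelled here as different dilation rates of a 1-D convolution over time, which is only one possible reading.

```python
import random


def tdnn_layer(x, weights, dilation):
    """Dilated 1-D convolution over time with zero padding, followed by ReLU.

    x:       list of T frames, each a list of C_in floats
    weights: weights[k][o][i] for kernel tap k, output channel o, input channel i
    """
    num_frames = len(x)
    kernel = len(weights)
    c_out = len(weights[0])
    half = kernel // 2
    out = []
    for t in range(num_frames):
        frame = [0.0] * c_out
        for k in range(kernel):
            src = t + (k - half) * dilation  # frame index seen by this tap
            if 0 <= src < num_frames:        # out-of-range taps act as zero padding
                for o in range(c_out):
                    frame[o] += sum(w * v for w, v in zip(weights[k][o], x[src]))
        out.append([max(0.0, v) for v in frame])
    return out


def multi_branch_block(x, branch_weights, dilations):
    """Run the same input through parallel branches and concatenate per frame."""
    branches = [tdnn_layer(x, w, d) for w, d in zip(branch_weights, dilations)]
    return [[v for b in branches for v in b[t]] for t in range(len(x))]


def dense_multi_branch(x, blocks):
    """Dense connectivity: each block receives all earlier features as input."""
    features = x
    for branch_weights, dilations in blocks:
        new = multi_branch_block(features, branch_weights, dilations)
        # Concatenate along the channel axis so earlier features are reused.
        features = [old + add for old, add in zip(features, new)]
    return features


def build_blocks(rng, c_in, n_blocks, growth, dilations, kernel=3):
    """Create random (hypothetical) weights; returns blocks and final channel count."""
    blocks, channels = [], c_in
    for _ in range(n_blocks):
        ws = [
            [[[rng.uniform(-0.1, 0.1) for _ in range(channels)]
              for _ in range(growth)]
             for _ in range(kernel)]
            for _ in dilations
        ]
        blocks.append((ws, dilations))
        channels += growth * len(dilations)
    return blocks, channels


rng = random.Random(0)
frames = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(10)]  # 10 frames, 4 features
blocks, out_channels = build_blocks(rng, c_in=4, n_blocks=2, growth=2, dilations=(1, 2, 3))
features = dense_multi_branch(frames, blocks)
print(len(features), len(features[0]))  # 10 16
```

Each block adds `growth * len(dilations)` channels, so the channel count grows from 4 to 10 to 16 while the number of frames stays fixed. The original input channels remain at the front of every output frame, which is the feature reuse that dense connections provide.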
DOI:10.11684/j.issn.1000-310X.2024.05.003
Keywords: Speaker recognition  Time delay neural networks  Multi-branch neural networks  Dense connectivity  Deep learning
Funding:
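Author  Affiliation  E-mail
He Chunhao (和椿皓)  College of Electronic Information Engineering, Hebei University  hchofficial@outlook.com
Chang Tieyuan (常铁原)  College of Electronic Information Engineering, Hebei University  tieyuan_chang@hbu.edu.cn
Pan Lidong (潘立冬)*  College of Electronic Information Engineering, Hebei University  sjqtzbd@163.com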