Article abstract
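Tu Zhenhua (涂振华), Zhao Lasheng (赵腊生), Mao Jiaying (毛嘉莹). Speaker verification based on channel residual fusion and time-frequency attention[J].,2025,44(5):1232-1241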
Speaker verification based on channel residual fusion and time-frequency attention
Submitted: 2024-05-23  Revised: 2025-08-30
Abstract:
      Speaker verification models based on deep neural networks have made significant progress in recent years. However, previous work remains limited in how it fuses local features in the frequency domain: it does not fully exploit the complementarity between features, and it lacks an efficient way to model long-term context. To address these problems, a speaker verification model based on channel residual fusion and time-frequency attention is proposed. An attention fusion mechanism automatically adjusts the fusion weights of local features across different channels, strengthening the model's ability to represent local features. In addition, a time-frequency mixed channel attention mechanism is proposed to model relationships between more distant frames, improving the model's ability to capture long-term contextual information. Experiments on the CN-Celeb datasets show that the proposed model outperforms the comparison models on both equal error rate and minimum detection cost function, demonstrating its effectiveness across different speaker contexts.
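The two evaluation metrics named in the abstract, equal error rate (EER) and minimum detection cost function (minDCF), can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names are hypothetical, the threshold sweep is a simple approximation, and the cost parameters `p_target`, `c_miss`, and `c_fa` are assumed defaults, since the abstract does not state the values used.

```python
def error_rates(target_scores, nontarget_scores, threshold):
    """Return (false_reject_rate, false_accept_rate) at one threshold.

    A trial is accepted when its score is >= threshold.
    """
    frr = sum(s < threshold for s in target_scores) / len(target_scores)
    far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    return frr, far


def candidate_thresholds(target_scores, nontarget_scores):
    """All observed scores, plus one threshold above the maximum (reject all)."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    thresholds.append(thresholds[-1] + 1.0)
    return thresholds


def compute_eer(target_scores, nontarget_scores):
    """Approximate EER: pick the threshold where |FRR - FAR| is smallest
    and return the mean of FRR and FAR at that threshold."""
    best_gap, best_eer = None, None
    for t in candidate_thresholds(target_scores, nontarget_scores):
        frr, far = error_rates(target_scores, nontarget_scores, t)
        gap = abs(frr - far)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (frr + far) / 2
    return best_eer


def compute_min_dcf(target_scores, nontarget_scores,
                    p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Normalized minimum detection cost over all candidate thresholds.

    p_target, c_miss and c_fa are assumed values, not taken from the paper.
    """
    costs = []
    for t in candidate_thresholds(target_scores, nontarget_scores):
        frr, far = error_rates(target_scores, nontarget_scores, t)
        costs.append(c_miss * p_target * frr + c_fa * (1 - p_target) * far)
    # Normalize by the cost of the best trivial system (accept all or reject all).
    default_cost = min(c_miss * p_target, c_fa * (1 - p_target))
    return min(costs) / default_cost


# Toy trial scores: higher means "more likely the same speaker".
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.6, 0.3, 0.2, 0.1]
eer = compute_eer(targets, nontargets)
min_dcf = compute_min_dcf(targets, nontargets)
```

On the toy scores, one target trial (0.4) scores below one non-target trial (0.6), so the false reject and false accept rates meet at 25%. Lower values of both metrics indicate a better verification system.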
DOI:10.11684/j.issn.1000-310X.2025.05.013
Keywords: Speaker verification  Neural network  Channel residual fusion  Channel attention
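Funding: Basic Scientific Research Project of the Liaoning Provincial Department of Education; 111 Project; Dalian Science and Technology Innovation Fund Program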
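Author  Affiliation  E-mail
Tu Zhenhua (涂振华)  Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, Dalian University  tuzhenhua@s.dlu.edu.cn
Zhao Lasheng (赵腊生)*  Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, Dalian University  goodzls@126.com
Mao Jiaying (毛嘉莹)  Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, Dalian University  j_y_maomao@163.com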