| 涂振华, 赵腊生, 毛嘉莹. Speaker verification based on channel residual fusion and time-frequency attention[J]., 2025, 44(5): 1232-1241 |
| 通道残差融合和时频注意力的说话人验证 |
| Speaker verification based on channel residual fusion and time-frequency attention |
| Submitted: 2024-05-23 Revised: 2025-08-30 |
| Abstract: |
| Speaker verification models based on deep neural networks have made significant progress in recent years. However, previous work remains limited in how it fuses local features in the frequency domain: it does not fully exploit the complementarity between features, and it lacks an efficient way to model long-term context. To address these problems, a speaker verification model based on channel residual fusion and time-frequency attention is proposed. An attention fusion mechanism automatically adjusts the fusion weights of local features across channels, strengthening the model's ability to represent local features. In addition, a time-frequency mixed channel attention mechanism models longer-range inter-frame relationships, improving the model's ability to capture long-term contextual information. Experiments on the CN-Celeb datasets show that the proposed model outperforms the comparison models on both equal error rate and minimum detection cost function, demonstrating its effectiveness across different speaker contexts. |
| DOI:10.11684/j.issn.1000-310X.2025.05.013 |
| Keywords: Speaker verification; Neural network; Channel residual fusion; Channel attention |
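| Funding: Basic Scientific Research Project of the Education Department of Liaoning Province; 111 Project; Dalian Science and Technology Innovation Fund Project |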
|
| Abstract views: 786 |
| Full-text downloads: 290 |
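The abstract reports results on two metrics, equal error rate (EER) and minimum detection cost function (minDCF). As a rough illustration of how these are commonly computed from verification scores, here is a minimal sketch. It is not the authors' evaluation code. The function name `eer_and_min_dcf` is hypothetical, and the cost parameters `p_target=0.01`, `c_miss=1.0`, `c_fa=1.0` are assumed defaults, since the abstract does not state the operating point used.

```python
import math


def eer_and_min_dcf(target_scores, nontarget_scores,
                    p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Return (eer, eer_threshold, normalized_min_dcf).

    Higher scores mean "more likely the same speaker". A trial is
    accepted when its score is >= the threshold.
    p_target, c_miss and c_fa are assumed values, not taken from the paper.
    """
    # Candidate thresholds: every observed score, plus +inf (reject everything).
    thresholds = sorted(set(target_scores) | set(nontarget_scores)) + [math.inf]

    best_gap, eer, eer_threshold = math.inf, None, None
    min_dcf = math.inf
    for t in thresholds:
        # Miss rate: target trials rejected at this threshold.
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        # False-alarm rate: non-target trials accepted at this threshold.
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)

        # EER: the operating point where the two error rates are closest.
        gap = abs(p_fa - p_miss)
        if gap < best_gap:
            best_gap, eer, eer_threshold = gap, (p_fa + p_miss) / 2, t

        # Detection cost at this threshold; keep the minimum over thresholds.
        dcf = c_miss * p_target * p_miss + c_fa * (1 - p_target) * p_fa
        min_dcf = min(min_dcf, dcf)

    # Normalize by the cost of the best trivial system (accept all or reject all).
    norm = min(c_miss * p_target, c_fa * (1 - p_target))
    return eer, eer_threshold, min_dcf / norm
```

For example, with target scores `[0.9, 0.8, 0.7, 0.4]` and non-target scores `[0.6, 0.3, 0.2, 0.1]`, the miss and false-alarm rates are both 1/4 at threshold 0.6, so the EER is 0.25. Lower values are better for both metrics.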