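Cheng Linjuan, Peng Renhua, Zheng Chengshi, Li Xiaodong. Deep learning-based single-channel speech enhancement based on human auditory related cost function*[J]., 2022, 41(4): 654-666 |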
Deep learning-based single-channel speech enhancement based on human auditory related cost function* |
Received: 2021-05-26 Revised: 2022-06-30 |
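Abstract (Chinese version): |
The mean-square error (MSE) function is one of the most commonly used cost functions in deep learning-based single-channel speech enhancement. However, the magnitude of the MSE does not correlate perfectly with speech quality. To improve performance, this paper introduces two classes of cost functions related to human auditory perception into deep neural network training. The first class is the weighted Euclidean distance cost function, which accounts for the auditory masking effect. The second class comprises the Itakura-Saito cost function, the COSH cost function, and the weighted likelihood ratio cost function, which emphasize the importance of speech spectral peaks and focus on recovering the spectral-peak information of clean speech. Using a long short-term memory network architecture, the two classes of cost functions are analysed and compared in deep learning-based single-channel speech enhancement, and contrasted with the MSE cost function. Experimental results show that the deep neural network-based single-channel speech enhancement algorithm using the weighted Euclidean distance cost function achieves better speech quality and lower residual noise. |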
Abstract (English version): |
The mean-square error (MSE) function is one of the most commonly used cost functions in deep learning-based single-channel speech enhancement. However, the value of the MSE is not completely correlated with speech quality. To improve the performance of the speech enhancement algorithm, this paper introduces two classes of cost functions related to human auditory perception into network training. The first class is a weighted-Euclidean cost function, which takes the auditory masking effect into account. The second class includes the Itakura-Saito cost function, the COSH cost function, and the weighted likelihood ratio cost function, which place more emphasis on spectral peaks than on spectral valleys. The performance of these perceptually motivated cost functions in single-channel speech enhancement is analysed using a long short-term memory network and compared with that of the MSE cost function. Experimental results indicate that deep neural network-based single-channel speech enhancement with the weighted-Euclidean cost function achieves better speech quality and lower residual noise. |
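The abstract names the cost functions without giving their formulas. As a rough illustration, the sketch below uses the classical textbook forms of these distortion measures, applied per frequency bin to a clean magnitude spectrum and its estimate and then averaged. The exact definitions, the exponent `p`, the floor `EPS`, and all function names are assumptions made for illustration, not the authors' implementation; a real training loop would compute the same expressions on tensors in a deep learning framework.

```python
import math

EPS = 1e-8  # assumed small floor to avoid division by zero and log(0)


def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def mse(clean, est):
    """Plain mean-square error between clean and estimated spectra."""
    return _mean((x - y) ** 2 for x, y in zip(clean, est))


def weighted_euclidean(clean, est, p=-1.0):
    """Weighted Euclidean distance: X^p * (X - X_hat)^2.

    With p < 0 (assumed here), errors near spectral valleys are weighted
    more heavily than errors near peaks, mimicking auditory masking.
    """
    return _mean((x + EPS) ** p * (x - y) ** 2 for x, y in zip(clean, est))


def itakura_saito(clean, est):
    """Itakura-Saito measure: X/X_hat - log(X/X_hat) - 1."""
    def term(x, y):
        r = (x + EPS) / (y + EPS)
        return r - math.log(r) - 1.0
    return _mean(term(x, y) for x, y in zip(clean, est))


def cosh_measure(clean, est):
    """COSH measure: 0.5 * (X/X_hat + X_hat/X) - 1 (symmetrized IS)."""
    def term(x, y):
        r = (x + EPS) / (y + EPS)
        return 0.5 * (r + 1.0 / r) - 1.0
    return _mean(term(x, y) for x, y in zip(clean, est))


def weighted_likelihood_ratio(clean, est):
    """Weighted likelihood ratio: (log X - log X_hat) * (X - X_hat)."""
    return _mean(
        (math.log(x + EPS) - math.log(y + EPS)) * (x - y)
        for x, y in zip(clean, est)
    )


# Toy two-bin spectrum: one peak (1.0) and one valley (0.1).
clean = [1.0, 0.1]
peak_error = [1.05, 0.1]     # error of 0.05 placed on the peak
valley_error = [1.0, 0.15]   # same error placed on the valley

# MSE cannot tell the two cases apart; the weighted measure can.
print(mse(clean, peak_error), mse(clean, valley_error))
print(weighted_euclidean(clean, peak_error),
      weighted_euclidean(clean, valley_error))
```

In this toy case the MSE is essentially identical for both estimates, while the weighted Euclidean distance with a negative exponent penalizes the valley error more, which is one way to read the abstract's remark that MSE and perceived quality are not completely correlated.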
DOI:10.11684/j.issn.1000-310X.2022.04.018 |
Keywords: speech enhancement; deep learning; human auditory |
Funding: National Natural Science Foundation of China (General Program, Key Program, Major Program) |
|