Page 213 - Applied Acoustics (《应用声学》), 2023, Issue 5
Vol. 42, No. 5    Luo Yu et al.: A clustering-based gated convolutional network method for speech separation    1105