Tao Yuang. Application of MFCC feature training technology in voiceprint recognition[J]. Application of IC, 2024, 41(2): 386–387.
[28] 李磊, 朱永同, 杨琦, 等. 基于多任务学习与注意力机制的多层次音频特征情感识别研究[J]. 智能计算机与应用, 2024, 14(1): 85–94, 101.
Li Lei, Zhu Yongtong, Yang Qi, et al. Multilevel emotion recognition of audio features based on multitask learning and attention mechanism[J]. Intelligent Computer and Applications, 2024, 14(1): 85–94, 101.
[29] 杨蕊檄. 基于时频特征信息的声学事件检测算法研究[D]. 成都: 西南交通大学, 2019.
[30] Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16×16 words: Transformers for image recognition at scale[J]. arXiv Preprint, arXiv: 2010.11929, 2020.
[31] 余正涛, 董凌, 高盛祥. 低资源语音识别研究进展[J]. 昆明理工大学学报(自然科学版), 2024, 49(3): 86–102.
Yu Zhengtao, Dong Ling, Gao Shengxiang. Research progress of low-resource speech recognition[J]. Journal of Kunming University of Science and Technology (Natural Science), 2024, 49(3): 86–102.
[32] 殷铭旸, 乔亦诚, 张德霄龙, 等. 基于风格迁移的数据增强方法[J]. 信息技术与信息化, 2023(12): 127–130.
[33] Zhang P J, Zheng X Q, Zhang W Q, et al. A deep neural network for modeling music[C]//Proceedings of the 5th ACM on International Conference on Multimedia Retrieval. Shanghai, China. ACM, 2015.
[34] Karunakaran N, Arya A. A scalable hybrid classifier for music genre classification using machine learning concepts and spark[C]//2018 International Conference on Intelligent Autonomous Systems (ICoIAS). March 1–3, 2018. Singapore. IEEE, 2018.
[35] Yu Y, Luo S, Liu S L, et al. Deep attention based music genre classification[J]. Neurocomputing, 2020, 372: 84–91.
[36] 连子宽, 姚力, 刘晟源, 等. 基于t-SNE降维和BIRCH聚类的单相用户相位及表箱辨识[J]. 电力系统自动化, 2020, 44(8): 176–184.
Lian Zikuan, Yao Li, Liu Shengyuan, et al. Phase and meter box identification for single-phase users based on t-SNE dimension reduction and BIRCH clustering[J]. Automation of Electric Power Systems, 2020, 44(8): 176–184.
[37] Peng N, Chen A B, Zhou G X, et al. Environment sound classification based on visual multi-feature fusion and GRU-AWS[J]. IEEE Access, 2020, 8: 191100–191114.
[38] Zhu W J, Li X. Speech emotion recognition with global-aware fusion on multi-scale feature representation[C]//ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). May 23–27, 2022. Singapore. IEEE, 2022.
[39] Kong Q Q, Cao Y, Iqbal T, et al. PANNs: Large-scale pretrained audio neural networks for audio pattern recognition[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2020, 28: 2880–2894.
[40] Liu Y L, Chen A B, Zhou G X, et al. Combined CNN LSTM with attention for speech emotion recognition based on feature-level fusion[J]. Multimedia Tools and Applications, 2024, 83(21): 59839–59859.