Page 207 - 《应用声学》 (Journal of Applied Acoustics), 2023, No. 4
Vol. 42, No. 4    Sun Xiaochuan et al.: Detecting replayed speech using ResNet and CatBoost    869
... greater. In addition, the table shows that the closer the recording device is to the speaker, the lower the replayed-speech detection accuracy. These results indicate that when the target speaker's voice is recorded at close range and replayed through a high-quality device, the convolutional and additive noise introduced is correspondingly reduced, which makes replayed speech harder to detect. Finally, the table also shows that the proposed method is less sensitive to replay device quality and recording distance than the baseline system, which suggests that it has some practical value.
Table 9 Accuracy under different replay attack types (unit: %)

Replay attack type    CQCC+GMM    GFCC+ResNet+CatBoost
AA                    74.05       88.84
AB                    93.72       95.03
AC                    98.01       99.73
BA                    78.07       93.49
BB                    94.79       95.98
BC                    98.55       99.61
CA                    78.74       93.52
CB                    95.11       97.11
CC                    98.07       99.36
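The sensitivity comparison can be checked directly against the Table 9 figures. The following is a minimal sketch, not part of the original study: it uses the max-minus-min accuracy across attack types as a rough proxy for sensitivity, which is an assumption of this example, and the names `TABLE_9`, `accuracy_spread`, and `gains` are hypothetical.

```python
# Table 9 accuracies (%), keyed by replay attack type:
# (CQCC+GMM baseline, GFCC+ResNet+CatBoost).
TABLE_9 = {
    "AA": (74.05, 88.84),
    "AB": (93.72, 95.03),
    "AC": (98.01, 99.73),
    "BA": (78.07, 93.49),
    "BB": (94.79, 95.98),
    "BC": (98.55, 99.61),
    "CA": (78.74, 93.52),
    "CB": (95.11, 97.11),
    "CC": (98.07, 99.36),
}


def accuracy_spread(column):
    """Return max minus min accuracy (percentage points) for one system.

    ASSUMPTION: the spread is used here only as a rough sensitivity proxy.
    """
    values = [row[column] for row in TABLE_9.values()]
    return round(max(values) - min(values), 2)


def gains():
    """Return the gain of the proposed method over the baseline per attack type."""
    return {
        attack: round(proposed - baseline, 2)
        for attack, (baseline, proposed) in TABLE_9.items()
    }


print("baseline spread:", accuracy_spread(0))
print("proposed spread:", accuracy_spread(1))
for attack, gain in gains().items():
    print(attack, gain)
```

On these figures the baseline accuracy varies by 24.50 percentage points across attack types and the proposed method by 10.89, and the gain is positive for every attack type, which is consistent with the lower sensitivity reported above.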
4 Conclusion

This paper proposes a new replayed-speech detection method that fuses ResNet and CatBoost. First, the proposed audio frame selection method preprocesses the speech through STFT, LFAE computation, and frame sorting. Second, low-frequency GFCC acoustic features are computed for the selected frames. On this basis, a ResNet with a self-attention mechanism further extracts specific information from the GFCC features. Finally, the extracted features are used for CatBoost training and classification, which yields better detection performance. Comparative experiments demonstrate the effectiveness of the scheme. In addition, the paper examines how gender, vocabulary, the speech frame selection method, frequency range, recording distance, and replay device quality affect the results. Future work will develop a more effective replayed-speech detection method that takes speaker gender into account.
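The frame selection step summarized above (STFT, LFAE computation, frame sorting) can be illustrated with a minimal sketch. This section does not define LFAE, so the code assumes it denotes the average spectral energy of a frame below a cutoff frequency. The function name `select_frames` and all parameter values (`frame_len`, `hop`, `cutoff_hz`, `keep`) are hypothetical and are not taken from the paper.

```python
import numpy as np


def select_frames(signal, sample_rate, frame_len=400, hop=160,
                  cutoff_hz=1000.0, keep=10):
    """Pick the `keep` frames with the highest low-band average energy.

    ASSUMPTION: "LFAE" is read here as the mean spectral energy below
    `cutoff_hz`; frame_len, hop, cutoff_hz and keep are illustrative values.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    if n_frames < 1:
        raise ValueError("signal is shorter than one frame")

    # Short-time Fourier transform: Hann-windowed frames, one-sided FFT.
    window = np.hanning(frame_len)
    starts = hop * np.arange(n_frames)
    frames = np.stack([signal[s:s + frame_len] for s in starts]) * window
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2

    # Average energy over the bins below the cutoff, one value per frame.
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    low_band_energy = power[:, freqs < cutoff_hz].mean(axis=1)

    # Sort frames by that energy (descending) and keep the top ones,
    # returned in time order so later stages see a consistent layout.
    order = np.argsort(low_band_energy)[::-1][:keep]
    return np.sort(order), low_band_energy
```

The selected frame indices would then feed the GFCC extraction stage. Returning the per-frame energies alongside the indices makes it easy to inspect how the ranking behaves on a given recording.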