Journal of Applied Acoustics (《应用声学》), 2023, Vol. 42, No. 4, p. 869
Sun Xiaochuan et al.: Detecting replayed speech with ResNet and CatBoost


... larger. Table 9 also shows that the closer the recording device is to the speaker, the lower the accuracy of replayed speech detection. These results indicate that when the target speaker's voice is recorded at close range and replayed through a high-quality playback device, less convolutional and additive noise is introduced, which makes the replayed speech harder to detect. Finally, the table shows that the proposed method is less sensitive than the baseline system to playback device quality and recording distance, which suggests that it has some practical value.
Table 9  Accuracy under different replay attack types (unit: %)

Replay attack type    CQCC+GMM    GFCC+ResNet+CatBoost
AA                    74.05       88.84
AB                    93.72       95.03
AC                    98.01       99.73
BA                    78.07       93.49
BB                    94.79       95.98
BC                    98.55       99.61
CA                    78.74       93.52
CB                    95.11       97.11
CC                    98.07       99.36
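The sensitivity comparison can be checked directly from the Table 9 figures. The sketch below is illustrative only: the function names are hypothetical, and the "spread" (best minus worst accuracy across attack types) is a crude proxy chosen here for sensitivity, not a metric defined in the paper.

```python
# Accuracy (%) copied from Table 9: (CQCC+GMM baseline, GFCC+ResNet+CatBoost).
TABLE_9 = {
    "AA": (74.05, 88.84),
    "AB": (93.72, 95.03),
    "AC": (98.01, 99.73),
    "BA": (78.07, 93.49),
    "BB": (94.79, 95.98),
    "BC": (98.55, 99.61),
    "CA": (78.74, 93.52),
    "CB": (95.11, 97.11),
    "CC": (98.07, 99.36),
}


def accuracy_gain(table):
    """Absolute gain (percentage points) of the proposed system per attack type."""
    return {attack: round(proposed - baseline, 2)
            for attack, (baseline, proposed) in table.items()}


def accuracy_spread(table, column):
    """Best-minus-worst accuracy for one system (column 0 = baseline, 1 = proposed).

    ASSUMPTION: used here as a rough sensitivity proxy, not the paper's metric.
    """
    values = [row[column] for row in table.values()]
    return round(max(values) - min(values), 2)


gains = accuracy_gain(TABLE_9)
print(gains["AA"], gains["BA"], gains["CA"])
print(accuracy_spread(TABLE_9, 0), accuracy_spread(TABLE_9, 1))
```

Under this proxy, the baseline accuracy varies by roughly 24.5 points across attack types, against roughly 10.9 points for the proposed system. The largest gains (about 15 points) fall on the attack types ending in A, which are also the hardest cases for both systems.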
4 Conclusion

This paper proposes a new replayed speech detection method that combines ResNet and CatBoost. First, the proposed audio frame selection method preprocesses the speech through STFT, LFAE computation, and frame sorting. Second, low-frequency GFCC acoustic features are computed for the selected frames. On this basis, a ResNet with a self-attention mechanism extracts further specific information from the GFCC features. Finally, the extracted features are used for CatBoost training and classification, which yields better detection performance. Comparative experiments demonstrate the effectiveness of the scheme. The paper also examines how gender, vocabulary, frame selection method, frequency range, recording distance, and playback device quality affect the results. Future work will develop a more effective replayed speech detection method that accounts for speaker gender.
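The frame selection stage named in the conclusion (STFT, a per-frame score, then sorting) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: this section does not define LFAE, so the score used here (mean squared magnitude below a cutoff frequency) is a stand-in. The cutoff, frame length, hop size, number of frames kept, and all function names are hypothetical. Only the standard library is used to keep the sketch dependency-free; in practice `numpy.fft.rfft` would replace the naive DFT.

```python
import cmath
import math


def frame_signal(signal, frame_len, hop):
    """Split a 1-D sequence into overlapping frames (no padding)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def magnitude_spectrum(frame):
    """Hann-windowed DFT magnitudes for bins 0..N/2 (one STFT column)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed)))
            for k in range(n // 2 + 1)]


def select_frames(signal, sample_rate, frame_len=64, hop=32,
                  cutoff_hz=1000.0, keep=4):
    """Rank frames by a low-band energy score and return the kept indices.

    ASSUMPTION: the score is a stand-in for LFAE, which this section
    does not define; keeping the highest-scoring frames is also assumed.
    """
    frames = frame_signal(signal, frame_len, hop)
    max_bin = int(cutoff_hz * frame_len / sample_rate)
    scores = []
    for frame in frames:
        low_band = magnitude_spectrum(frame)[:max_bin + 1]
        scores.append(sum(m * m for m in low_band) / len(low_band))
    ranked = sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])  # restore temporal order


# Toy input: 256 samples of silence followed by 256 samples of a 500 Hz tone.
rate = 8000
toy = [0.0] * 256 + [math.sin(2 * math.pi * 500 * n / rate)
                     for n in range(256, 512)]
print(select_frames(toy, rate))
```

On the toy input, every kept frame lies entirely within the tone, since silent and half-silent frames score lower. The later stages (GFCC extraction, the self-attention ResNet, and CatBoost classification) would consume the selected frames and are not sketched here.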