《应用声学》 (Journal of Applied Acoustics), 2023, No. 2


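Vol. 42, No. 2    Zheng Kaitong et al.: Room impulse response simulation method and its application to blind reverberation time estimation    267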

Table 4 Estimation performance of the blind reverberation time estimation models trained on mixed data, real data, and GAN-simulated data at different signal-to-noise ratios (noisy reverberant scenarios)

Metric        RMSE/ms                                 ρ
SNR/dB        0     5     10    15    20    Avg.      0      5      10     15     20     Avg.
Real data     291   287   267   272   258   275       0.826  0.870  0.889  0.89   0.897  0.874
Mixed data    281   200   148   138   123   185       0.827  0.917  0.953  0.963  0.972  0.918
GAN           197   165   155   146   139   160       0.910  0.938  0.941  0.946  0.950  0.937
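
The two metrics reported in Table 4 can be computed from paired true and estimated reverberation times. The following is a minimal sketch, assuming that ρ denotes the Pearson correlation coefficient (the usual reading in this literature; this page does not define it). The function names and the sample values are hypothetical and are not taken from the paper. The bias and standard deviation of the error are included because they satisfy RMSE² = bias² + std², which links the RMSE column to a bias and variance comparison.

```python
import numpy as np


def rmse_ms(t60_true_s, t60_est_s):
    """Root-mean-square error between true and estimated T60 (seconds in, ms out)."""
    true = np.asarray(t60_true_s, dtype=float)
    est = np.asarray(t60_est_s, dtype=float)
    return float(np.sqrt(np.mean((est - true) ** 2)) * 1000.0)


def pearson_rho(t60_true_s, t60_est_s):
    """Pearson correlation coefficient between true and estimated T60."""
    true = np.asarray(t60_true_s, dtype=float)
    est = np.asarray(t60_est_s, dtype=float)
    return float(np.corrcoef(true, est)[0, 1])


def bias_and_std_ms(t60_true_s, t60_est_s):
    """Mean and (population) standard deviation of the estimation error, in ms."""
    err = (np.asarray(t60_est_s, dtype=float)
           - np.asarray(t60_true_s, dtype=float)) * 1000.0
    return float(err.mean()), float(err.std())


# Illustrative values only (seconds); NOT data from the paper.
t60_true = [0.3, 0.5, 0.8, 1.2]
t60_est = [0.35, 0.45, 0.9, 1.0]

print("RMSE/ms:", rmse_ms(t60_true, t60_est))
print("rho:", pearson_rho(t60_true, t60_est))
print("bias/ms, std/ms:", bias_and_std_ms(t60_true, t60_est))
```

In practice the inputs would be the ground-truth and estimated reverberation times of all test utterances at one signal-to-noise ratio, giving one RMSE value and one ρ value per column of the table.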

Figure 7 shows that the estimation model trained on real data performs poorly under long reverberation (Room 4) because the real data lack long-reverberation samples. In contrast, the model trained on mixed data, obtained by augmenting the real data with GAN-simulated RIRs, improves substantially under long reverberation compared with the model trained without augmentation. Moreover, because the mixed data contain real data, the mixed-data model has smaller bias and variance than the model trained entirely on GAN-simulated data under moderate reverberation (Rooms 2 and 3), whereas the model trained entirely on GAN-simulated data has smaller bias and variance under short reverberation (Room 1) and long reverberation. Comparing performance across signal-to-noise ratios and rooms shows that, at high signal-to-noise ratios and under moderate reverberation, the network trained on mixed data outperforms the network trained entirely on GAN-simulated data. Across all signal-to-noise ratios and under long reverberation, the network trained on mixed data clearly outperforms the network trained on real data.

[Figure 7: box plot. Vertical axis: estimation error/ms (ticks from 400 to -800); horizontal axis: Room 1 to Room 4; legend: mixed data, real data, generative adversarial network.]

Fig. 7 Box plots of the estimation errors of the blind reverberation time estimation models trained with the three methods in different rooms. The room dimensions and acoustic parameters are given in Table 1.

4 Conclusion

When reverberant speech datasets are built, measured RIRs lack long-reverberation data and simulated RIRs differ from measured ones, which degrades the performance of data-driven blind reverberation time estimation models. This paper proposes an RIR simulation method based on a conditional generative adversarial network, which enables the network to simulate more realistic RIRs for a given input reverberation time. Experimental results show that the blind reverberation time estimation model trained on RIRs simulated with this method achieves the smallest root-mean-square estimation error in all signal-to-noise ratio scenarios and significantly outperforms the other models under long reverberation. The method can be used for RIR augmentation to expand reverberant speech datasets.
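
Expanding a reverberant speech dataset by RIR augmentation, as mentioned in the conclusion, typically means convolving clean speech with an RIR and adding noise at a target signal-to-noise ratio. The following is a minimal sketch under those assumptions. The function names, the exponentially decaying noise used as a stand-in RIR, the random signal used in place of clean speech, and the white-noise model are all illustrative choices, not the paper's pipeline or its GAN.

```python
import numpy as np


def synthetic_rir(t60_s, fs, rng):
    """Stand-in RIR: white noise under an envelope that decays 60 dB in t60_s seconds."""
    n = int(t60_s * fs)
    t = np.arange(n) / fs
    # Amplitude envelope 10^(-3 t / T60) reaches 10^-3 (i.e. -60 dB) at t = T60.
    envelope = 10.0 ** (-3.0 * t / t60_s)
    return rng.standard_normal(n) * envelope


def make_noisy_reverberant(speech, rir, snr_db, rng):
    """Convolve speech with an RIR, then add white noise at the requested SNR (dB).

    Returns (noisy_reverberant, reverberant), both truncated to len(speech).
    """
    reverberant = np.convolve(speech, rir)[: len(speech)]
    signal_power = np.mean(reverberant ** 2)
    noise = rng.standard_normal(len(reverberant))
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that 10*log10(signal_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return reverberant + scale * noise, reverberant


rng = np.random.default_rng(0)
fs = 8000                                  # sampling rate in Hz (assumed)
speech = rng.standard_normal(fs)           # 1 s placeholder for a clean utterance
rir = synthetic_rir(t60_s=0.5, fs=fs, rng=rng)
noisy, reverberant = make_noisy_reverberant(speech, rir, snr_db=10.0, rng=rng)

achieved_snr_db = 10.0 * np.log10(
    np.mean(reverberant ** 2) / np.mean((noisy - reverberant) ** 2)
)
print("achieved SNR/dB:", achieved_snr_db)
```

Replacing `synthetic_rir` with measured or generated RIRs, and looping over the signal-to-noise ratios used in the evaluation (0 to 20 dB), would yield training pairs labelled with the reverberation time of each RIR.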
