             ⋄ Research Report ⋄

                  A clustering-based speech separation method using gated convolutional networks ∗

                                             LUO Yu    HU Weiping †    WU Huanan

                                             (School of Electronic Engineering, Guangxi Normal University, Guilin 541000)

                CLC number: TN912.3           Document code: A          Article ID: 1000-310X(2023)05-1099-07
                DOI: 10.11684/j.issn.1000-310X.2023.05.024

               Clustering-based speech separation method for gated convolutional networks

                                           LUO Yu    HU Weiping    WU Huanan

                                 (Electronic Engineering, Guangxi Normal University, Guilin 541000, China)

                 Abstract: Speech separation methods based on deep clustering have been shown to be effective in solving
                 the speaker output label alignment problem in mixed speech. However, most existing clustering-based
                 methods for speaker separation optimize the embedding to minimize the reconstruction error of each source.
                 In this paper, we design an improved gated-convolution clustering method for speech separation that uses a
                 time-domain convolutional network as its base network. The framework applies nonlinear gated activations in
                 the time-domain convolutional network to extract deep features of the speech signal, and it clusters in a
                 high-dimensional feature space to represent and segment these features, which provides long-term speaker
                 representation information for recovering the different sources. The framework thereby solves the speaker
                 output label alignment problem and models the long-term dependencies of speech signals. Experiments on the
                 Wall Street Journal dataset show that the method achieves 16.72 dB and 16.33 dB in the signal-to-distortion
                 ratio (SDR) and scale-invariant signal-to-noise ratio (SI-SNR) metrics, respectively.
                 Keywords: Deep clustering; Gated convolution; Speech separation
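
                 The abstract reports results in scale-invariant signal-to-noise ratio. As a rough illustration of what
                 that metric measures, the sketch below follows the commonly used definition: remove the mean from both
                 signals, project the estimate onto the reference, and compare the energy of the projection with the
                 energy of the residual. The function name `si_snr`, the `eps` stabilizer, and the synthetic test signals
                 are assumptions made for this example; they are not taken from the paper, whose evaluation code is not
                 shown here.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between an estimated and a reference signal.

    Illustrative sketch of the common definition; `eps` is an assumed
    stabilizer that avoids division by zero and log of zero.
    """
    if len(estimate) != len(target) or len(target) == 0:
        raise ValueError("signals must be non-empty and of equal length")

    n = len(target)
    # Remove the mean so a constant offset does not affect the score.
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]

    # Project the estimate onto the reference signal.
    dot_et = sum(a * b for a, b in zip(e, t))
    energy_t = sum(b * b for b in t)
    scale = dot_et / (energy_t + eps)
    projection = [scale * b for b in t]

    # Whatever is left after the projection is treated as noise.
    projection_energy = sum(p * p for p in projection)
    noise_energy = sum((a - p) ** 2 for a, p in zip(e, projection))

    return 10.0 * math.log10((projection_energy + eps) / (noise_energy + eps))


# Synthetic example: a sinusoid as the reference, plus a weak interferer.
target = [math.sin(0.05 * k) for k in range(2000)]
interferer = [math.cos(0.37 * k) for k in range(2000)]
estimate = [s + 0.1 * v for s, v in zip(target, interferer)]

print(f"SI-SNR: {si_snr(estimate, target):.2f} dB")
```

                 Because the estimate is projected onto the reference before the ratio is taken, multiplying the estimate
                 by any nonzero constant leaves the score essentially unchanged, which is why the metric is called scale
                 invariant.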

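             Received 2022-08-16; finalized 2022-09-20
             Supported by the National Natural Science Foundation of China (NSFC 61861005)
             About the author: LUO Yu (1999– ), male, born in Ji'an, Jiangxi; master's student; research interest: speech signal processing.
             † Corresponding author. E-mail: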
