计算机科学 ›› 2020, Vol. 47 ›› Issue (8): 49-55.doi: 10.11896/jsjkx.190900202

所属专题: 高性能计算

• 高性能计算 • 上一篇    下一篇

面向语音分离的深层转导式非负矩阵分解并行算法

李雨蓉1, 刘杰1, 2, 刘亚林1, 龚春叶1, 王勇1   

  1. 1 国防科技大学并行与分布处理重点实验室 长沙410073
    2 国防科技大学复杂系统软件工程湖南省重点实验室 长沙410073

Parallel Algorithm of Deep Transductive Non-negative Matrix Factorization for Speech Separation

LI Yu-rong1, LIU Jie1, 2, LIU Ya-lin1, GONG Chun-ye1, WANG Yong1   

  1. 1 Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China
    2 Laboratory of Software Engineering for Complex Systems, National University of Defense Technology, Changsha 410073, China
  • Online:2020-08-15 Published:2020-08-10
  • About author:LI Yu-rong, born in 1994, postgra-duate.Her main research interests include DTNMF parallel methods and high performance computing.
    LIU Jie, born in 1969, professor, Ph.D supervisor.His main research interests include parallel algorithm and high performance computing.
  • Supported by:
    This work was supported by the National Key Research and Development Program of China(2018YFB0204301).

摘要: 非负矩阵分解(Non-negative Matrix Factorization, NMF)能保存语音信号的非负特征, 是用于语音分离的重要方法, 但该方法存在数据运算复杂、计算量太大的问题, 需要研究能减少计算时间的并行计算方法。针对语音分离预训练及分离过程的计算问题, 文中提出深层转导式非负矩阵分解并行算法, 综合考虑迭代更新过程的数据关联性, 设计了一种任务间和任务内多级并行算法。该并行算法在任务级将分解训练语音得到对应基矩阵的过程作为两个独立的任务进行并行计算;在任务内部进程级把矩阵按行列划分, 主进程把矩阵块分发到从进程, 从进程接收当前矩阵块并计算结果矩阵子块, 然后将当前进程矩阵块发送到下一进程, 实现第二个矩阵中每一个矩阵块在所有进程的遍历, 并计算结果矩阵对应子块的乘积, 最后由主进程收集从进程数据块;在线程级子矩阵乘法运算的过程中, 采取生成多线程, 通过共享内存交换数据计算子矩阵块的加速策略。该算法为首个实现深层转导式非负矩阵分解的并行算法。在天河二号平台上的测试结果表明, 在分离多说话人混合语音信号时, 相比串行程序, 所提出的并行算法能在不改变分离效果的前提下, 使得预训练过程中使用64个进程的加速比为18, 分离过程使用64个进程的对应加速比为24。相较于串行及MPI模型分离, 混合模型分离时间大大缩短, 从而证明了设计的并行算法可有效提高语音分离的效率。

关键词: 乘性迭代更新规则加速算法, 共享存储并行编程, 深层转导式非负矩阵分解并行算法, 消息传递接口, 语音分离

Abstract: Non-negative matrix factorization can preserve the non-negative features of the speech signal.It is an important method for speech separation.However, this method has the problems of complicated data operation and computation, it is necessary to propose a parallel method to reduce computation time.Aiming at the calculation problem of speech separation pre-training and separation process, this paper proposes a deep transductive non-negative matrix factorization multi-level parallel algorithm, which considers the data correlation of the iterative update process, and designs a multi-level parallel algorithm between tasks and within tasks.The parallel algorithm at the task level decomposes the training speech to obtain the corresponding base matrix as two independent tasks in parallel calculation.The matrix is divided into rows and columns at the internal process level of the task.The master process distributes the matrix blocks to the slave process, and the slave process receives the current sub-matrix, then the matrix block calculates the result matrix sub-block, and then sends the current process matrix block to the next process to achieve the traversal of each matrix block of the second matrix in all processes, calculating the product of the corresponding sub-block of the result matrix, and finally the sub-block is collected by the main process from the slave process.During the thread-level sub-matrix multiplication operation, an acceleration strategy of generating multiple threads and exchanging data through shared me-mory for sub-matrix block calculation is adopted.This algorithm is the first one to implement deep transduction non-negative matrix factorization algorithm.The experimentis performed on the Tianhe II platform.The test results show that when separating multi-speaker mixed speech signals without changing the separation effect, in the pre-training process, the proposed parallel algorithm performs well using 64 processes at a speed ratio of 18, and in the separation process, the corresponding speedup is 24.Compared to serial and MPI model separation, hybrid model separation time is greatly shortened, which proves that for the speech separation process, the parallel algorithm proposed in this paper can effectively improve the separation efficiency.

Key words: Acceleration algorithm based on multiplicative update rules, MPI, OpenMP, Parallel algorithm of deep-transductive non-negative matrix factorization, Speech separation

中图分类号: 

  • TP391
[1] LEE D D, SEUNG H S.Algorithms for non-negative matrix factorization[C]∥NIPS.2001:556-562.
[2] LEE D D, SEUNG H S.Learning the parts of objects by nonnegative matrix factorization[C]∥Nature.1999, 401:788-791.
[3] ANDRZEJ C, HUY P A, RAFAL Z, et al.Nonnegative matrix and tensor factorizations applications to exploratory multi-way data analysis and blind source separation[M].Wiley Publishing, 2009.
[4] GEMULLA R, NIJKAMP E, HAAS P J, et al.Large-scale matrix factorization with distributed stochastic gradient descent[C]∥Proceedings of the KDD.ACM, 2011:69-77.
[5] KIM J, PARK H.Fast Nonnegative Matrix Factorization:AnActive-Set-Like Method and Comparisons[J].SIAM Journal on Scientific Computing, 2011, 33(6):3261-3281.
[6] DONG C, ZHAO H, WANG W.Parallel Nonnegative MatrixFactorization Algorithm on the Distributed Memory Platform[J].International Journal of Parallel Programming, 2010, 38(2):117-137.
[7] LIU C, YANG H C, FAN J, et al.Distributed nonnegative matrix factorization for web-scale dyadic data analysis on map-reduce[C]∥Proceedings of the 19th International Conference on World Wide Web.ACM, 2010:681-690.
[8] KANJANI K.Parallel Non Negative Matrix Factorization fordocument clustering[Z].CPSC-659 Spring 2007 Course Project:Texas A&M University, 2007.
[9] LOPES N, RIBEIRO B.Non-negative Matrix Factorization.Implementation using Graphics Processing Units[C]∥Internatio-nal Conference on Intelligent Data Engineering & Automated Learning.Springer-Verlag, 2010.
[10]KANNAN R, BALLARD G, PARK H.HPC-NMF:A High-Performance Parallel Algorithm for Nonnegative Matrix Factorization [J].arXiv:1509.09313.
[11]ROBILA S A, MACIAK L G.A parallel unmixing algorithm for
hyperspectral images[C]∥Optics East.International Society for Optics and Photonics, 2006.
[12]MOON G E, SUKUMARAN-RAJAM A, PARTHASARATHY S, et al.PL-NMF:Parallel Locality-Optimized Non-negative Matrix Factorization[J].arXiv:1904.07935, 2109.
[13]MEJA-ROA E, TABAS-MADRID D, SETOAIN J, et al.NMF-mGPU:non-negativematrix factorization nonmulti-GPU systems[J].BMC Bioinfor-matics, 2015, 16(1):43.
[14]LIU Y L.Research on Key Technologies of Speech Separation and Speech Recognition [D].Changsha:National University of Defense Technology, 2018.
[15]CHEN X H, XIE P Z, CHI LH, et al.An efficient SIMD compression format for sparse matrix-vector multiplication[J].Concurrency Computat Pract Exper, 2018, 30:e4800.
[16]MOHAMMADIHA N, SMARAGDIS P, LEIJON A.Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization[J].IEEE Transactions on Audio Speech & Language Processing, 2013, 21(10):2140-2151.
[17]GUAN N, LAN L, TAO D, et al.Transductive nonnegative matrix factorization for semi-supervised high-performance speech separation[C]∥IEEE International Conference on Acoustics, Speech and Signal Processing.IEEE, 2014:2534-2538.
[18]LIU Y, GUAN N, LIU J.Deep Transductive Nonnegative Matrix Factorization for Speech Separation[C]∥2017 16th IEEE International Conference on Machine Learning and Applications(ICMLA).IEEE, 2017.
[19]ZHAO Y H, CHI X B.MPI+OpenMP hybrid programmingmodel based on SMP cluster and effective implementation [J].Microelectronics and Computer, 2005, 22(10):7-11.
[20]GU K H, HUANG M, HE J Y.Research on MPI+OpenMP hybrid programming model based on multi-core cluster[J].Gansu Techenology, 2018, 34(19):10-14.
[21]XU C, DENG X, ZHANG L, et al.Collaborating CPU and GPU for large-scale high-order CFD simulations with complex grids on the TianHe-1A supercomputer[J].Journal of Computational Physics, 2014, 278:275-297.
[22]Intel Math Kernel Library[EB/OL]https://software.intel.com /en-us/mkl-developer-reference-c-blas-level-3-routines.
[1] 朱超, 吴素萍.
并行Harris特征点检测算法
Parallel Harris Feature Point Detection Algorithm
计算机科学, 2019, 46(11A): 289-293.
[2] 唐兵,贺海武.
基于树型结构的MapReduce并行模型
MapReduce Parallel Model Based on Tree Structure
计算机科学, 2015, 42(11): 65-67. https://doi.org/10.11896/j.issn.1002-137X.2015.11.013
[3] 王文义,梁福广.
并行系统中时间偏移机制的典型应用算法分析
Analysis on Time Shift Mechanism's Typical Application Algorithm in Parallel System
计算机科学, 2012, 39(2): 311-313.
[4] .
基于OpenMP的事务存储同步语义研究

计算机科学, 2009, 36(5): 166-168.
[5] 邵飞 邸瑞华.
贸易地图生成软件并行处理方案的研究与实现

计算机科学, 2008, 35(3): 267-270.
[6] 魏兵海.
MPI语言绑定:MPI-Delphi,MPI-Java与MPI-Ruby

计算机科学, 2004, 31(8): 185-189.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!