Computer Science, 2020, Vol. 47, Issue (8): 49-55. doi: 10.11896/jsjkx.190900202
Topic: High Performance Computing
LI Yu-rong (李雨蓉) 1, LIU Jie (刘杰) 1,2, LIU Ya-lin (刘亚林) 1, GONG Chun-ye (龚春叶) 1, WANG Yong (王勇) 1
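Abstract: Non-negative matrix factorization (NMF) preserves the non-negative characteristics of speech signals and is an important method for speech separation. However, it involves complex data operations and a heavy computational load, so parallel computing methods that reduce computation time are needed. To address the computational cost of the pre-training and separation stages of speech separation, this paper proposes a parallel algorithm for deep transductive non-negative matrix factorization. Taking into account the data dependencies in the iterative update process, it designs a multi-level parallel algorithm that works both across tasks and within tasks. At the task level, the factorizations of the training speech that yield the corresponding basis matrices are computed in parallel as two independent tasks. At the process level within a task, the matrices are partitioned by rows and columns: the master process distributes matrix blocks to the worker processes; each worker process receives its current matrix block, computes a sub-block of the result matrix, and then sends its current block to the next process, so that every block of the second matrix traverses all processes and the product for each corresponding result sub-block is computed; finally, the master process gathers the data blocks from the worker processes. At the thread level, sub-matrix multiplication is accelerated by spawning multiple threads that exchange data through shared memory to compute the sub-matrix blocks. This is the first parallel algorithm to implement deep transductive non-negative matrix factorization. Test results on the Tianhe-2 platform show that, when separating mixed speech signals from multiple speakers, the proposed parallel algorithm leaves the separation quality unchanged relative to the serial program while reaching a speedup of 18 with 64 processes in pre-training and a speedup of 24 with 64 processes in separation. Compared with the serial and MPI-only versions, the hybrid model substantially shortens separation time, which demonstrates that the designed parallel algorithm effectively improves the efficiency of speech separation.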
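The process-level scheme described in the abstract (blocks of the second matrix passed from process to process until each has visited every process) can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the authors' implementation: it runs in a single Python process rather than under MPI, it assumes the first matrix is split into row blocks and the second into column blocks, and all function names (`matmul`, `split_rows`, `split_cols`, `ring_matmul`) are hypothetical. The plain `matmul` helper stands in for the thread-level sub-block kernel.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop product; stands in for the threaded sub-block kernel."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def split_rows(m: Matrix, parts: int) -> List[Matrix]:
    """Split m into contiguous row blocks (assumes the row count divides evenly)."""
    size = len(m) // parts
    return [m[i * size:(i + 1) * size] for i in range(parts)]


def split_cols(m: Matrix, parts: int) -> List[Matrix]:
    """Split m into contiguous column blocks (assumes the column count divides evenly)."""
    size = len(m[0]) // parts
    return [[row[j * size:(j + 1) * size] for row in m] for j in range(parts)]


def ring_matmul(a: Matrix, b: Matrix, procs: int) -> Matrix:
    """Simulate `procs` processes computing a @ b with a ring pass of B blocks."""
    a_blocks = split_rows(a, procs)  # process p keeps row block p of A
    b_blocks = split_cols(b, procs)  # process p starts with column block p of B
    held = list(range(procs))        # held[p] = index of the B block now at process p
    # c_blocks[p][j] is the result sub-block for row block p and column block j
    c_blocks = [[None] * procs for _ in range(procs)]
    for _ in range(procs):
        for p in range(procs):
            j = held[p]
            c_blocks[p][j] = matmul(a_blocks[p], b_blocks[j])
        # every process sends its B block to the next process in the ring
        held = [held[(p - 1) % procs] for p in range(procs)]
    # the "master" gathers the sub-blocks and stitches them into the full result
    result: Matrix = []
    for p in range(procs):
        for r in range(len(a_blocks[p])):
            row: List[float] = []
            for j in range(procs):
                row.extend(c_blocks[p][j][r])
            result.append(row)
    return result
```

After `procs` steps every column block of the second matrix has visited every simulated process, so each process has filled in its whole row of result sub-blocks. In a real MPI program the inner loop over `p` would run concurrently on separate ranks and the `held` rotation would be a send/receive between neighbouring ranks.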