计算机科学 ›› 2020, Vol. 47 ›› Issue (7): 71-77.doi: 10.11896/jsjkx.200200106

• 数据库&大数据&数据科学 • 上一篇    下一篇

基于预处理的超图非负矩阵分解算法

李向利1,2,3, 贾梦雪1,4   

  1. 1 桂林电子科技大学数学与计算科学学院 广西 桂林541004
    2 广西密码学与信息安全重点实验室 广西 桂林541004
    3 广西自动检测技术与仪器重点实验室 广西 桂林541004
    4 广西高校数据分析与计算重点实验室 广西 桂林541004
  • 收稿日期:2020-01-24 出版日期:2020-07-15 发布日期:2020-07-16
  • 通讯作者: 李向利([email protected])
  • 基金资助:
    国家自然科学基金(11961010,61967004),广西自然科学基金(2018GXNSFAA138169),广西密码学与信息安全重点实验室研究课题(GCIS201708),广西自动检测技术与仪器重点实验室基金(YQ19111),桂林电子科技大学研究生教育创新计划资助项目(2020YCXS087)

Nonnegative Matrix Factorization Algorithm with Hypergraph Based on Per-treatments

LI Xiang-li1,2,3, JIA Meng-xue1,4   

  1. 1 School of Mathematics & Computing Science,Guilin University of Electronic Technology,Guilin,Guangxi 541004,China
    2 Guangxi Key Laboratory of Cryptography and Information Security,Guilin,Guangxi 541004,China
    3 Guangxi Key Laboratory of Automatic Testing Technology and Instrument,Guilin,Guangxi 541004,China
    4 Guangxi University Key Laboratory of Data Analysis and Calculation,Guilin,Guangxi 541004,China
  • Received:2020-01-24 Online:2020-07-15 Published:2020-07-16
  • About author:LI Xiang-li,born in 1977,Ph.D,professor.Her main research interests include image clustering,nonnegative matrix factorization,and optimization.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (11961010,61967004),Guangxi Natural Science Foundation (2018GXNSFAA138169),Guangxi Key Laboratory of Cryptography and Information Security (GCIS201708),Guangxi Key Laboratory of Automatic Testing Technology and Instruments (YQ19111) and Innovation Project of GUET Graduate Education (2020YCXS087).

摘要: 随着多媒体技术的发展,信息越来越多的以图片的形式出现。如何对海量的无标签图片进行聚类,是机器学习领域的热点问题。而图像聚类在人脸识别、手写数字识别等领域也有着重要的作用。由于图片数据通常以非负矩阵的形式存储,因此非负矩阵分解算法(NMF)在图像聚类领域得到了广泛的应用。但是NMF算法直接在数据的原始空间进行处理,这就导致NMF算法所得的图片标签易受到数据采集过程中含有的噪声等不利因素的影响。为了解决这些问题,提出了一种基于预处理的超图非负矩阵分解算法(Nonnegative Matrix Factorization with Hypergraph Based on Per-treatments,PHGNMF)。PHGNMF算法将预处理操作和超图的思想引入到NMF算法。在预处理的过程中,使用灰度处理来去除图片中不同光线条件所带来的影响,采用小波分析来提取图片的低时频子图,同时降低了算法所处理的矩阵维度。采取构建超图的方法来进一步保留对聚类结果有重要影响的数据局部结构。最后在5个主流数据集上的实验验证了PHGNMF算法相对于传统算法的有效性,结果显示聚类精度提升了2%~7%,标准互信息在部分数据集上提升了2%~5%。

关键词: 超图, 非负矩阵分解, 灰度处理, 图像聚类, 小波分析

Abstract: With the development of the media technology,more information is stored as the pictures.It is a topic problem in the machine learning field that how to distribute the right label to lots of unsigned pictures.And the image clustering has wide application on the face recognition and the handwriting number recognition field.Because the pictures are always stored as nonnegative matrices,the nonnegative matrix factorization algorithm (NMF) plays an important role in the image clustering.But the disadvantage in NMF algorithm is that the algorithm processes the data in the original data space which may produce a terrible result when the data have errors.To address this problem,the proposed algorithm is the nonnegative matrix factorization algorithm with a hypergraph based on per-treatments (PHGNMF).The PHGNMF algorithm introduces the per-treatments and the hypergraph into the NMF algorithm.In the per-treatments,the algorithm uses the grayscale normalization to eliminate the influence of the different illuminations firstly and then the algorithm can extract the low-frequency information of the pictures by the wavelet analysis.The wavelet procession could also reduce the dimensions of the data.The algorithm constructs a hypergraph for the data to save the neighboring information which has an important influence in the clustering procession.At last the results in five fundamental data sets confirm the effectiveness of the algorithm compared with fundamental algorithms.The results show the increase of accuracy is 2%~7% and the increase of normalized mutual information on some data sets is 2%~5%.

Key words: Grayscale normalization, Hypergraph, Image clustering, Nonnegative matrix factorization, Wavelet analysis

中图分类号: 

  • TP391.4
[1]BELHUMEUR P N,HESPANHA J P,KRIEGMAN D J.Eigenfaces vs,Recongnition using class specific linear projection [J].IEEE Transactions on Pattern Analysis & Machine Intelligence,1997,7:711-720.
[2]KALMAN D.A singularly valuable decomposition:the SVD of a matrix [J].The College Mathematics Journal,1996,27(1):2-23.
[3]FISHER R A.The use of multiple measurements in taxonomic problems [J].Annals of Eugenics,1936,7(2):179-188.
[4]LEE D D,SEUNG H S.Learning the parts of objects by non-negative matrix factorization [J].Nature,1999,401(6755):788.
[5]GUILLAMET D,JORDI V.Non-negative matrix factorizationfor face recognition[C]//Castellón,Spain,lecture notes in artificial intelligence.Berlin:Springer-Verlag,2002:336-344.
[6]ZHANG X,GAO H,LI G,et al.Multi-view clustering based on graph-regularized nonnegative matrix factorization for object recognition [J].Information Sciences,2018,432:463-478.
[7]ZURADA J M,ENSARI T,ASL E H,et al.Nonnegative matrix factorization and its application to pattern analysis and text mi-ning[C]//Krakow,Poland,Federated Conference on Computer Science and Information Systems.New York,IEEE,2013:11-16.
[8]GAO B,WOO W L,DLAY S S.Variational regularized 2-D nonnegative matrix factorization [J].IEEE Transactions on Neural Networks and Learning Systems,2012,23(5):703-716.
[9]CAI D,HE X,HAN J,et al.Graph regularized nonnegative matrix factorization for data representation [J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2010,33(8):1548-1560.
[10]CUI G,LI X,DONG Y.Subspace clustering guided convex nonnegative matrix factorization [J].Neurocomputing,2018,292:38-48.
[11]XU W,GONG Y.Document clustering by concept factorization [C]//Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval,Association for Computing Machinery.New York,2004:202-209.
[12]HU W,CHOI K S,WANG P,et al.Convex nonnegative matrix factorization with manifold regularization [J].Neural Networks,2015,63:94-103.
[13]BABAEE M,TSOUKALAS S.Discriminative nonnegative matrix factorization for dimensionality reduction [J].Neurocomputing,2015,173:212-223.
[14]PENG X,CHEN D,XU D.Semi-supervised least squares nonnegative matrix factorization and graph-based extension [J].Neurocomputing,2018,320:98-111.
[15]WANG F,LI T,ZHANG C S.Semi-supervised clustering via matrix factorization [C]//Atlanta,Georgia,Proceedings of the 2008 SIAM International Conference on Data Mining.SIAM,America,2008:1-12.
[16]LI G,ZHANG X,ZHANG S,et al.Semi-supervised convex nonnegative matrix factorizations with graph regularized for image representation [J].Neurocomputing,2017,237:1-11.
[17]ZHOU J.Research of SWNMF with new iteration rules for facial feature extraction and recognition [J].Symmetry,2019,11(3):354.
[18]WANG W H,QIAN Y T,TANG Y Y.Hypergraph-regularized sparse NMF for hyperspectral unmixing [J].IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing,2016,9(2):681-694.
[19]ZHOU D Y,HUANG J Y,BERNHARD S.Learning with hypergraphs:clustering,classification,and embedding[C]// British,Advances in Neural Information Processing Systems 19:Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems,MIT Press,2007:1601-1608.
[20]SHI J B,MALIK J.Normalized cuts and image segmentation [J].IEEE Transactions on pattern analysis and machine intelligence,2000,22(8):888-905.
[21]HE X,WANG Q,LI X L.Robust adaptive graph regularized non-negative matrix factorization [J].IEEE Access,2019,7:83101-83110.
[1] 周海榆, 张道强.
面向多中心数据的超图卷积神经网络及应用
Multi-site Hyper-graph Convolutional Neural Networks and Application
计算机科学, 2022, 49(3): 129-133. https://doi.org/10.11896/jsjkx.201100152
[2] 韩洁, 陈俊芬, 李艳, 湛泽聪.
基于自注意力的自监督深度聚类算法
Self-supervised Deep Clustering Algorithm Based on Self-attention
计算机科学, 2022, 49(3): 134-143. https://doi.org/10.11896/jsjkx.210100001
[3] 官铮, 邓扬琳, 聂仁灿.
光谱重建约束非负矩阵分解的高光谱与全色图像融合
Non-negative Matrix Factorization Based on Spectral Reconstruction Constraint for Hyperspectral and Panchromatic Image Fusion
计算机科学, 2021, 48(9): 153-159. https://doi.org/10.11896/jsjkx.200900054
[4] 陈扬, 王金亮, 夏炜, 杨颢, 朱润, 奚雪峰.
基于特征自动提取的足迹图像聚类方法
Footprint Image Clustering Method Based on Automatic Feature Extraction
计算机科学, 2021, 48(6A): 255-259. https://doi.org/10.11896/jsjkx.200900033
[5] 罗靖杰, 王永利.
ADCSM:一种细粒度汽车行驶工况模型构建方法
ADCSM:A Fine-grained Driving Cycle Model Construction Method
计算机科学, 2021, 48(6A): 289-294. https://doi.org/10.11896/jsjkx.200600019
[6] 段菲, 王慧敏, 张超.
面向数据表示的Cauchy非负矩阵分解
Cauchy Non-negative Matrix Factorization for Data Representation
计算机科学, 2021, 48(6): 96-102. https://doi.org/10.11896/jsjkx.200700195
[7] 向昌盛, 陈志刚.
面向海量数据的网络流量混沌预测模型
Chaotic Prediction Model of Network Traffic for Massive Data
计算机科学, 2021, 48(5): 289-293. https://doi.org/10.11896/jsjkx.200400056
[8] 余诗媛, 郭淑明, 黄瑞阳, 张建朋, 苏珂.
嵌套命名实体识别研究进展
Overview of Nested Named Entity Recognition
计算机科学, 2021, 48(11A): 1-10. https://doi.org/10.11896/jsjkx.201100165
[9] 李雨蓉, 刘杰, 刘亚林, 龚春叶, 王勇.
面向语音分离的深层转导式非负矩阵分解并行算法
Parallel Algorithm of Deep Transductive Non-negative Matrix Factorization for Speech Separation
计算机科学, 2020, 47(8): 49-55. https://doi.org/10.11896/jsjkx.190900202
[10] 王成章, 白晓明, 杜金栗.
图像的扩散界面无监督聚类算法
Diffuse Interface Based Unsupervised Images Clustering Algorithm
计算机科学, 2020, 47(5): 149-153. https://doi.org/10.11896/jsjkx.190300125
[11] 王丽星, 曹付元.
基于Huber损失的非负矩阵分解算法
Huber Loss Based Nonnegative Matrix Factorization Algorithm
计算机科学, 2020, 47(11): 80-87. https://doi.org/10.11896/jsjkx.190900144
[12] 周昌, 李向利, 李俏霖, 朱丹丹, 陈世莲, 蒋丽榕.
基于余弦相似度的稀疏非负矩阵分解算法
Sparse Non-negative Matrix Factorization Algorithm Based on Cosine Similarity
计算机科学, 2020, 47(10): 108-113. https://doi.org/10.11896/jsjkx.190700112
[13] 任雪婷, 赵涓涓, 强彦, Saad Abdul RAUF, 刘继华.
联合成对学习和图像聚类的无监督肺癌亚型识别
Lung Cancer Subtype Recognition with Unsupervised Learning Combining Paired Learning and Image Clustering
计算机科学, 2020, 47(10): 200-206. https://doi.org/10.11896/jsjkx.190900073
[14] 杜臻, 马立鹏, 孙国梓.
一种基于小波分析的网络流量异常检测方法
Network Traffic Anomaly Detection Based on Wavelet Analysis
计算机科学, 2019, 46(8): 178-182. https://doi.org/10.11896/j.issn.1002-137X.2019.08.029
[15] 康林瑶, 唐兵, 夏艳敏, 张黎.
基于GPU加速和非负矩阵分解的并行协同过滤推荐算法
GPU-accelerated Non-negative Matrix Factorization-based Parallel Collaborative Filtering Recommendation Algorithm
计算机科学, 2019, 46(8): 106-110. https://doi.org/10.11896/j.issn.1002-137X.2019.08.017
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!