Practical Lossless Federated Singular Vector Decomposition over Billion-Scale Data

Chai, Di; Wang, Leye; Zhang, Junxue; Yang, Liu; Cai, Shuowei; Chen, Kai; Yang, Qiang

arXiv:2105.08925 (cs)

[Submitted on 19 May 2021 (v1), last revised 4 Jul 2022 (this version, v3)]

Abstract: With the enactment of privacy-preserving regulations, e.g., GDPR, federated SVD is proposed to enable SVD-based applications over different data sources without revealing the original data. However, many SVD-based applications cannot be well supported by existing federated SVD solutions. The crux is that these solutions, adopting either differential privacy (DP) or homomorphic encryption (HE), suffer from accuracy loss caused by unremovable noise or degraded efficiency due to inflated data.
In this paper, we propose FedSVD, a practical lossless federated SVD method over billion-scale data, which can simultaneously achieve lossless accuracy and high efficiency. At the heart of FedSVD is a lossless matrix masking scheme delicately designed for SVD: 1) While adopting the masks to protect private data, FedSVD completely removes them from the final results of SVD to achieve lossless accuracy; and 2) As the masks do not inflate the data, FedSVD avoids extra computation and communication overhead during the factorization to maintain high efficiency. Experiments with real-world datasets show that FedSVD is over 10000 times faster than the HE-based method and has 10 orders of magnitude smaller error than the DP-based solution on SVD tasks. We further build and evaluate FedSVD over three real-world applications: principal components analysis (PCA), linear regression (LR), and latent semantic analysis (LSA), to show its superior performance in practice. On federated LR tasks, compared with two state-of-the-art solutions: FATE and SecureML, FedSVD-LR is 100 times faster than SecureML and 10 times faster than FATE.
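
The abstract does not spell out how the masks are constructed, so the following is only a minimal sketch of one way a removable, non-inflating mask could work. It assumes (this is an assumption, not something stated above) that the masks are random orthogonal matrices `P` and `Q` applied as `P @ X @ Q`. The masked matrix has the same shape as `X`, and the factors of `X` can be recovered exactly by undoing the masks. The function names and the single-party setup are hypothetical and chosen for illustration.

```python
import numpy as np

def random_orthogonal(n, rng):
    # Assumption: an orthogonal mask obtained from the QR factorization
    # of a Gaussian matrix; the paper's actual mask generation may differ.
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

def masked_svd_demo(m=6, n=4, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((m, n))      # stand-in for the private data

    P = random_orthogonal(m, rng)        # left mask  (m x m)
    Q = random_orthogonal(n, rng)        # right mask (n x n)
    X_masked = P @ X @ Q                 # same shape as X: no data inflation

    # Factorize only the masked matrix.
    U_m, s, Vt_m = np.linalg.svd(X_masked, full_matrices=False)

    # Remove the masks: X = P^T (U_m S Vt_m) Q^T, so
    # U = P^T U_m and V^T = Vt_m Q^T; singular values are unchanged.
    U = P.T @ U_m
    Vt = Vt_m @ Q.T

    s_plain = np.linalg.svd(X, compute_uv=False)  # reference, for comparison
    return X, U, s, Vt, s_plain

if __name__ == "__main__":
    X, U, s, Vt, s_plain = masked_svd_demo()
    print("reconstruction error:", np.abs(U @ np.diag(s) @ Vt - X).max())
    print("singular value gap:  ", np.abs(s - s_plain).max())
```

Under this assumption the only error is floating-point round-off, which is the sense in which such a scheme can be called lossless. The sketch runs on one machine and says nothing about how masks would be generated, shared, or protected across parties.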

Comments:	10 pages
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR)
Cite as:	arXiv:2105.08925 [cs.DC]
	(or arXiv:2105.08925v3 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2105.08925

Submission history

From: Di Chai
[v1] Wed, 19 May 2021 04:51:12 UTC (19,921 KB)
[v2] Fri, 10 Jun 2022 09:15:00 UTC (5,071 KB)
[v3] Mon, 4 Jul 2022 01:04:31 UTC (8,980 KB)
