Related papers: Federated PCA and Estimation for Spiked Covariance Matrices: Optimal Rates and Efficient Algorithm

Federated PCA and Estimation for Spiked Covariance Matrices: Optimal Rates and Efficient Algorithm

URL: http://arxiv.org/abs/2411.15660v1
Date: Sat, 23 Nov 2024 21:57:50 GMT
Title: Federated PCA and Estimation for Spiked Covariance Matrices: Optimal Rates and Efficient Algorithm
Authors: Jingyang Li, T. Tony Cai, Dong Xia, Anru R. Zhang,
Abstract summary: Federated Learning (FL) has gained significant recent attention in machine learning for its enhanced privacy and data security. This paper investigates federated PCA and estimation for spiked covariance matrices under distributed differential privacy constraints. We establish minimax rates of convergence, with a key finding that the central server's optimal rate is the harmonic mean of the local clients' minimax rates.
Score: 19.673557166734977
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Federated Learning (FL) has gained significant recent attention in machine learning for its enhanced privacy and data security, making it indispensable in fields such as healthcare, finance, and personalized services. This paper investigates federated PCA and estimation for spiked covariance matrices under distributed differential privacy constraints. We establish minimax rates of convergence, with a key finding that the central server's optimal rate is the harmonic mean of the local clients' minimax rates. This guarantees consistent estimation at the central server as long as at least one local client provides consistent results. Notably, consistency is maintained even if some local estimators are inconsistent, provided there are enough clients. These findings highlight the robustness and scalability of FL for reliable statistical inference under privacy constraints. To establish minimax lower bounds, we derive a matrix version of van Trees' inequality, which is of independent interest. Furthermore, we propose an efficient algorithm that preserves differential privacy while achieving near-optimal rates at the central server, up to a logarithmic factor. We address significant technical challenges in analyzing this algorithm, which involves a three-layer spectral decomposition. Numerical performance of the proposed algorithm is investigated using both simulated and real data.

Related papers

A Secure and Private Distributed Bayesian Federated Learning Design [56.92336577799572]
Distributed Federated Learning (DFL) enables decentralized model training across large-scale systems without a central parameter server.<n>DFL faces three critical challenges: privacy leakage from honest-but-curious neighbors, slow convergence due to the lack of central coordination, and vulnerability to Byzantine adversaries aiming to degrade model accuracy.<n>We propose a novel DFL framework that integrates Byzantine robustness, privacy preservation, and convergence acceleration.
arXiv Detail & Related papers (2026-02-23T16:12:02Z)
Improved High-probability Convergence Guarantees of Decentralized SGD [74.39742894097348]
We show that $mathttDSGD$ converges in HP under the same conditions on the cost as in the mean-squared error (MSE) sense.<n>Our improved analysis yields linear-up in the number of users, demonstrating that $mathttDSGD$ maintains performance in the HP sense.
arXiv Detail & Related papers (2025-10-07T17:15:08Z)
Linear-Time User-Level DP-SCO via Robust Statistics [55.350093142673316]
User-level differentially private convex optimization (DP-SCO) has garnered significant attention due to the importance of safeguarding user privacy in machine learning applications. Current methods, such as those based on differentially private gradient descent (DP-SGD), often struggle with high noise accumulation and suboptimal utility. We introduce a novel linear-time algorithm that leverages robust statistics, specifically the median and trimmed mean, to overcome these challenges.
arXiv Detail & Related papers (2025-02-13T02:05:45Z)
Private and Federated Stochastic Convex Optimization: Efficient Strategies for Centralized Systems [8.419845742978985]
This paper addresses the challenge of preserving privacy in Federated Learning (FL) within centralized systems. We devise methods that ensure Differential Privacy (DP) while maintaining optimal convergence rates for homogeneous and heterogeneous data distributions.
arXiv Detail & Related papers (2024-07-17T08:19:58Z)
Share Secrets for Privacy: Confidential Forecasting with Vertical Federated Learning [5.584904689846748]
Key challenges include data privacy and over-fitting on small and noisy datasets during both training and inference.<n>We propose Secret-shared Time Series Forecasting with VFL'' (STV), a novel framework with the following key features.<n>Results demonstrate that STV's forecasting accuracy is comparable to those of centralized approaches.
arXiv Detail & Related papers (2024-05-31T12:27:38Z)
Asynchronous Federated Stochastic Optimization for Heterogeneous Objectives Under Arbitrary Delays [0.0]
Federated learning (FL) was recently proposed to securely train models with data held over multiple locations ("clients") Two major challenges hindering the performance of FL algorithms are long training times caused by straggling clients, and a decline in model accuracy under non-iid local data distributions ("client drift") We propose and analyze Asynchronous Exact Averaging (AREA), a new (sub)gradient algorithm that utilizes communication to speed up convergence and enhance scalability, and employs client memory to correct the client drift caused by variations in client update frequencies.
arXiv Detail & Related papers (2024-05-16T14:22:49Z)
TernaryVote: Differentially Private, Communication Efficient, and Byzantine Resilient Distributed Optimization on Heterogeneous Data [50.797729676285876]
We propose TernaryVote, which combines a ternary compressor and the majority vote mechanism to realize differential privacy, gradient compression, and Byzantine resilience simultaneously. We theoretically quantify the privacy guarantee through the lens of the emerging f-differential privacy (DP) and the Byzantine resilience of the proposed algorithm.
arXiv Detail & Related papers (2024-02-16T16:41:14Z)
Privacy-preserving Federated Primal-dual Learning for Non-convex and Non-smooth Problems with Model Sparsification [51.04894019092156]
Federated learning (FL) has been recognized as a rapidly growing area, where the model is trained over clients under the FL orchestration (PS) In this paper, we propose a novel primal sparification algorithm for and guarantee non-smooth FL problems. Its unique insightful properties and its analyses are also presented.
arXiv Detail & Related papers (2023-10-30T14:15:47Z)
On the Privacy-Robustness-Utility Trilemma in Distributed Learning [7.778461949427662]
We present the first tight analysis of the error incurred by any algorithm ensuring robustness against a fraction of adversarial machines. Our analysis exhibits a fundamental trade-off between privacy, robustness, and utility.
arXiv Detail & Related papers (2023-02-09T17:24:18Z)
Differentially Private Federated Clustering over Non-IID Data [59.611244450530315]
clustering clusters (FedC) problem aims to accurately partition unlabeled data samples distributed over massive clients into finite clients under the orchestration of a server. We propose a novel FedC algorithm using differential privacy convergence technique, referred to as DP-Fed, in which partial participation and multiple clients are also considered. Various attributes of the proposed DP-Fed are obtained through theoretical analyses of privacy protection, especially for the case of non-identically and independently distributed (non-i.i.d.) data.
arXiv Detail & Related papers (2023-01-03T05:38:43Z)
Differentially Private Decentralized Optimization with Relay Communication [1.2695958417031445]
We introduce a new measure: Privacy Leakage Frequency (PLF), which reveals the relationship between communication and privacy leakage of algorithms. A novel differentially private decentralized primal--dual algorithm named DP-RECAL is proposed to take advantage of operator splitting method and relay communication mechanism to experience less PLF.
arXiv Detail & Related papers (2022-12-21T09:05:36Z)
FedSkip: Combatting Statistical Heterogeneity with Federated Skip Aggregation [95.85026305874824]
We introduce a data-driven approach called FedSkip to improve the client optima by periodically skipping federated averaging and scattering local models to the cross devices. We conduct extensive experiments on a range of datasets to demonstrate that FedSkip achieves much higher accuracy, better aggregation efficiency and competing communication efficiency.
arXiv Detail & Related papers (2022-12-14T13:57:01Z)
Decentralized Stochastic Optimization with Inherent Privacy Protection [103.62463469366557]
Decentralized optimization is the basic building block of modern collaborative machine learning, distributed estimation and control, and large-scale sensing. Since involved data, privacy protection has become an increasingly pressing need in the implementation of decentralized optimization algorithms.
arXiv Detail & Related papers (2022-05-08T14:38:23Z)
Stochastic Coded Federated Learning with Convergence and Privacy Guarantees [8.2189389638822]
Federated learning (FL) has attracted much attention as a privacy-preserving distributed machine learning framework. This paper proposes a coded federated learning framework, namely coded federated learning (SCFL) to mitigate the straggler issue. We characterize the privacy guarantee by the mutual information differential privacy (MI-DP) and analyze the convergence performance in federated learning.
arXiv Detail & Related papers (2022-01-25T04:43:29Z)
Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices)
arXiv Detail & Related papers (2021-11-28T19:03:39Z)
Minimax Estimation for Personalized Federated Learning: An Alternative between FedAvg and Local Training? [31.831856922814502]
Local datasets often originate from distinct yet not entirely unrelated probability distributions. In this paper, we show how the excess risks of personalized federated learning depend on data heterogeneity from a minimax point of view.
arXiv Detail & Related papers (2021-03-02T17:58:20Z)
Robust and Differentially Private Mean Estimation [40.323756738056616]
Differential privacy has emerged as a standard requirement in a variety of applications ranging from the U.S. Census to data collected in commercial devices. An increasing number of such databases consist of data from multiple sources, not all of which can be trusted. This leaves existing private analyses vulnerable to attacks by an adversary who injects corrupted data.
arXiv Detail & Related papers (2021-02-18T05:02:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.