One-Shot Federated Ridge Regression: Exact Recovery via Sufficient Statistic Aggregation
- URL: http://arxiv.org/abs/2601.08216v1
- Date: Tue, 13 Jan 2026 04:47:22 GMT
- Title: One-Shot Federated Ridge Regression: Exact Recovery via Sufficient Statistic Aggregation
- Authors: Zahir Alsulaimawi
- Abstract summary: Federated ridge regression is formulated as a distributed equilibrium problem where each client computes local sufficient statistics and transmits them once. We establish differential privacy guarantees where noise is injected once per client, eliminating the composition penalty that degrades privacy in multi-round protocols. Experiments on synthetic heterogeneous regression demonstrate that one-shot fusion matches FedAvg accuracy while requiring up to $38\times$ less communication.
- Score: 0.7106986689736825
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Federated learning protocols require repeated synchronization between clients and a central server, with convergence rates depending on learning rates, data heterogeneity, and client sampling. This paper asks whether iterative communication is necessary for distributed linear regression. We show it is not. We formulate federated ridge regression as a distributed equilibrium problem where each client computes local sufficient statistics -- the Gram matrix and moment vector -- and transmits them once. The server reconstructs the global solution through a single matrix inversion. We prove exact recovery: under a coverage condition on client feature matrices, one-shot aggregation yields the centralized ridge solution, not an approximation. For heterogeneous distributions violating coverage, we derive non-asymptotic error bounds depending on spectral properties of the aggregated Gram matrix. Communication reduces from $\mathcal{O}(Rd)$ in iterative methods to $\mathcal{O}(d^2)$ total; for high-dimensional settings, we propose and experimentally validate random projection techniques reducing this to $\mathcal{O}(m^2)$ where $m \ll d$. We establish differential privacy guarantees where noise is injected once per client, eliminating the composition penalty that degrades privacy in multi-round protocols. We further address practical considerations including client dropout robustness, federated cross-validation for hyperparameter selection, and comparison with gradient-based alternatives. Comprehensive experiments on synthetic heterogeneous regression demonstrate that one-shot fusion matches FedAvg accuracy while requiring up to $38\times$ less communication. The framework applies to kernel methods and random feature models but not to general nonlinear architectures.
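The fusion step described in the abstract is simple enough to sketch. Below is a minimal NumPy illustration, assuming each client holds a local design matrix $X_k$ and targets $y_k$: every client sends its Gram matrix $X_k^\top X_k$ and moment vector $X_k^\top y_k$ once, and the server sums them and solves a single regularized system. The optional noise argument only gestures at the per-client privacy mechanism and is not calibrated to any privacy budget; all function names and the toy data are illustrative, not taken from the paper.

```python
import numpy as np


def client_statistics(X, y, noise_scale=0.0, rng=None):
    """Compute the local sufficient statistics: Gram matrix and moment vector.

    If noise_scale > 0, Gaussian noise is added once, loosely mimicking the
    single per-client noise injection mentioned in the abstract (the scale
    here is a placeholder, not a privacy-calibrated value).
    """
    G = X.T @ X  # d x d Gram matrix
    b = X.T @ y  # d-dimensional moment vector
    if noise_scale > 0:
        rng = rng or np.random.default_rng()
        G = G + rng.normal(scale=noise_scale, size=G.shape)
        b = b + rng.normal(scale=noise_scale, size=b.shape)
    return G, b


def one_shot_ridge(stats, lam):
    """Server-side fusion: sum the statistics and solve one regularized system."""
    d = stats[0][0].shape[0]
    G_sum = sum(G for G, _ in stats)
    b_sum = sum(b for _, b in stats)
    return np.linalg.solve(G_sum + lam * np.eye(d), b_sum)


# Toy check of the exact-recovery claim against the centralized ridge solution.
rng = np.random.default_rng(0)
d, lam = 5, 1.0
clients = [(rng.normal(size=(30, d)), rng.normal(size=30)) for _ in range(4)]

stats = [client_statistics(X, y) for X, y in clients]
w_fed = one_shot_ridge(stats, lam)

X_all = np.vstack([X for X, _ in clients])
y_all = np.concatenate([y for _, y in clients])
w_central = np.linalg.solve(X_all.T @ X_all + lam * np.eye(d), X_all.T @ y_all)
assert np.allclose(w_fed, w_central)
```

Without noise, the aggregation is algebraically identical to centralized ridge regression, which is why the check passes up to floating-point tolerance; the high-dimensional random-projection variant in the abstract would sketch each $X_k$ with a shared $m \times d$ projection before forming the statistics.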
Related papers
- Optimal Unconstrained Self-Distillation in Ridge Regression: Strict Improvements, Precise Asymptotics, and One-Shot Tuning [61.07540493350384]
Self-distillation (SD) is the process of retraining a student on a mixture of ground-truth and the teacher's own predictions. We show that for any prediction risk, the optimally mixed student improves upon the ridge teacher for every regularization level. We propose a consistent one-shot tuning method to estimate $\star$ without grid search, sample splitting, or refitting.
arXiv Detail & Related papers (2026-02-19T17:21:15Z) - Beyond Covariance Matrix: The Statistical Complexity of Private Linear Regression [66.93988594607842]
Under privacy constraints, the complexity of private linear regression is *not* captured by the usual covariance matrix. We introduce an Information-Weighted Regression method that attains the optimal rates. Notably, our results demonstrate that joint privacy comes at almost no additional cost.
arXiv Detail & Related papers (2025-02-18T18:35:24Z) - Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate. We stress that our estimators do not involve nonparametric function estimators and in particular do not rely on sample-size-dependent smoothing parameters.
arXiv Detail & Related papers (2024-07-11T13:28:34Z) - Relaxed Quantile Regression: Prediction Intervals for Asymmetric Noise [51.87307904567702]
Quantile regression is a leading approach for obtaining such intervals via the empirical estimation of quantiles in the distribution of outputs. We propose Relaxed Quantile Regression (RQR), a direct alternative to quantile-regression-based interval construction that removes this arbitrary constraint. We demonstrate that this added flexibility results in intervals with an improvement in desirable qualities.
arXiv Detail & Related papers (2024-06-05T13:36:38Z) - Generalized Differentiable RANSAC [95.95627475224231]
$\nabla$-RANSAC is a differentiable RANSAC that allows learning the entire randomized robust estimation pipeline.
$\nabla$-RANSAC is superior to the state-of-the-art in terms of accuracy while running at a similar speed to its less accurate alternatives.
arXiv Detail & Related papers (2022-12-26T15:13:13Z) - Heterogeneous Federated Learning on a Graph [9.135254524746847]
Federated learning, where algorithms are trained across multiple decentralized devices without sharing local data, is increasingly popular in machine learning practice.
In this work, we consider parameter estimation in federated learning with data-distribution and communication heterogeneity, as well as limited computational capacity of local devices.
We highlight that our algorithm transmits only parameters along the edges of $G$, at convergence rate $O(T^{-1}\log T)$ where $T$ denotes the number of iterations.
arXiv Detail & Related papers (2022-09-19T03:18:10Z) - Recovery Guarantees for Distributed-OMP [8.393317912360564]
We study distributed schemes for high-dimensional sparse linear regression.
We prove that distributed-OMP schemes recover the support of the regression vector with communication per machine linear in its sparsity and logarithmic in the dimension.
Our simulations show that distributed-OMP schemes are competitive with more computationally intensive methods, and in some cases even outperform them.
arXiv Detail & Related papers (2022-09-15T11:43:33Z) - Rate-Distortion Theoretic Bounds on Generalization Error for Distributed
Learning [9.00236182523638]
In this paper, we use tools from rate-distortion theory to establish new upper bounds on the generalization error of statistical distributed learning algorithms.
The bounds depend on the compressibility of each client's algorithm while keeping other clients' algorithms uncompressed.
arXiv Detail & Related papers (2022-06-06T13:21:52Z) - Settling the Sample Complexity of Model-Based Offline Reinforcement Learning [50.5790774201146]
Offline reinforcement learning (RL) learns using pre-collected data without further exploration.
Prior algorithms or analyses either suffer from suboptimal sample complexities or incur high burn-in cost to reach sample optimality.
We demonstrate that the model-based (or "plug-in") approach achieves minimax-optimal sample complexity without burn-in cost.
arXiv Detail & Related papers (2022-04-11T17:26:19Z) - An Improved Analysis of Gradient Tracking for Decentralized Machine Learning [34.144764431505486]
We consider decentralized machine learning over a network where the training data is distributed across $n$ agents.
The agent's common goal is to find a model that minimizes the average of all local loss functions.
We improve the dependency on $p$ from $\mathcal{O}(p^{-2})$ to $\mathcal{O}(p^{-1})$ in the noiseless case.
arXiv Detail & Related papers (2022-02-08T12:58:14Z) - Distributed Sparse Regression via Penalization [5.990069843501885]
We study linear regression over a network of agents, modeled as an undirected graph (with no centralized node).
The estimation problem is formulated as the minimization of the sum of the local LASSO loss functions plus a quadratic penalty of the consensus constraint.
We show that the proximal-gradient algorithm applied to the penalized problem converges linearly up to a tolerance of the order of the centralized statistical error.
arXiv Detail & Related papers (2021-11-12T01:51:50Z) - Online nonparametric regression with Sobolev kernels [99.12817345416846]
We derive the regret upper bounds on the classes of Sobolev spaces $W_p^\beta(\mathcal{X})$, $p\geq 2$, $\beta>\frac{d}{p}$.
The upper bounds are supported by the minimax regret analysis, which reveals that in the cases $\beta>\frac{d}{2}$ or $p=\infty$ these rates are (essentially) optimal.
arXiv Detail & Related papers (2021-02-06T15:05:14Z) - Nonlinear Distribution Regression for Remote Sensing Applications [6.664736150040092]
In many remote sensing applications one wants to estimate variables or parameters of interest from observations.
Standard algorithms such as neural networks, random forests, or Gaussian processes are readily available to relate the two.
This paper introduces a nonlinear (kernel-based) method for distribution regression that solves the previous problems without making any assumption on the statistics of the grouped data.
arXiv Detail & Related papers (2020-12-07T22:04:43Z)