Federated Majorize-Minimization: Beyond Parameter Aggregation
- URL: http://arxiv.org/abs/2507.17534v1
- Date: Wed, 23 Jul 2025 14:13:19 GMT
- Title: Federated Majorize-Minimization: Beyond Parameter Aggregation
- Authors: Aymeric Dieuleveut, Gersende Fort, Mahmoud Hegazy, Hoi-To Wai
- Abstract summary: This paper proposes a unified approach for designing stochastic optimization algorithms that robustly scale to the federated learning setting. Our framework encompasses (proximal) gradient-based algorithms for smooth objectives, the Expectation Maximization algorithm, and many problems seen as variational surrogate MM. We show that our framework motivates a unifying algorithm called Stochastic Approximation Stochastic Surrogate MM (SSMM), which includes previous stochastic MM procedures as special instances.
- Score: 27.52398073700742
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes a unified approach for designing stochastic optimization algorithms that robustly scale to the federated learning setting. Our work studies a class of Majorize-Minimization (MM) problems, which possesses a linearly parameterized family of majorizing surrogate functions. This framework encompasses (proximal) gradient-based algorithms for (regularized) smooth objectives, the Expectation Maximization algorithm, and many problems seen as variational surrogate MM. We show that our framework motivates a unifying algorithm called Stochastic Approximation Stochastic Surrogate MM (SSMM), which includes previous stochastic MM procedures as special instances. We then extend SSMM to the federated setting, while taking into consideration common bottlenecks such as data heterogeneity, partial participation, and communication constraints; this yields QSMM. The originality of QSMM is to learn locally and then aggregate information characterizing the surrogate majorizing function, contrary to classical algorithms which learn and aggregate the original parameter. Finally, to showcase the flexibility of this methodology beyond our theoretical setting, we use it to design an algorithm for computing optimal transport maps in the federated setting.
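To make the "aggregate the surrogate, not the parameter" idea concrete, below is a minimal Python sketch of the EM special case named in the abstract: each client computes the expected sufficient statistics that characterize its local majorizing surrogate (here, a two-component Gaussian mixture with known unit variances), and the server sums those statistics before minimizing the pooled surrogate. This illustrates the general principle only, not the paper's QSMM algorithm (which additionally handles partial participation and communication constraints); the function names and toy model are ours.

```python
import numpy as np

def local_e_step(x, means, weights, var=1.0):
    """E-step on one client: responsibilities -> expected sufficient stats."""
    # log responsibility of each point under each component (shared variance)
    log_p = -0.5 * (x[:, None] - means[None, :]) ** 2 / var + np.log(weights)
    log_p -= log_p.max(axis=1, keepdims=True)          # stabilize the softmax
    resp = np.exp(log_p)
    resp /= resp.sum(axis=1, keepdims=True)
    # these two statistics fully characterize the local surrogate (Q-function)
    n_k = resp.sum(axis=0)                  # soft counts per component
    s_k = (resp * x[:, None]).sum(axis=0)   # soft sums per component
    return n_k, s_k

def server_m_step(stats, eps=1e-12):
    """Aggregate the surrogate statistics, then minimize the pooled surrogate."""
    n_k = sum(n for n, _ in stats)
    s_k = sum(s for _, s in stats)
    return s_k / (n_k + eps), n_k / n_k.sum()   # new means, new weights

rng = np.random.default_rng(0)
# heterogeneous clients: each observes a very different mix of two clusters
clients = [np.r_[rng.normal(-2, 1, 80), rng.normal(3, 1, 20)],
           np.r_[rng.normal(-2, 1, 10), rng.normal(3, 1, 90)]]
means, weights = np.array([-1.0, 1.0]), np.array([0.5, 0.5])
for _ in range(50):
    stats = [local_e_step(x, means, weights) for x in clients]
    means, weights = server_m_step(stats)
print(means, weights)   # means approach the true cluster centers (-2, 3)
```

Note that directly averaging the clients' locally fitted parameters would be biased under the heterogeneous splits above, whereas summing the sufficient statistics reproduces the centralized M-step exactly.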
Related papers
- A Linearized Alternating Direction Multiplier Method for Federated Matrix Completion Problems [2.2217927229805032]
Matrix completion is fundamental for predicting missing data, with a wide range of applications in personalized healthcare, e-commerce, recommendation systems, and social network analysis. Traditional matrix completion approaches typically assume centralized data storage. We propose FedMC-ADMM for solving federated matrix completion problems.
arXiv Detail & Related papers (2025-03-17T01:57:06Z)
- Federated Smoothing ADMM for Localization [9.25126455172971]
Federated systems are characterized by distributed data, non-convexity, and non-smoothness. We propose a robust algorithm to tackle the scalability and outlier issues inherent in such environments. To validate the reliability of the proposed algorithm, we show that it converges to a stationary point. Numerical simulations highlight its superior convergence and outlier resilience compared to existing state-of-the-art localization methods.
arXiv Detail & Related papers (2025-03-12T16:01:34Z)
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level Optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization problem, where the inner loss function becomes a smooth probability distribution, and the outer loss becomes an expected loss over the inner distribution.
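Read literally, the stated reformulation looks like the following (our sketch; the Gibbs form with temperature tau is an assumption consistent with "smooth distribution", not necessarily the paper's exact construction):

```latex
% Bi-level problem: outer loss F, inner loss g
\min_{x} \; F\bigl(x, y^{*}(x)\bigr)
\quad \text{s.t.} \quad
y^{*}(x) \in \operatorname*{arg\,min}_{y} \; g(x, y).
% Stochastic reformulation: replace the inner arg-min by a smooth
% distribution over near-optimal inner solutions, e.g. a Gibbs measure
% with temperature \tau > 0 (assumed form), and average the outer loss:
p_{x}(y) \propto \exp\bigl(-g(x, y)/\tau\bigr),
\qquad
\min_{x} \; \mathbb{E}_{y \sim p_{x}}\bigl[F(x, y)\bigr].
```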
arXiv Detail & Related papers (2024-10-14T12:10:06Z)
- A unified consensus-based parallel ADMM algorithm for high-dimensional regression with combined regularizations [3.280169909938912]
The parallel alternating direction method of multipliers (ADMM) is widely recognized for its effectiveness in handling large-scale distributed datasets.
The reliability, stability, and scalability of the proposed algorithms are demonstrated on a financial example.
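For reference, the generic consensus-based parallel ADMM template underlying this family of methods (textbook scaled form; the paper's specific updates for combined regularizations will differ) splits the data across m blocks, each holding a local copy x_i tied to a global consensus variable z:

```latex
% Split objective across m blocks with local copies x_i and consensus z:
\min_{x_1, \dots, x_m, z} \; \sum_{i=1}^{m} f_i(x_i) + g(z)
\quad \text{s.t.} \quad x_i = z, \; i = 1, \dots, m.
% Scaled-form ADMM with penalty \rho > 0 and dual variables u_i; the
% x_i-updates run in parallel, and bars denote averages over i:
x_i^{k+1} = \operatorname*{arg\,min}_{x_i} \; f_i(x_i)
            + \tfrac{\rho}{2} \lVert x_i - z^{k} + u_i^{k} \rVert_2^2,
\qquad
z^{k+1}   = \operatorname*{arg\,min}_{z} \; g(z)
            + \tfrac{m\rho}{2} \lVert z - \bar{x}^{k+1} - \bar{u}^{k} \rVert_2^2,
\qquad
u_i^{k+1} = u_i^{k} + x_i^{k+1} - z^{k+1}.
```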
arXiv Detail & Related papers (2023-11-21T03:30:38Z)
- Federated Conditional Stochastic Optimization [110.513884892319]
Conditional stochastic optimization has found applications in a wide range of machine learning tasks, such as invariant learning, AUPRC maximization, and meta-learning (MAML).
This paper proposes conditional stochastic optimization algorithms for the federated learning setting.
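The defining structure of conditional stochastic optimization, as we understand the problem class, is a nested expectation in which the inner expectation is conditioned on the outer sample:

```latex
% Conditional stochastic optimization: the inner expectation over \eta is
% conditioned on the outer sample \xi, so unbiased single-sample gradient
% estimates of F are nontrivial to obtain:
\min_{x} \; F(x)
= \mathbb{E}_{\xi}\Bigl[ f_{\xi}\bigl( \mathbb{E}_{\eta \mid \xi}\bigl[ g_{\eta}(x, \xi) \bigr] \bigr) \Bigr].
```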
arXiv Detail & Related papers (2023-10-04T01:47:37Z)
- The Dynamics of Riemannian Robbins-Monro Algorithms [101.29301565229265]
We propose a family of Riemannian algorithms generalizing and extending the seminal stochastic approximation framework of Robbins and Monro.
Compared to their Euclidean counterparts, Riemannian algorithms are much less understood due to the lack of a global linear structure on the manifold.
We provide a general template of almost sure convergence results that mirrors and extends the existing theory for Euclidean Robbins-Monro schemes.
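Schematically (a standard template, not a quote from the paper), the Euclidean Robbins-Monro iteration and its Riemannian counterpart, in which the exponential map (or a retraction) replaces the vector-space update:

```latex
% Euclidean Robbins-Monro scheme for locating a zero of a vector field v,
% with step sizes \gamma_k and martingale noise M_{k+1}:
x_{k+1} = x_{k} - \gamma_{k} \bigl( v(x_{k}) + M_{k+1} \bigr).
% Riemannian counterpart: the update direction lives in the tangent space
% T_{x_k}\mathcal{M}; the exponential map brings it back to the manifold:
x_{k+1} = \exp_{x_{k}}\!\bigl( -\gamma_{k} \, ( v(x_{k}) + M_{k+1} ) \bigr).
```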
arXiv Detail & Related papers (2022-06-14T12:30:11Z)
- STORM+: Fully Adaptive SGD with Momentum for Nonconvex Optimization [74.1615979057429]
We investigate stochastic nonconvex optimization problems where the objective is an expectation over smooth loss functions.
Our work builds on the STORM algorithm, in conjunction with a novel approach to adaptively set the learning rate and momentum parameters.
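For context, the STORM recursive momentum estimator underlying this line of work corrects the previous search direction using a single fresh sample per step; STORM+ then chooses the momentum weight a_t and step size eta_t adaptively from observed gradients (those adaptive formulas are in the paper and omitted here):

```latex
% STORM-style variance-reduced momentum: one stochastic gradient sample
% \xi_t per step, evaluated at both the current and previous iterates:
d_{t} = \nabla f(x_{t}; \xi_{t})
        + (1 - a_{t}) \bigl( d_{t-1} - \nabla f(x_{t-1}; \xi_{t}) \bigr),
\qquad
x_{t+1} = x_{t} - \eta_{t} \, d_{t}.
```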
arXiv Detail & Related papers (2021-11-01T15:43:36Z)
- Memory-Based Optimization Methods for Model-Agnostic Meta-Learning and Personalized Federated Learning [56.17603785248675]
Model-agnostic meta-learning (MAML) has become a popular research area.
Existing MAML algorithms rely on the 'episode' idea by sampling a few tasks and data points to update the meta-model at each iteration.
This paper proposes memory-based algorithms for MAML that converge with vanishing error.
arXiv Detail & Related papers (2021-06-09T08:47:58Z)
- A Dynamical Systems Approach for Convergence of the Bayesian EM Algorithm [59.99439951055238]
We show how (discrete-time) Lyapunov stability theory can serve as a powerful tool to aid, or even lead, in the analysis (and potential design) of optimization algorithms that are not necessarily gradient-based.
The particular ML problem that this paper focuses on is that of parameter estimation in an incomplete-data Bayesian framework via the popular optimization algorithm known as maximum a posteriori expectation-maximization (MAP-EM).
We show that fast convergence (linear or quadratic) is achieved, which could have been difficult to unveil without our adopted systems-and-control (S&C) approach.
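One plausible instantiation of that Lyapunov argument (our sketch, leaning on the classical monotone-ascent property of EM rather than the paper's exact construction) takes the negative log-posterior as the Lyapunov function and shows it is non-increasing along the MAP-EM map:

```latex
% Candidate Lyapunov function: negative log-posterior of \theta given data y.
V(\theta) = -\log p(\theta \mid y).
% MAP-EM defines a fixed-point map \theta_{k+1} = M(\theta_k); the EM
% ascent property gives the discrete-time Lyapunov descent condition
V\bigl(M(\theta)\bigr) \le V(\theta) \quad \text{for all } \theta,
% with equality only at fixed points of M, from which stability of the
% stationary points follows via discrete-time Lyapunov theory.
```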
arXiv Detail & Related papers (2020-06-23T01:34:18Z)
- Optimizing generalization on the train set: a novel gradient-based framework to train parameters and hyperparameters simultaneously [0.0]
Generalization is a central problem in Machine Learning.
We present a novel approach based on a new measure of risk that allows us to develop fully automatic procedures for generalization.
arXiv Detail & Related papers (2020-06-11T18:04:36Z)