Faster federated optimization under second-order similarity
- URL: http://arxiv.org/abs/2209.02257v2
- Date: Tue, 23 May 2023 02:04:17 GMT
- Title: Faster federated optimization under second-order similarity
- Authors: Ahmed Khaled and Chi Jin
- Abstract summary: Federated learning (FL) is a subfield of machine learning where multiple clients try to collaboratively learn a model over a network.
We consider finite-sum federated optimization under a second-order function similarity condition and strong convexity.
We propose two new algorithms: SVRP and Catalyzed SVRP.
- Score: 36.66443669044588
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated learning (FL) is a subfield of machine learning where multiple
clients try to collaboratively learn a model over a network under communication
constraints. We consider finite-sum federated optimization under a second-order
function similarity condition and strong convexity, and propose two new
algorithms: SVRP and Catalyzed SVRP. This second-order similarity condition has
grown popular recently, and is satisfied in many applications including
distributed statistical learning and differentially private empirical risk
minimization. The first algorithm, SVRP, combines approximate stochastic
proximal point evaluations, client sampling, and variance reduction. We show
that SVRP is communication efficient and achieves superior performance to many
existing algorithms when function similarity is high enough. Our second
algorithm, Catalyzed SVRP, is a Catalyst-accelerated variant of SVRP that
achieves even better performance and uniformly improves upon existing
algorithms for federated optimization under second-order similarity and strong
convexity. In the course of analyzing these algorithms, we provide a new
analysis of the Stochastic Proximal Point Method (SPPM) that might be of
independent interest. Our analysis of SPPM is simple, allows for approximate
proximal point evaluations, does not require any smoothness assumptions, and
shows a clear benefit in communication complexity over ordinary distributed
stochastic gradient descent.
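The abstract's recipe for SVRP (client sampling, proximal point steps, and a variance-reduction correction anchored at a periodically refreshed reference point) can be illustrated concretely. Below is a minimal sketch on toy quadratic client losses, where each proximal subproblem reduces to an exact linear solve; the quadratic setup, the step size `gamma`, the anchor schedule `anchor_every`, and the helper `svrp_step` are illustrative assumptions, not the paper's exact specification. Second-order similarity, typically formalized as $\|\nabla^2 f_i(x) - \nabla^2 f(x)\| \le \delta$ for all clients $i$, is mimicked here by making every client Hessian a small perturbation of a shared matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: n clients, each with a strongly convex quadratic loss
#   f_i(x) = 0.5 * x^T A_i x - b_i^T x,  and  f = (1/n) * sum_i f_i.
# All A_i are small perturbations of a shared matrix, so the client
# Hessians stay close to the average Hessian (second-order similarity).
n, d = 10, 5
A = []
for _ in range(n):
    M = 2.0 * np.eye(d) + 0.1 * rng.standard_normal((d, d))
    A.append(0.5 * (M + M.T) + np.eye(d))   # symmetric positive definite
b = [rng.standard_normal(d) for _ in range(n)]
A_avg, b_avg = sum(A) / n, sum(b) / n

def grad(i, x):
    return A[i] @ x - b[i]                  # gradient of client i's loss

def full_grad(x):
    return A_avg @ x - b_avg                # gradient of the average loss

def svrp_step(x, w, g_w, i, gamma):
    """One variance-reduced proximal-point step (SVRP-style sketch):
    x+ = argmin_z f_i(z) + <g_w - grad(i, w), z> + ||z - x||^2 / (2*gamma).
    For quadratic f_i this subproblem is an exact linear solve."""
    shift = g_w - grad(i, w)                # control variate at anchor w
    lhs = A[i] + np.eye(d) / gamma
    rhs = b[i] - shift + x / gamma
    return np.linalg.solve(lhs, rhs)

x = np.zeros(d)
x_star = np.linalg.solve(A_avg, b_avg)      # minimizer of the average loss
gamma, anchor_every = 1.0, n                # illustrative hyperparameters
for k in range(200):
    if k % anchor_every == 0:               # periodic full-gradient anchor
        w, g_w = x.copy(), full_grad(x)
    i = int(rng.integers(n))                # client sampling
    x = svrp_step(x, w, g_w, i, gamma)

print("distance to optimum:", np.linalg.norm(x - x_star))
```

Note the communication pattern this sketch mirrors: a full gradient (one round with all clients) is needed only at anchor refreshes, while every other step touches a single sampled client. Catalyzed SVRP, per the abstract, wraps an outer Catalyst acceleration loop around this kind of inner solver; that outer loop is not sketched here.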
Related papers
- Closing the Approximation Gap of Partial AUC Optimization: A Tale of Two Formulations [121.39938773554523]
The Area Under the ROC Curve (AUC) is a pivotal evaluation metric in real-world scenarios with both class imbalance and decision constraints. We present two simple instance-wise minimax reformulations to close the approximation gap of PAUC optimization. The resulting algorithms enjoy a linear per-iteration computational complexity w.r.t. the sample size and a convergence rate of $O(\epsilon^{-2/3})$ for typical one-way and two-way PAUCs.
arXiv Detail & Related papers (2025-12-01T02:52:33Z)
- Stochastic Primal-Dual Double Block-Coordinate for Two-way Partial AUC Maximization [56.805574957824135]
Two-way partial AUC (TPAUC) is a critical performance metric for binary classification with imbalanced data. Existing algorithms for TPAUC optimization remain under-explored. We introduce two innovative stochastic primal-dual double block-coordinate algorithms for TPAUC optimization.
arXiv Detail & Related papers (2025-05-28T03:55:05Z)
- FUSE: First-Order and Second-Order Unified SynthEsis in Stochastic Optimization [9.909119107223265]
First-order and second-order methods suit entirely different situations in stochastic optimization.
This paper presents a novel method that leverages both first-order and second-order techniques in a unified algorithmic framework.
FUSE-PV is a simple yet efficient optimization method that switches between first- and second-order updates.
arXiv Detail & Related papers (2025-03-06T08:30:18Z)
- A Learned Proximal Alternating Minimization Algorithm and Its Induced Network for a Class of Two-block Nonconvex and Nonsmooth Optimization [4.975853671529418]
This work proposes a general learned proximal alternating minimization algorithm, LPAM, for solving learnable two-block nonconvex and nonsmooth problems.
The proposed LPAM-net is parameter-efficient and has favourable performance in comparison with some state-of-the-art methods.
arXiv Detail & Related papers (2024-11-10T02:02:32Z)
- A Stochastic Approximation Approach for Efficient Decentralized Optimization on Random Networks [21.66341372216097]
A challenging problem in decentralized optimization is to develop algorithms with fast convergence on random time-varying topologies under unreliable, bandwidth-constrained communication networks. This paper introduces a novel stochastic approximation approach with a Fully Stochastic Primal Dual Algorithm (FSPDA) framework. Numerical experiments show the benefits of the FSPDA algorithms.
arXiv Detail & Related papers (2024-10-24T14:26:58Z)
- Fast Two-Time-Scale Stochastic Gradient Method with Applications in Reinforcement Learning [5.325297567945828]
We propose a new method for two-time-scale optimization that achieves significantly faster convergence than the prior art.
We characterize the proposed algorithm under various conditions and show how it specializes to online sample-based methods.
arXiv Detail & Related papers (2024-05-15T19:03:08Z)
- Beyond Single-Model Views for Deep Learning: Optimization versus Generalizability of Stochastic Optimization Algorithms [13.134564730161983]
This paper adopts a novel approach to deep learning optimization, focusing on stochastic gradient descent (SGD) and its variants.
We show that SGD and its variants demonstrate performance on par with flat-minima optimizers like SAM, albeit with half the gradient evaluations.
Our study uncovers several key findings regarding the relationship between training loss and hold-out accuracy, as well as the comparable performance of SGD and noise-enabled variants.
arXiv Detail & Related papers (2024-03-01T14:55:22Z)
- Federated Conditional Stochastic Optimization [110.513884892319]
Conditional stochastic optimization has found applications in a wide range of machine learning tasks, such as invariant learning, AUPRC maximization, and MAML.
This paper proposes algorithms for federated conditional stochastic optimization.
arXiv Detail & Related papers (2023-10-04T01:47:37Z)
- Adaptive Federated Minimax Optimization with Lower Complexities [82.51223883622552]
We propose an efficient adaptive federated minimax optimization algorithm (i.e., AdaFGDA) to solve these distributed minimax problems.
It builds on momentum-based variance-reduced and local-SGD techniques, and it flexibly incorporates various adaptive learning rates.
arXiv Detail & Related papers (2022-11-14T12:32:18Z)
- Faster Adaptive Momentum-Based Federated Methods for Distributed Composition Optimization [14.579475552088692]
We propose a class of faster federated composition optimization algorithms (i.e., MFCGD and AdaMFCGD) to solve nonconvex distributed composition problems.
In particular, our adaptive algorithm (i.e., AdaMFCGD) uses a unified adaptive matrix to flexibly incorporate various adaptive learning rates.
arXiv Detail & Related papers (2022-11-03T15:17:04Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in the form of simultaneous primal- and dual-side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
- Fast Objective & Duality Gap Convergence for Non-Convex Strongly-Concave Min-Max Problems with PL Condition [52.08417569774822]
This paper focuses on methods for solving smooth non-convex strongly-concave min-max problems, which have received increasing attention due to applications in deep learning (e.g., deep AUC maximization).
arXiv Detail & Related papers (2020-06-12T00:32:21Z)
- Second-Order Guarantees in Centralized, Federated and Decentralized Nonconvex Optimization [64.26238893241322]
Simple algorithms have been shown to lead to good empirical results in many contexts.
Several works have pursued rigorous analytical justification for this behavior by studying nonconvex optimization problems.
A key insight in these analyses is that perturbations play a critical role in allowing local descent algorithms to escape saddle points.
arXiv Detail & Related papers (2020-03-31T16:54:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.