AltGDmin: Alternating GD and Minimization for Partly-Decoupled (Federated) Optimization
- URL: http://arxiv.org/abs/2504.14741v1
- Date: Sun, 20 Apr 2025 21:07:59 GMT
- Title: AltGDmin: Alternating GD and Minimization for Partly-Decoupled (Federated) Optimization
- Authors: Namrata Vaswani
- Abstract summary: This article describes a novel optimization solution framework, called alternating gradient descent (GD) and minimization (AltGDmin). AltGDmin is often a faster solution than AltMin for any problem for which (i) the minimization over one set of variables, Zb, is much quicker than that over the other set, Za, and (ii) the cost function is differentiable w.r.t. Za.
- Score: 20.370944501500762
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This article describes a novel optimization solution framework, called alternating gradient descent (GD) and minimization (AltGDmin), that is useful for many problems for which alternating minimization (AltMin) is a popular solution. AltMin is a special case of the block coordinate descent algorithm that is useful for problems in which minimization w.r.t. one subset of variables, keeping the other fixed, has a closed-form solution or can otherwise be solved reliably. Denote the two blocks/subsets of the optimization variables Z by Za, Zb, i.e., Z = {Za, Zb}. AltGDmin is often a faster solution than AltMin for any problem for which (i) the minimization over one set of variables, Zb, is much quicker than that over the other set, Za; and (ii) the cost function is differentiable w.r.t. Za. Often, the reason for one minimization to be quicker is that the problem is "decoupled" for Zb and each of the decoupled problems is quick to solve. This decoupling is also what makes AltGDmin communication-efficient for federated settings. Important examples where this assumption holds include (a) low rank column-wise compressive sensing (LRCS) and low rank matrix completion (LRMC); (b) their outlier-corrupted extensions such as robust PCA, robust LRCS, and robust LRMC; (c) phase retrieval and its sparse and low-rank model based extensions; (d) tensor extensions of many of these problems such as tensor LRCS and tensor completion; and (e) many partly discrete problems where GD does not apply, such as clustering, unlabeled sensing, and mixed linear regression. LRCS finds important applications in multi-task representation learning and few-shot learning, federated sketching, and accelerated dynamic MRI. LRMC and robust PCA find important applications in recommender systems, computer vision, and video analytics.
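To make the structure concrete, here is a minimal single-machine sketch of AltGDmin for the LRCS example, in which the unknown rank-r matrix is factored as X = UB and each column x_k is observed via y_k = A_k x_k. U plays the role of Za (updated with one GD step per iteration) and the columns b_k of B play the role of Zb (each recovered by a small decoupled least-squares solve). The function name, the random initialization, and the fixed step size below are illustrative simplifications, not the paper's exact algorithm, which uses a spectral initialization and carefully chosen step sizes.

```python
# Minimal sketch of AltGDmin for low rank column-wise compressive
# sensing (LRCS). Illustrative only: variable names, the random
# orthonormal init, and the fixed step size are simplifications.
import numpy as np

def altgdmin_lrcs(As, ys, r, eta=1e-3, T=100):
    """Recover a rank-r matrix X = U @ B from column-wise sketches
    y_k = A_k @ x_k. As: list of (m, n) arrays; ys: list of (m,) arrays."""
    n = As[0].shape[1]
    q = len(As)
    rng = np.random.default_rng(0)
    U, _ = np.linalg.qr(rng.standard_normal((n, r)))  # orthonormal basis
    for _ in range(T):
        # Min step over Zb = B: one small least-squares problem per
        # column -- this is the quick, decoupled minimization (solved
        # locally at each node in a federated setting).
        B = np.column_stack(
            [np.linalg.lstsq(A @ U, y, rcond=None)[0]
             for A, y in zip(As, ys)]
        )
        # GD step over Za = U: one gradient step on the summed squared
        # residuals sum_k ||y_k - A_k U b_k||^2.
        grad = np.zeros((n, r))
        for k in range(q):
            resid = As[k] @ U @ B[:, k] - ys[k]
            grad += np.outer(As[k].T @ resid, B[:, k])
        U, _ = np.linalg.qr(U - eta * grad)  # step, then re-orthonormalize
    return U, B
```

In a federated run, each node k would solve its own least-squares problem for b_k locally and share only its n x r gradient term with the center, which is what keeps the per-iteration communication cost at order nr rather than the full problem size.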
Related papers
- Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution [59.39066657300045]
Chain-of-Thought (CoT) prompting enhances the reasoning of large language models (LLMs) by decomposing problems into sequential steps. We propose Syzygy of Thoughts (SoT), a novel framework that extends CoT by introducing auxiliary, interrelated reasoning paths. SoT captures deeper logical dependencies, enabling more robust and structured problem-solving.
arXiv Detail & Related papers (2025-04-13T13:35:41Z) - Obtaining Lower Query Complexities through Lightweight Zeroth-Order Proximal Gradient Algorithms [65.42376001308064]
We propose two variance-reduced zeroth-order (ZO) estimators for composite gradient problems.
We improve the state-of-the-art function query complexities from $\mathcal{O}\left(\min\left\{\frac{d n^{1/2}}{\epsilon^2}, \frac{d}{\epsilon^3}\right\}\right)$ to $\tilde{\mathcal{O}}\left(\frac{d}{\epsilon^2}\right)$.
arXiv Detail & Related papers (2024-10-03T15:04:01Z) - Alternating Minimization Schemes for Computing Rate-Distortion-Perception Functions with $f$-Divergence Perception Constraints [10.564071872770146]
We study the computation of the rate-distortion-perception function (RDPF) for discrete memoryless sources.
We characterize the optimal parametric solutions.
We provide sufficient conditions on the distortion and the perception constraints.
arXiv Detail & Related papers (2024-08-27T12:50:12Z) - Efficient Federated Low Rank Matrix Completion [18.471262688125645]
We develop and analyze a solution called Alternating GD and Minimization (AltGDmin) for solving the low rank matrix completion (LRMC) problem.
Our theoretical guarantees imply that AltGDmin is the most communication-efficient solution in a federated setting.
We show how our lemmas can be used to provide an improved sample complexity guarantee for AltMin.
arXiv Detail & Related papers (2024-05-10T16:12:35Z) - Convergence of ease-controlled Random Reshuffling gradient Algorithms under Lipschitz smoothness [0.0]
We consider minimizing the average of a very large number of smooth, possibly non-convex functions, and we use two widely studied minibatch frameworks, Incremental Gradient (IG) and Random Reshuffling (RR), to tackle this problem.
We define ease-controlled modifications of IG/RR schemes, which require a light additional computational effort.
We test our implementation against both a full-batch gradient method (L-BFGS) and standard implementations of IG/RR methods, showing that the algorithms require a similar computational effort.
arXiv Detail & Related papers (2022-12-04T15:26:36Z) - Adaptive Federated Minimax Optimization with Lower Complexities [82.51223883622552]
We propose an efficient adaptive minimax optimization algorithm (AdaFGDA) to solve these federated minimax problems.
It builds on momentum-based variance-reduced and local-SGD techniques, and it flexibly incorporates various adaptive learning rates.
arXiv Detail & Related papers (2022-11-14T12:32:18Z) - Multi-block-Single-probe Variance Reduced Estimator for Coupled Compositional Optimization [49.58290066287418]
We propose a novel method named Multi-block-Single-probe Variance Reduced (MSVR) estimator to alleviate the complexity of compositional problems.
Our results improve upon prior ones in several aspects, including the order of sample complexities and the dependence on strong convexity.
arXiv Detail & Related papers (2022-07-18T12:03:26Z) - Smooth over-parameterized solvers for non-smooth structured optimization [3.756550107432323]
Non-smoothness encodes structural constraints on the solutions, such as sparsity, group sparsity, low rank, and sharp edges.
We operate a non-convex but smooth over-parametrization of the underlying non-smooth optimization problems.
Our main contribution is to apply the Variable Projection (VarPro) method, which defines a new formulation by explicitly minimizing over part of the variables.
arXiv Detail & Related papers (2022-05-03T09:23:07Z) - Parallel Quasi-concave set optimization: A new frontier that scales without needing submodularity [14.93584434176082]
We study a class of quasi-concave set functions induced as a dual class to monotone linkage functions.
We show a potential for widespread applications via an example of diverse feature subset selection with exact global maxi-min guarantees.
arXiv Detail & Related papers (2021-08-19T15:50:41Z) - Smooth Bilevel Programming for Sparse Regularization [5.177947445379688]
Iteratively reweighted least squares (IRLS) is a popular approach to solving sparsity-enforcing regression problems in machine learning.
We show how a surprisingly simple reparametrization of IRLS, coupled with a bilevel scheme, achieves top performance across a range of sparsity-enforcing problems.
arXiv Detail & Related papers (2021-06-02T19:18:22Z) - Solving weakly supervised regression problem using low-rank manifold regularization [77.34726150561087]
We solve a weakly supervised regression problem.
Under "weakly" we understand that for some training points the labels are known, for some unknown, and for others uncertain due to the presence of random noise or other reasons such as lack of resources.
In the numerical section, we applied the suggested method to artificial and real datasets using Monte-Carlo modeling.
arXiv Detail & Related papers (2021-04-13T23:21:01Z) - Gradient Free Minimax Optimization: Variance Reduction and Faster Convergence [120.9336529957224]
In this paper, we study gradient-free minimax optimization problems in the nonconvex-strongly-concave setting.
We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z)