Related papers: Convergence of Decentralized Stochastic Subgradient-based Methods for Nonsmooth Nonconvex functions

URL: http://arxiv.org/abs/2403.11565v3
Date: Fri, 09 May 2025 06:16:13 GMT
Title: Convergence of Decentralized Stochastic Subgradient-based Methods for Nonsmooth Nonconvex functions
Authors: Siyuan Zhang, Nachuan Xiao, Xin Liu
Abstract summary: We propose a general framework that unifies various decentralized subgradient-based methods. We prove convergence guarantees for some well-recognized decentralized subgradient-based methods.
Score: 10.278310909980576
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we focus on decentralized stochastic subgradient-based methods for minimizing nonsmooth nonconvex functions without Clarke regularity, especially in the decentralized training of nonsmooth neural networks. We propose a general framework that unifies various decentralized subgradient-based methods, such as decentralized stochastic subgradient descent (DSGD), DSGD with gradient-tracking technique (DSGD-T), and DSGD with momentum (DSGD-M). To establish the convergence properties of our proposed framework, we relate the discrete iterates to the trajectories of a continuous-time differential inclusion, which is assumed to have a coercive Lyapunov function with a stable set $\mathcal{A}$. We prove the asymptotic convergence of the iterates to the stable set $\mathcal{A}$ with sufficiently small and diminishing step-sizes. These results provide the first convergence guarantees for some well-recognized decentralized stochastic subgradient-based methods without Clarke regularity of the objective function. Preliminary numerical experiments demonstrate that our proposed framework yields highly efficient decentralized stochastic subgradient-based methods with convergence guarantees in the training of nonsmooth neural networks.
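
As a rough illustration of the DSGD update named in the abstract, the sketch below runs a textbook decentralized stochastic subgradient step (average with neighbors, then take a noisy local subgradient step with a diminishing step-size). It is not the paper's framework or code. The toy objective (each agent holds the convex nonsmooth function |x - a_i|), the ring mixing matrix, the noise model, and all names (`dsgd`, `ring_mixing`, `subgrad_abs`, `eta0`, `decay`) are assumptions made for this example; the paper itself targets nonconvex objectives without Clarke regularity.

```python
import random


def subgrad_abs(x, a):
    """Return one element of the subdifferential of |x - a| (0 at the kink)."""
    if x > a:
        return 1.0
    if x < a:
        return -1.0
    return 0.0


def ring_mixing(n):
    """Doubly stochastic mixing matrix for a ring of n >= 3 agents.

    Each agent weights itself and its two ring neighbors by 1/3.
    """
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            w[i][j % n] = 1.0 / 3.0
    return w


def dsgd(targets, x0, mixing, steps=3000, eta0=0.5, decay=0.6,
         noise=0.1, seed=0):
    """Toy DSGD: x_i <- sum_j W_ij x_j - eta_k * (g_i + noise).

    g_i is a subgradient of the local loss |x - targets[i]| at the
    agent's current iterate; eta_k = eta0 / (k + 1)**decay diminishes.
    """
    rng = random.Random(seed)
    n = len(targets)
    x = list(x0)
    for k in range(steps):
        eta = eta0 / (k + 1) ** decay
        # Consensus step: weighted average over neighbors.
        mixed = [sum(mixing[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Local step: noisy subgradient evaluated at the agent's own iterate.
        x = [
            mixed[i]
            - eta * (subgrad_abs(x[i], targets[i]) + rng.gauss(0.0, noise))
            for i in range(n)
        ]
    return x


if __name__ == "__main__":
    targets = [-2.0, -1.0, 0.0, 1.0, 2.0]   # sum of |x - a_i| is minimized at 0
    start = [5.0, -3.0, 4.0, 0.5, -1.0]     # agents begin in disagreement
    final = dsgd(targets, start, ring_mixing(len(targets)))
    print("final local iterates:", final)
```

In this toy run the local iterates should end up close to each other and close to the median of the targets. The diminishing step-size is what lets the disagreement between agents shrink, which mirrors the "sufficiently small and diminishing step-sizes" condition in the abstract.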

Related papers

Learning Theory of Decentralized Robust Kernel-Based Learning Algorithm [1.3597551064547502]
We propose a new robust kernel-based learning algorithm within the framework of reproducing kernel Hilbert space (RKHS). We show each local robust estimator generated from the decentralized algorithm can be utilized to approximate the regression function. We provide rigorous selection rules for local sample size and show that, under properly selected step size and scaling parameter $\sigma$, the decentralized robust algorithm can achieve optimal learning rates.
arXiv Detail & Related papers (2025-06-05T16:30:05Z)
Decentralized Inference for Spatial Data Using Low-Rank Models [4.168323530566095]
This paper presents a decentralized framework tailored for parameter inference in spatial low-rank models. A key obstacle arises from the spatial dependence among observations, which prevents the log-likelihood from being expressed as a summation. Our approach employs a block descent method integrated with multi-consensus and dynamic consensus averaging for effective parameter optimization.
arXiv Detail & Related papers (2025-02-01T04:17:01Z)
Decentralized Smoothing ADMM for Quantile Regression with Non-Convex Sparse Penalties [3.269165283595478]
In the rapidly evolving internet-of-things (IoT) ecosystem, effective data analysis techniques are crucial for handling distributed data generated by sensors. This work addresses the limitations of existing methods, such as the sub-gradient consensus approach, which fails to distinguish between active and non-active coefficients.
arXiv Detail & Related papers (2024-08-02T15:00:04Z)
A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning. This paper presents a comprehensive analysis of a broad range of variations of the stochastic proximal point method (SPPM).
arXiv Detail & Related papers (2024-05-24T21:09:19Z)
Rethinking Clustered Federated Learning in NOMA Enhanced Wireless Networks [60.09912912343705]
This study explores the benefits of integrating the novel clustered federated learning (CFL) approach with non-independent and identically distributed (non-IID) datasets. A detailed theoretical analysis of the generalization gap that measures the degree of non-IID in the data distribution is presented. Solutions to address the challenges posed by non-IID conditions are proposed with the analysis of the properties.
arXiv Detail & Related papers (2024-03-05T17:49:09Z)
FastPart: Over-Parameterized Stochastic Gradient Descent for Sparse optimisation on Measures [1.9950682531209156]
This paper presents a novel algorithm that leverages Gradient Descent strategies in conjunction with Random Features to augment the scalability of Conic Particle Gradient Descent (CPGD). We provide rigorous proofs demonstrating the following key findings: (i) The total variation norms of the solution measures along the descent trajectory remain bounded, ensuring stability and preventing undesirable divergence; (ii) We establish a global convergence guarantee with a convergence rate of $\mathcal{O}(\log(K)/\sqrt{K})$ over $K$, showcasing the efficiency and effectiveness of our algorithm; (iii) Additionally, we analyze and establish
arXiv Detail & Related papers (2023-12-10T20:41:43Z)
Stability and Generalization of the Decentralized Stochastic Gradient Descent Ascent Algorithm [80.94861441583275]
We investigate the complexity of the generalization bound of the decentralized stochastic gradient descent ascent (D-SGDA) algorithm. Our results analyze the impact of different top factors on the generalization of D-SGDA. We also balance it with the generalization to obtain the optimal convex-concave setting.
arXiv Detail & Related papers (2023-10-31T11:27:01Z)
A Neural Network-Based Enrichment of Reproducing Kernel Approximation for Modeling Brittle Fracture [0.0]
An improved version of the neural network-enhanced Reproducing Kernel Particle Method (NN-RKPM) is proposed for modeling brittle fracture. The effectiveness of the proposed method is demonstrated by a series of numerical examples involving damage propagation and branching.
arXiv Detail & Related papers (2023-07-04T21:52:09Z)
Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce Stochastic UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning. Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z)
A Unified Momentum-based Paradigm of Decentralized SGD for Non-Convex Models and Heterogeneous Data [0.261072980439312]
We propose a unified paradigm called U.MP, D-MP and GT-D, which provides a convergence guarantee for non general objectives. In theory we provide the convergence analysis objectives two approaches for these non-MP algorithms.
arXiv Detail & Related papers (2023-03-01T02:13:22Z)
Stability and Generalization Analysis of Gradient Methods for Shallow Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability. We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z)
On the Convergence of Stochastic Extragradient for Bilinear Games with Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the Stochastic ExtraGradient (SEG) method with constant step size, and present variations of the method that yield favorable convergence. We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z)
Stochastic optimization with momentum: convergence, fluctuations, and traps avoidance [0.0]
In this paper, a general stochastic optimization procedure is studied, unifying several variants of the stochastic gradient descent such as, among others, the heavy ball method, the stochastic Nesterov Accelerated Gradient (S-NAG), and the widely used Adam method. The avoidance of traps is studied as a noisy discretization of a non-autonomous ordinary differential equation.
arXiv Detail & Related papers (2020-12-07T19:14:49Z)
An improved convergence analysis for decentralized online stochastic non-convex optimization [17.386715847732468]
In this paper, we show that a technique called GT-Loakjasiewics (GT-Loakjasiewics) satisfies the existing condition GT-Loakjasiewics (GT-Loakjasiewics) satisfies the current best convergence rates. The results are not only immediately applicable but also the currently known best convergence rates.
arXiv Detail & Related papers (2020-08-10T15:29:13Z)
Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime [50.510421854168065]
We show that the averaged stochastic gradient descent can achieve the minimax optimal convergence rate. We show that the target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate.
arXiv Detail & Related papers (2020-06-22T14:31:37Z)
IDEAL: Inexact DEcentralized Accelerated Augmented Lagrangian Method [64.15649345392822]
We introduce a framework for designing primal methods under the decentralized optimization setting where local functions are smooth and strongly convex. Our approach consists of approximately solving a sequence of sub-problems induced by the accelerated augmented Lagrangian method. When coupled with accelerated gradient descent, our framework yields a novel primal algorithm whose convergence rate is optimal and matched by recently derived lower bounds.
arXiv Detail & Related papers (2020-06-11T18:49:06Z)
A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms [67.67377846416106]
We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes. We show that value-based methods such as TD($\lambda$) and $Q$-Learning have update rules which are contractive in the space of distributions of functions.
arXiv Detail & Related papers (2020-03-27T05:13:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.