Stability and Generalization for Randomized Coordinate Descent
- URL: http://arxiv.org/abs/2108.07414v1
- Date: Tue, 17 Aug 2021 02:52:50 GMT
- Title: Stability and Generalization for Randomized Coordinate Descent
- Authors: Puyu Wang, Liang Wu, Yunwen Lei
- Abstract summary: There is no work studying how the models trained by RCD would generalize to test examples.
In this paper, we initialize the generalization analysis of RCD by leveraging the powerful tool of algorithmic stability.
Our analysis shows that RCD enjoys better stability as compared to gradient descent.
- Score: 19.687456295228156
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Randomized coordinate descent (RCD) is a popular optimization algorithm with
wide applications in solving various machine learning problems, which motivates
a lot of theoretical analysis on its convergence behavior. As a comparison,
there is no work studying how the models trained by RCD would generalize to
test examples. In this paper, we initialize the generalization analysis of RCD
by leveraging the powerful tool of algorithmic stability. We establish argument
stability bounds of RCD for both convex and strongly convex objectives, from
which we develop optimal generalization bounds by showing how to early-stop the
algorithm to tradeoff the estimation and optimization. Our analysis shows that
RCD enjoys better stability as compared to stochastic gradient descent.
Related papers
- A Primal-Dual Algorithm for Faster Distributionally Robust Optimization [12.311794669976047]
We present Drago, a primal-dual algorithm that achieves a state-of-the-art linear convergence rate on strongly convex-strongly concave DRO problems.
We support our theoretical results with numerical benchmarks in classification and regression.
arXiv Detail & Related papers (2024-03-16T02:06:14Z) - ALEXR: An Optimal Single-Loop Algorithm for Convex Finite-Sum Coupled Compositional Stochastic Optimization [53.14532968909759]
We introduce an efficient single-loop primal-dual block-coordinate algorithm, dubbed ALEXR.
We establish the convergence rates of ALEXR in both convex and strongly convex cases under smoothness and non-smoothness conditions.
We present lower complexity bounds to demonstrate that the convergence rates of ALEXR are optimal among first-order block-coordinate algorithms for the considered class of cFCCO problems.
arXiv Detail & Related papers (2023-12-04T19:00:07Z) - Stability and Generalization of the Decentralized Stochastic Gradient
Descent Ascent Algorithm [80.94861441583275]
We investigate the complexity of the generalization bound of the decentralized gradient descent (D-SGDA) algorithm.
Our results analyze the impact of different top factors on the generalization of D-SGDA.
We also balance it with the generalization to obtain the optimal convex-concave setting.
arXiv Detail & Related papers (2023-10-31T11:27:01Z) - Stability and Generalization of Stochastic Compositional Gradient
Descent Algorithms [61.59448949684493]
We provide the stability and generalization analysis of compositional descent algorithms built from training examples.
We establish the uniform stability results for two popular compositional gradient descent algorithms, namely SCGD and SCSC.
We derive-independent excess risk bounds for SCGD and SCSC by trade-offing their stability results and optimization errors.
arXiv Detail & Related papers (2023-07-07T02:40:09Z) - Exploring the Algorithm-Dependent Generalization of AUPRC Optimization
with List Stability [107.65337427333064]
optimization of the Area Under the Precision-Recall Curve (AUPRC) is a crucial problem for machine learning.
In this work, we present the first trial in the single-dependent generalization of AUPRC optimization.
Experiments on three image retrieval datasets on speak to the effectiveness and soundness of our framework.
arXiv Detail & Related papers (2022-09-27T09:06:37Z) - Gradient Descent Averaging and Primal-dual Averaging for Strongly Convex
Optimization [15.731908248435348]
We develop gradient descent averaging and primal-dual averaging algorithms for strongly convex cases.
We prove that primal-dual averaging yields the optimal convergence rate in terms of output averaging, while SC-PDA derives the optimal individual convergence.
Several experiments on SVMs and deep learning models validate the correctness of theoretical analysis and effectiveness of algorithms.
arXiv Detail & Related papers (2020-12-29T01:40:30Z) - Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for many machine learning problems.
We propose a novel stoc-efficientgradient estimator named stoc-BiO.
arXiv Detail & Related papers (2020-10-15T18:09:48Z) - Kernel Distributionally Robust Optimization [17.909696462645023]
We propose kernel distributionally robust optimization ( Kernel DRO) using insights from the robust optimization theory and functional analysis.
Our method uses kernel kernel (RKHS) to construct a wide range of convex ambiguity, which can be generalized to sets based on probability metrics and finite-order moment sets.
Using universal RKHSs, the theorem applies to a broad class of loss functions, lifting common limitations such as losses and knowledge of the Lipschitz constant.
arXiv Detail & Related papers (2020-06-12T07:46:59Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $mathcaltilde O(t-1/4)$ rate of convergence for the norm of the gradient of Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.