An Accelerated Doubly Stochastic Gradient Method with Faster Explicit
Model Identification
- URL: http://arxiv.org/abs/2208.06058v1
- Date: Thu, 11 Aug 2022 22:27:22 GMT
- Title: An Accelerated Doubly Stochastic Gradient Method with Faster Explicit
Model Identification
- Authors: Runxue Bao, Bin Gu, Heng Huang
- Abstract summary: We propose a novel accelerated doubly stochastic gradient descent (ADSGD) method for sparsity regularized loss minimization problems.
We first prove that ADSGD can achieve a linear convergence rate and lower overall computational complexity.
- Score: 97.28167655721766
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sparsity regularized loss minimization problems play an important role in
various fields including machine learning, data mining, and modern statistics.
The proximal gradient descent method and the coordinate descent method are the most
popular approaches to solving such minimization problems. Although existing
methods can achieve implicit model identification, aka support set
identification, in a finite number of iterations, these methods still suffer
from huge computational costs and memory burdens in high-dimensional scenarios.
The reason is that support set identification in these methods is implicit,
so they cannot explicitly identify the low-complexity structure in practice;
that is, they cannot discard the useless coefficients of the associated features
to achieve algorithmic acceleration via dimension reduction. To address this
challenge, we propose a novel accelerated doubly stochastic gradient descent
(ADSGD) method for sparsity regularized loss minimization problems, which can
reduce the number of block iterations by eliminating inactive coefficients
during the optimization process, thereby achieving faster explicit model
identification and improving algorithmic efficiency. Theoretically, we first
prove that ADSGD can achieve a linear convergence rate and lower overall
computational complexity. More importantly, we prove that ADSGD can achieve a
linear rate of explicit model identification. Numerically, experimental results
on benchmark datasets confirm the efficiency of our proposed method.
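To make the idea concrete, the following is a minimal Python sketch of a doubly stochastic proximal update for the Lasso with a periodic screening step that discards apparently inactive features. The sampling scheme and the magnitude-based screening heuristic are illustrative assumptions, not the exact ADSGD update or its screening rule from the paper.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def doubly_stochastic_prox_sgd(X, y, lam, epochs=50, batch=32, block=64,
                               lr=0.01, screen_every=5, seed=0):
    """Sketch of a doubly stochastic proximal gradient method for the Lasso
        (1/(2n)) * ||y - X w||^2 + lam * ||w||_1,
    sampling both a mini-batch of examples and a block of coordinates per step,
    with a periodic heuristic that discards coordinates that look inactive.
    Illustrative only -- not the exact ADSGD algorithm or its screening rule."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    active = np.arange(d)                      # features still kept in memory
    for epoch in range(epochs):
        for _ in range(max(n // batch, 1)):
            rows = rng.choice(n, size=batch, replace=False)          # stochastic in examples
            cols = rng.choice(active, size=min(block, active.size),
                              replace=False)                         # stochastic in coordinates
            resid = X[np.ix_(rows, active)] @ w[active] - y[rows]
            grad = X[np.ix_(rows, cols)].T @ resid / batch
            w[cols] = soft_threshold(w[cols] - lr * grad, lr * lam)  # prox step on the block
        if (epoch + 1) % screen_every == 0:
            # heuristic screening: drop zero coefficients with weak correlation
            r = y - X[:, active] @ w[active]
            corr = np.abs(X[:, active].T @ r) / n
            keep = (w[active] != 0.0) | (corr >= 0.9 * lam)
            active = active[keep]              # explicit dimension reduction
    return w
```

Because screened columns are removed from `active`, every subsequent stochastic step touches fewer features, which is the kind of dimension reduction the abstract describes as explicit model identification.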
Related papers
- Dimension reduction via score ratio matching [0.9012198585960441]
We propose a framework, derived from score-matching, to extend gradient-based dimension reduction to problems where gradients are unavailable.
We show that our approach outperforms standard score-matching for problems with low-dimensional structure.
arXiv Detail & Related papers (2024-10-25T22:21:03Z)
- Robust Analysis of Multi-Task Learning Efficiency: New Benchmarks on Light-Weighed Backbones and Effective Measurement of Multi-Task Learning Challenges by Feature Disentanglement [69.51496713076253]
In this paper, we focus on the aforementioned efficiency aspects of existing MTL methods.
We first carry out large-scale experiments of the methods with smaller backbones and on the MetaGraspNet dataset as a new test ground.
We also propose Feature Disentanglement measure as a novel and efficient identifier of the challenges in MTL.
arXiv Detail & Related papers (2024-02-05T22:15:55Z)
- Scalable Bayesian Meta-Learning through Generalized Implicit Gradients [64.21628447579772]
The implicit Bayesian meta-learning (iBaML) method not only broadens the scope of learnable priors but also quantifies the associated uncertainty.
Analytical error bounds are established to demonstrate the precision and efficiency of the generalized implicit gradient over the explicit one.
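As an illustration of the implicit-gradient machinery such methods rely on, the sketch below computes a meta-gradient through a quadratically regularized inner problem using only Hessian-vector products and conjugate gradients. The toy quadratic inner loss, the outer loss, and the regularization weight are assumptions made for the example; this is the generic implicit-function-theorem computation, not the specific iBaML estimator.

```python
import numpy as np

def conjugate_gradient(hvp, g, iters=50, tol=1e-10):
    """Solve H v = g using only Hessian-vector products hvp(v)."""
    v = np.zeros_like(g)
    r = g.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Hp = hvp(p)
        alpha = rs / (p @ Hp)
        v += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return v

# Toy bilevel problem: inner loss L_in(theta) = 0.5*theta^T A theta - b^T theta,
# inner problem: min_theta L_in(theta) + (lam/2)*||theta - phi||^2.
rng = np.random.default_rng(0)
d, lam = 5, 1.0
M = rng.standard_normal((d, d)); A = M @ M.T + np.eye(d)          # PSD inner Hessian
b = rng.standard_normal(d)
phi = rng.standard_normal(d)                                       # meta-parameters (prior mean)
theta_star = np.linalg.solve(A + lam * np.eye(d), b + lam * phi)   # inner optimum

# Outer loss L_out(theta) = 0.5*||theta - target||^2, gradient at theta*:
target = np.zeros(d)
g_out = theta_star - target

# Implicit gradient: d L_out / d phi = lam * (A + lam*I)^{-1} g_out,
# evaluated matrix-free via CG with Hessian-vector products.
hvp = lambda v: A @ v + lam * v
meta_grad = lam * conjugate_gradient(hvp, g_out)
print(meta_grad)
```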
arXiv Detail & Related papers (2023-03-31T02:10:30Z)
- Distributed Dynamic Safe Screening Algorithms for Sparse Regularization [73.85961005970222]
We propose a new distributed dynamic safe screening (DDSS) method for sparsity regularized models and apply it to shared-memory and distributed-memory architectures, respectively.
We prove that the proposed method achieves the linear convergence rate with lower overall complexity and can eliminate almost all the inactive features in a finite number of iterations almost surely.
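For reference, the classical single-machine gap-safe screening test for the Lasso illustrates the kind of dynamic screening that DDSS distributes; the function below is the standard textbook rule written from primal-dual quantities, not the paper's distributed algorithm.

```python
import numpy as np

def gap_safe_screen(X, y, w, lam):
    """Gap-safe screening test for the Lasso
        P(w) = 0.5*||y - X w||^2 + lam*||w||_1.
    Returns a boolean mask of features that can be safely discarded
    (their optimal coefficients are guaranteed to be zero)."""
    r = y - X @ w                                   # primal residual
    theta = r / max(lam, np.abs(X.T @ r).max())     # feasible dual point
    primal = 0.5 * (r @ r) + lam * np.abs(w).sum()
    dual = 0.5 * (y @ y) - 0.5 * lam**2 * ((theta - y / lam) ** 2).sum()
    gap = max(primal - dual, 0.0)
    radius = np.sqrt(2.0 * gap) / lam               # radius of the safe sphere
    scores = np.abs(X.T @ theta) + radius * np.linalg.norm(X, axis=0)
    return scores < 1.0                             # True => coefficient is provably zero

# Usage inside any iterative solver: periodically drop the screened columns,
# e.g. X = X[:, ~mask]; w = w[~mask]
```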
arXiv Detail & Related papers (2022-04-23T02:45:55Z)
- Gradient-Based Learning of Discrete Structured Measurement Operators for Signal Recovery [16.740247586153085]
We show how to leverage gradient-based learning to solve discrete optimization problems.
Our approach is formalized by GLODISMO (Gradient-based Learning of DIscrete Structured Measurement Operators).
We empirically demonstrate the performance and flexibility of GLODISMO in several signal recovery applications.
arXiv Detail & Related papers (2022-02-07T18:27:08Z)
- Accelerated nonlinear primal-dual hybrid gradient algorithms with applications to machine learning [0.0]
The primal-dual hybrid gradient (PDHG) algorithm is a first-order method that splits convex optimization problems with saddle-point structure into smaller subproblems.
PDHG requires stepsize parameters fine-tuned for the problem at hand.
We introduce accelerated nonlinear variants of the PDHG algorithm that achieve faster convergence for a broad class of optimization problems relevant to machine learning.
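For context, the basic (non-accelerated, linear) PDHG iteration applied to an L1-regularized least-squares toy problem looks as follows; the accelerated nonlinear variants proposed in that paper modify the proximal maps and step sizes beyond this baseline sketch.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def pdhg_lasso(K, b, lam, iters=500):
    """Basic PDHG / Chambolle-Pock iteration for
        min_x 0.5*||K x - b||^2 + lam*||x||_1,
    written as the saddle-point problem min_x max_y <K x, y> + g(x) - f*(y)
    with g = lam*||.||_1 and f = 0.5*||. - b||^2.  Baseline sketch only."""
    m, n = K.shape
    L = np.linalg.norm(K, 2)              # operator norm ||K||
    tau = sigma = 0.99 / L                # ensures tau * sigma * ||K||^2 <= 1
    x = np.zeros(n); x_bar = x.copy(); y = np.zeros(m)
    for _ in range(iters):
        # dual step: prox of sigma*f*, where f*(y) = 0.5*||y||^2 + <b, y>
        y = (y + sigma * (K @ x_bar) - sigma * b) / (1.0 + sigma)
        # primal step: prox of tau*lam*||.||_1
        x_new = soft_threshold(x - tau * (K.T @ y), tau * lam)
        # extrapolation with theta = 1
        x_bar = 2.0 * x_new - x
        x = x_new
    return x
```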
arXiv Detail & Related papers (2021-09-24T22:37:10Z)
- The Neural Network shifted-Proper Orthogonal Decomposition: a Machine Learning Approach for Non-linear Reduction of Hyperbolic Equations [0.0]
In this work we approach the problem of automatically detecting the correct pre-processing transformation in a statistical learning framework.
The purely data-driven method allowed us to generalise the existing approaches of linear subspace manipulation to non-linear hyperbolic problems with unknown advection fields.
The proposed algorithm has been validated against simple test cases to benchmark its performances and later successfully applied to a multiphase simulation.
arXiv Detail & Related papers (2021-08-14T15:13:35Z)
- Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems in which one variable is subject to a sparsity constraint.
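A minimal sketch of a coupled (synchronous) update for a bilinear problem with one sparse variable is shown below; the residual shared by both gradients provides the coupling, but the exact coupling and projection scheme of CoGD is more elaborate than this illustration.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def coupled_bilinear_descent(y, k, lam=0.1, lr=1e-2, iters=2000, seed=0):
    """Synchronous gradient updates for the bilinear problem
        min_{A, x} 0.5*||y - A x||^2 + lam*||x||_1,
    where both variables are updated from the same residual each step and
    the sparse variable receives a proximal (soft-threshold) step.
    Illustrative sketch only, not the exact CoGD update."""
    rng = np.random.default_rng(seed)
    m = y.shape[0]
    A = rng.standard_normal((m, k)) / np.sqrt(k)
    x = np.zeros(k)
    for _ in range(iters):
        resid = A @ x - y                               # shared residual couples both updates
        grad_A = np.outer(resid, x)                     # gradient of the smooth part w.r.t. A
        grad_x = A.T @ resid                            # gradient of the smooth part w.r.t. x
        A = A - lr * grad_A                             # plain gradient step on A
        x = soft_threshold(x - lr * grad_x, lr * lam)   # prox step on the sparse variable
    return A, x
```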
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
- Effective Dimension Adaptive Sketching Methods for Faster Regularized Least-Squares Optimization [56.05635751529922]
We propose a new randomized algorithm for solving L2-regularized least-squares problems based on sketching.
We consider two of the most popular random embeddings, namely, Gaussian embeddings and the Subsampled Randomized Hadamard Transform (SRHT).
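A minimal sketch-and-solve baseline with a Gaussian embedding is shown below; the sketch size m is left as a free parameter here, whereas the paper's contribution is to adapt it to the effective dimension of the problem.

```python
import numpy as np

def sketched_ridge(X, y, lam, m, seed=0):
    """Sketch-and-solve for ridge regression
        min_w ||X w - y||^2 + lam*||w||^2
    using a Gaussian embedding S with m rows (an SRHT could be used instead).
    Minimal baseline; not the paper's adaptive algorithm."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    S = rng.standard_normal((m, n)) / np.sqrt(m)   # Gaussian embedding
    Xs, ys = S @ X, S @ y                          # compress the n x d problem to m x d
    return np.linalg.solve(Xs.T @ Xs + lam * np.eye(d), Xs.T @ ys)

# Example: tall least-squares problem with n >> d
rng = np.random.default_rng(1)
X = rng.standard_normal((20000, 50))
w_true = rng.standard_normal(50)
y = X @ w_true + 0.1 * rng.standard_normal(20000)
w_hat = sketched_ridge(X, y, lam=1.0, m=500)
```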
arXiv Detail & Related papers (2020-06-10T15:00:09Z)
- Convergence and sample complexity of gradient methods for the model-free linear quadratic regulator problem [27.09339991866556]
Model-free methods search for an optimal control policy for an unknown dynamical system by directly searching over the corresponding space of controllers.
We take a step towards demystifying the performance and efficiency of such methods by focusing on the gradient-flow dynamics over the set of stabilizing feedback gains; a similar result holds for the forward discretization of the corresponding ODE.
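As an illustration of the model-free setting, the sketch below estimates the LQR cost from rollouts and performs a two-point random-search update of the feedback gain; the system matrices, horizon, smoothing radius, and step size are arbitrary toy choices, not those analyzed in the paper.

```python
import numpy as np

def lqr_cost(K, A, B, Q, R, horizon=50, n_rollouts=20, seed=0):
    """Finite-horizon estimate of the LQR cost under state feedback u = -K x,
    averaged over random initial states and treating the system as a black box.
    Assumes K is stabilizing so the rollout cost stays bounded."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    total = 0.0
    for _ in range(n_rollouts):
        x = rng.standard_normal(n)
        for _ in range(horizon):
            u = -K @ x
            total += x @ Q @ x + u @ R @ u
            x = A @ x + B @ u
    return total / n_rollouts

def random_search_step(K, A, B, Q, R, radius=0.05, lr=1e-3, n_dirs=10, seed=0):
    """One two-point random-search (zeroth-order) update of the feedback gain K.
    Common random seeds for the +/- perturbations reduce estimator variance."""
    rng = np.random.default_rng(seed)
    grad = np.zeros_like(K)
    for i in range(n_dirs):
        U = rng.standard_normal(K.shape)
        U /= np.linalg.norm(U)
        cost_plus = lqr_cost(K + radius * U, A, B, Q, R, seed=i)
        cost_minus = lqr_cost(K - radius * U, A, B, Q, R, seed=i)
        grad += (cost_plus - cost_minus) / (2.0 * radius) * U
    return K - lr * grad / n_dirs
```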
arXiv Detail & Related papers (2019-12-26T16:56:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.