Hybrid Coordinate Descent for Efficient Neural Network Learning Using Line Search and Gradient Descent
- URL: http://arxiv.org/abs/2408.01374v1
- Date: Fri, 2 Aug 2024 16:29:54 GMT
- Title: Hybrid Coordinate Descent for Efficient Neural Network Learning Using Line Search and Gradient Descent
- Authors: Yen-Che Hsiao, Abhishek Dutta
- Abstract summary: This paper presents a novel coordinate descent algorithm for a squared error loss function.
Each parameter undergoes updates determined by either the line search or gradient method.
The line search's parallelizability facilitates a reduction in computation time.
- Score: 3.8936716676293917
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a novel coordinate descent algorithm leveraging a combination of one-directional line search and gradient information for parameter updates for a squared error loss function. Each parameter undergoes updates determined by either the line search or gradient method, contingent upon whether the modulus of the gradient of the loss with respect to that parameter surpasses a predefined threshold. Notably, a larger threshold value enhances algorithmic efficiency. Despite the potentially slower nature of the line search method relative to gradient descent, its parallelizability facilitates computational time reduction. Experimental validation conducted on a 2-layer Rectified Linear Unit network with synthetic data elucidates the impact of hyperparameters on convergence rates and computational efficiency.
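For readers who want the mechanics, below is a minimal sketch of the per-parameter selection rule on a toy least-squares problem rather than the paper's 2-layer ReLU network. The branch assignment (gradient step when the gradient magnitude exceeds the threshold, brute-force coordinate line search otherwise), the search grid, and all constants are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

# Minimal sketch of the hybrid per-coordinate rule on a toy squared-error
# problem (not the paper's 2-layer ReLU network). Branch assignment, the
# brute-force line search, and all constants are assumptions.

def loss(theta, X, y):
    return 0.5 * np.mean((X @ theta - y) ** 2)

def grad(theta, X, y):
    return X.T @ (X @ theta - y) / len(y)

def coordinate_line_search(theta, i, X, y, radius=1.0, n_points=51):
    """1-D search along coordinate i; independent across i, hence parallelizable."""
    candidates = theta[i] + np.linspace(-radius, radius, n_points)  # includes 0 offset
    trial = theta.copy()
    best_val, best_c = np.inf, theta[i]
    for c in candidates:
        trial[i] = c
        val = loss(trial, X, y)
        if val < best_val:
            best_val, best_c = val, c
    return best_c

def hybrid_step(theta, X, y, tau=0.1, lr=0.2):
    g = grad(theta, X, y)
    new_theta = theta.copy()
    for i in range(len(theta)):
        if abs(g[i]) > tau:      # large gradient: cheap gradient step
            new_theta[i] = theta[i] - lr * g[i]
        else:                    # small gradient: one-directional line search
            new_theta[i] = coordinate_line_search(theta, i, X, y)
    return new_theta

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.01 * rng.normal(size=200)

theta = np.zeros(5)
for _ in range(30):
    theta = hybrid_step(theta, X, y)
print("final loss:", loss(theta, X, y))
```

Since each coordinate's line search evaluates the loss independently of the other coordinates' searches, the inner loop over i can be distributed across workers, which is the parallelism the abstract points to as offsetting the line search's higher per-step cost.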
Related papers
- Correlations Are Ruining Your Gradient Descent [1.2432046687586285]
Natural gradient descent illuminates how gradient vectors, pointing at directions of steepest descent, can be improved by considering the local curvature of loss landscapes.
We show that correlations in the data at any linear transformation, including node responses at every layer of a neural network, cause a non-orthonormal relationship between the model's parameters.
We describe a range of methods which have been proposed for decorrelation and whitening of node output, and expand on these to provide a novel method specifically useful for distributed computing and computational neuroscience.
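For intuition only, the sketch below applies generic ZCA whitening to a layer's node outputs; it is not the decorrelation method proposed in the paper, and the epsilon regularizer and toy data are assumptions.

```python
import numpy as np

# Generic ZCA whitening of a layer's node outputs (batch, features).
# Illustrative only; not the paper's proposed decorrelation rule.

def zca_whiten(h, eps=1e-5):
    centered = h - h.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / len(centered)
    eigvals, eigvecs = np.linalg.eigh(cov)
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return centered @ W

rng = np.random.default_rng(0)
h = rng.normal(size=(256, 8)) @ rng.normal(size=(8, 8))  # correlated activations
h_white = zca_whiten(h)
print(np.round(np.cov(h_white, rowvar=False), 2))         # roughly the identity
```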
arXiv Detail & Related papers (2024-07-15T14:59:43Z) - Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}(\ln(T) / T^{1 - \frac{1}{\alpha}})$.
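As a rough, heavily simplified illustration, the sketch below runs server-side AdaGrad on a noisy sum of client gradients as a stand-in for analog over-the-air aggregation; the quadratic client objectives, the additive-noise channel model, and all constants are assumptions rather than the paper's setup.

```python
import numpy as np

# Server-side AdaGrad applied to a noisy aggregate of client gradients,
# loosely mimicking over-the-air aggregation. All modeling choices and
# constants here are illustrative assumptions.

rng = np.random.default_rng(0)
n_clients, dim, T = 10, 4, 200
A = [rng.normal(size=(dim, dim)) for _ in range(n_clients)]
b = [rng.normal(size=dim) for _ in range(n_clients)]

def client_grad(k, w):
    return A[k].T @ (A[k] @ w - b[k])   # gradient of 0.5*||A_k w - b_k||^2

w = np.zeros(dim)
accum = np.zeros(dim)                   # AdaGrad accumulator
lr, eps, noise_std = 0.5, 1e-8, 0.01

for t in range(T):
    # "over-the-air" aggregation: client gradients add up in the channel, plus noise
    g = sum(client_grad(k, w) for k in range(n_clients)) / n_clients
    g = g + noise_std * rng.normal(size=dim)
    accum += g ** 2
    w -= lr * g / (np.sqrt(accum) + eps)

print("final average loss:",
      np.mean([0.5 * np.linalg.norm(A[k] @ w - b[k]) ** 2 for k in range(n_clients)]))
```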
arXiv Detail & Related papers (2024-03-11T09:10:37Z) - On Learning Gaussian Multi-index Models with Gradient Flow [57.170617397894404]
We study gradient flow on the multi-index regression problem for high-dimensional Gaussian data.
We consider a two-timescale algorithm, whereby the low-dimensional link function is learnt with a non-parametric model infinitely faster than the subspace parametrizing the low-rank projection.
arXiv Detail & Related papers (2023-10-30T17:55:28Z) - Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels [78.6096486885658]
We introduce lower bounds to the linearized Laplace approximation of the marginal likelihood.
These bounds are amenable to gradient-based optimization and allow trading off estimation accuracy against computational complexity.
arXiv Detail & Related papers (2023-06-06T19:02:57Z) - An Accelerated Doubly Stochastic Gradient Method with Faster Explicit
Model Identification [97.28167655721766]
We propose a novel accelerated doubly stochastic gradient descent (ADSGD) method for sparsity regularized loss minimization problems.
We first prove that ADSGD can achieve a linear convergence rate and lower overall computational complexity.
arXiv Detail & Related papers (2022-08-11T22:27:22Z) - AdaLoss: A computationally-efficient and provably convergent adaptive
gradient method [7.856998585396422]
We propose a computationally-friendly learning schedule, "AdaLoss", which uses the information of the loss function to adjust the learning rate numerically.
We verify the approach through numerical experiments with LSTM models for text and control problems.
arXiv Detail & Related papers (2021-09-17T01:45:25Z) - Solving Sparse Linear Inverse Problems in Communication Systems: A Deep
Learning Approach With Adaptive Depth [51.40441097625201]
We propose an end-to-end trainable deep learning architecture for sparse signal recovery problems.
The proposed method learns how many layers to execute to emit an output, and the network depth is dynamically adjusted for each task in the inference phase.
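A schematic of the adaptive-depth idea, assuming unfolded ISTA-style layers and a hand-chosen halting rule (the paper instead learns the stopping behaviour during training):

```python
import numpy as np

# Depth-adaptive sparse recovery: ISTA-style "layers" run until a halting
# criterion fires, so the executed depth varies per input. The halting rule,
# thresholds, and problem sizes are illustrative assumptions.

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def adaptive_depth_recover(A, y, lam=0.05, max_layers=500, tol=1e-3):
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for depth in range(1, max_layers + 1):
        x_next = soft_threshold(x - step * A.T @ (A @ x - y), step * lam)
        if np.linalg.norm(x_next - x) < tol:   # output judged good enough: stop early
            return x_next, depth
        x = x_next
    return x, max_layers

rng = np.random.default_rng(0)
A = rng.normal(size=(30, 100))
x_true = np.zeros(100)
x_true[rng.choice(100, 5, replace=False)] = rng.normal(size=5)
y = A @ x_true
x_hat, depth_used = adaptive_depth_recover(A, y)
print("layers executed:", depth_used,
      "| recovery error:", round(float(np.linalg.norm(x_hat - x_true)), 3))
```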
arXiv Detail & Related papers (2020-10-29T06:32:53Z) - GTAdam: Gradient Tracking with Adaptive Momentum for Distributed Online
Optimization [4.103281325880475]
This paper deals with a network of computing agents aiming to solve an online optimization problem in a distributed fashion, by means of local computation and communication, without any central coordinator.
We propose the gradient tracking with adaptive momentum estimation (GTAdam) distributed algorithm, which combines a gradient tracking mechanism with first and second order momentum estimates of the gradient.
In numerical experiments on multi-agent learning problems, GTAdam outperforms state-of-the-art distributed optimization methods.
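A toy sketch of the two ingredients just named, gradient tracking combined with Adam-style first and second moment estimates, on a distributed least-squares problem; the ring mixing matrix, constants, and update ordering are assumptions and need not match GTAdam exactly.

```python
import numpy as np

# Toy combination of gradient tracking with Adam-style moment estimates on a
# distributed least-squares problem. The mixing matrix, constants, and update
# ordering are assumptions; this is not the GTAdam algorithm verbatim.

rng = np.random.default_rng(0)
n_agents, dim, T = 4, 3, 300
A = [rng.normal(size=(10, dim)) for _ in range(n_agents)]
b = [rng.normal(size=10) for _ in range(n_agents)]

def local_grad(i, x):
    return A[i].T @ (A[i] @ x - b[i])     # gradient of 0.5*||A_i x - b_i||^2

# Doubly stochastic mixing matrix for a 4-agent ring.
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])

x = np.zeros((n_agents, dim))
y = np.array([local_grad(i, x[i]) for i in range(n_agents)])   # gradient trackers
m = np.zeros_like(x)
v = np.zeros_like(x)
alpha, beta1, beta2, eps = 0.02, 0.9, 0.999, 1e-8

for _ in range(T):
    g_old = np.array([local_grad(i, x[i]) for i in range(n_agents)])
    m = beta1 * m + (1 - beta1) * y          # first moment of the tracked gradient
    v = beta2 * v + (1 - beta2) * y ** 2     # second moment of the tracked gradient
    x = W @ x - alpha * m / (np.sqrt(v) + eps)   # consensus + adaptive descent step
    g_new = np.array([local_grad(i, x[i]) for i in range(n_agents)])
    y = W @ y + g_new - g_old                # gradient tracking update

print("disagreement across agents:", float(np.linalg.norm(x - x.mean(axis=0))))
```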
arXiv Detail & Related papers (2020-09-03T15:20:21Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to problems in which one variable is subject to a sparsity constraint.
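A toy sketch of synchronous gradient descent on the coupled factors of a bilinear least-squares objective; it mirrors only the coupled-update idea in the summary, and the paper's projection-based coordination of the hidden sparse variable is not reproduced.

```python
import numpy as np

# Synchronous gradient descent on the bilinear objective 0.5 * ||Y - U V||_F^2.
# Both factors are updated in the same step through the shared residual; this
# illustrates the coupled-update idea only, not CoGD itself.

rng = np.random.default_rng(0)
Y = rng.normal(size=(20, 15))
U = 0.1 * rng.normal(size=(20, 4))
V = 0.1 * rng.normal(size=(4, 15))
lr = 0.01

for _ in range(500):
    R = U @ V - Y                                    # residual couples the two variables
    U, V = U - lr * (R @ V.T), V - lr * (U.T @ R)    # update both factors synchronously
print("reconstruction error:", float(np.linalg.norm(U @ V - Y)))
```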
arXiv Detail & Related papers (2020-06-16T13:41:54Z) - Online Hyperparameter Search Interleaved with Proximal Parameter Updates [9.543667840503739]
We develop a method that relies on the structure of proximal gradient methods and does not require a smooth cost function.
Such a method is applied to Leave-one-out (LOO)-validated Lasso and Group Lasso.
Numerical experiments corroborate the convergence of the proposed method to a local optimum of the LOO validation error curve.
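A crude caricature of the interleaving, assuming ISTA as the proximal solver, a small hold-out split in place of leave-one-out validation, and a multiplicative search over the regularization weight lam; none of these choices come from the paper.

```python
import numpy as np

# Caricature of interleaving proximal (ISTA) updates on Lasso weights with an
# online adjustment of the regularization weight lam. A hold-out split stands
# in for leave-one-out validation; the multiplicative lam search is assumed.

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.1 * rng.normal(size=80)
X_tr, y_tr, X_val, y_val = X[:60], y[:60], X[60:], y[60:]

step = 1.0 / np.linalg.norm(X_tr, 2) ** 2   # 1 / Lipschitz constant of the data fit

def ista_steps(w, lam, n=20):
    for _ in range(n):
        w = soft_threshold(w - step * X_tr.T @ (X_tr @ w - y_tr), step * lam)
    return w

w, lam = np.zeros(20), 0.5
for _ in range(30):
    w = ista_steps(w, lam)                            # proximal parameter updates
    # interleaved hyperparameter move: keep whichever nearby lam validates best
    errs = {l: np.mean((X_val @ ista_steps(w, l) - y_val) ** 2)
            for l in (0.8 * lam, lam, 1.25 * lam)}
    lam = min(errs, key=errs.get)

print("selected lam:", round(lam, 4),
      "| validation MSE:", round(float(np.mean((X_val @ w - y_val) ** 2)), 4))
```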
arXiv Detail & Related papers (2020-04-06T15:54:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.