Learning Provably Improves the Convergence of Gradient Descent
- URL: http://arxiv.org/abs/2501.18092v2
- Date: Sat, 01 Feb 2025 12:44:10 GMT
- Title: Learning Provably Improves the Convergence of Gradient Descent
- Authors: Qingyu Song, Wei Lin, Hong Xu,
- Abstract summary: We study the convergence of Learning to Optimize (L2O) problems by training-based solvers.
An algorithm's tangent significantly enhances L2O's convergence.
Our findings indicate 50% outperformance over the GD methods.
- Score: 9.82454981262489
- License:
- Abstract: As a specialized branch of deep learning, Learning to Optimize (L2O) tackles optimization problems by training DNN-based solvers. Despite achieving significant success in various scenarios, such as faster convergence in solving convex optimizations and improved optimality in addressing non-convex cases, there remains a deficiency in theoretical support. Current research heavily relies on stringent assumptions that do not align with the intricacies of the training process. To address this gap, our study aims to establish L2O's convergence through its training methodology. We demonstrate that learning an algorithm's hyperparameters significantly enhances its convergence. Focusing on the gradient descent (GD) algorithm for quadratic programming, we prove the convergence of L2O's training using the neural tangent kernel theory. Moreover, we conduct empirical evaluations using synthetic datasets. Our findings indicate exceeding 50\% outperformance over the GD methods.
Related papers
- Gradient Alignment in Physics-informed Neural Networks: A Second-Order Optimization Perspective [12.712238596012742]
We present theoretical and practical approaches for addressing directional conflicts between loss terms.
We show how these conflicts limit first-order methods and show that second-order optimization naturally resolves them.
We prove that SOAP, a recently proposed quasi-Newton method, efficiently approximates the Hessian preconditioner.
arXiv Detail & Related papers (2025-02-02T00:21:45Z) - Linearly Convergent Mixup Learning [0.0]
We present two novel algorithms that extend to a broader range of binary classification models.
Unlike gradient-based approaches, our algorithms do not require hyper parameters like learning rates, simplifying their implementation and optimization.
Our algorithms achieve faster convergence to the optimal solution compared to descent gradient approaches, and that mixup data augmentation consistently improves the predictive performance across various loss functions.
arXiv Detail & Related papers (2025-01-14T02:33:40Z) - Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolleds and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have effectively been demonstrated in solving forward and inverse differential equation problems.
PINNs are trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ implicit gradient descent (ISGD) method to train PINNs for improving the stability of training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics by minimizing the population loss that are more suitable in active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z) - Versatile Single-Loop Method for Gradient Estimator: First and Second
Order Optimality, and its Application to Federated Learning [45.78238792836363]
We present a single-loop algorithm named SLEDGE (Single-Loop-E Gradient Estimator) for periodic convergence.
Unlike existing methods, SLEDGE has the advantage of versatility; (ii) second-order optimal, (ii) in the PL region, and (iii) smaller complexity under less of data.
arXiv Detail & Related papers (2022-09-01T11:05:26Z) - Unified Convergence Analysis for Adaptive Optimization with Moving Average Estimator [75.05106948314956]
We show that an increasing large momentum parameter for the first-order moment is sufficient for adaptive scaling.
We also give insights for increasing the momentum in a stagewise manner in accordance with stagewise decreasing step size.
arXiv Detail & Related papers (2021-04-30T08:50:24Z) - Learning to Optimize: A Primer and A Benchmark [94.29436694770953]
Learning to optimize (L2O) is an emerging approach that leverages machine learning to develop optimization methods.
This article is poised to be the first comprehensive survey and benchmark of L2O for continuous optimization.
arXiv Detail & Related papers (2021-03-23T20:46:20Z) - Gradient Descent Averaging and Primal-dual Averaging for Strongly Convex
Optimization [15.731908248435348]
We develop gradient descent averaging and primal-dual averaging algorithms for strongly convex cases.
We prove that primal-dual averaging yields the optimal convergence rate in terms of output averaging, while SC-PDA derives the optimal individual convergence.
Several experiments on SVMs and deep learning models validate the correctness of theoretical analysis and effectiveness of algorithms.
arXiv Detail & Related papers (2020-12-29T01:40:30Z) - Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for many machine learning problems.
We propose a novel stoc-efficientgradient estimator named stoc-BiO.
arXiv Detail & Related papers (2020-10-15T18:09:48Z) - BAMSProd: A Step towards Generalizing the Adaptive Optimization Methods
to Deep Binary Model [34.093978443640616]
Recent methods have significantly reduced the performance of Binary Neural Networks (BNNs)
guaranteeing the effective and efficient training of BNNs is an unsolved problem.
We propose a BAMSProd algorithm with a key observation that the convergence property of optimizing deep binary model is strongly related to the quantization errors.
arXiv Detail & Related papers (2020-09-29T06:12:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.