Related papers: Connecting Optimization and Generalization via Gradient Flow Path Length

Connecting Optimization and Generalization via Gradient Flow Path Length

URL: http://arxiv.org/abs/2202.10670v1
Date: Tue, 22 Feb 2022 04:58:16 GMT
Title: Connecting Optimization and Generalization via Gradient Flow Path Length
Authors: Fusheng Liu, Haizhao Yang, Soufiane Hayou, Qianxiao Li
Abstract summary: We propose a framework to connect optimization with generalization by analyzing the generalization error based on the length of optimization trajectory under the gradient flow algorithm after convergence. Our framework can be applied to broad settings. For example, we use it to obtain generalization estimates on three distinct machine learning models.
Score: 17.099964074542818
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Optimization and generalization are two essential aspects of machine learning. In this paper, we propose a framework to connect optimization with generalization by analyzing the generalization error based on the length of optimization trajectory under the gradient flow algorithm after convergence. Through our approach, we show that, with a proper initialization, gradient flow converges following a short path with an explicit length estimate. Such an estimate induces a length-based generalization bound, showing that short optimization paths after convergence are associated with good generalization, which also matches our numerical results. Our framework can be applied to broad settings. For example, we use it to obtain generalization estimates on three distinct machine learning models: underdetermined $\ell_p$ linear regression, kernel regression, and overparameterized two-layer ReLU neural networks.

Related papers

Gradient-Normalized Smoothness for Optimization with Approximate Hessians [37.1630298053787]
We develop algorithms that use approximate second-order information combined with the gradient regularization technique to achieve fast global convergence rates.<n>We demonstrate direct applications of our results in logistic regression problems with Hessians, as well as in non-continuous soft convergence optimization using Fisher and Gauss.
arXiv Detail & Related papers (2025-06-16T17:19:34Z)
Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when emphdone right -- by which we mean using specific insights from optimisation and kernel communities -- gradient descent is highly effective. We introduce a emphstochastic dual descent algorithm, explain its design in an intuitive manner and illustrate the design choices. Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z)
Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linearahead as a principled method for stabilizing (large-scale) neural network training. We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
Implicit regularization in AI meets generalized hardness of approximation in optimization -- Sharp results for diagonal linear networks [0.0]
We show sharp results for the implicit regularization imposed by the gradient flow of Diagonal Linear Networks. We link this to the phenomenon of phase transitions in generalized hardness of approximation. Non-sharpness of our results would imply that the GHA phenomenon would not occur for the basis pursuit optimization problem.
arXiv Detail & Related papers (2023-07-13T13:27:51Z)
On the Effect of Initialization: The Scaling Path of 2-Layer Neural Networks [21.69222364939501]
In supervised learning, the regularization path is sometimes used as a convenient theoretical proxy for the optimization path of gradient descent from zero. We show that the path interpolates continuously between the so-called kernel and rich regimes.
arXiv Detail & Related papers (2023-03-31T05:32:11Z)
Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks. We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space. We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z)
Unified Convergence Analysis for Adaptive Optimization with Moving Average Estimator [75.05106948314956]
We show that an increasing large momentum parameter for the first-order moment is sufficient for adaptive scaling. We also give insights for increasing the momentum in a stagewise manner in accordance with stagewise decreasing step size.
arXiv Detail & Related papers (2021-04-30T08:50:24Z)
Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work is on the iteration of zero-th-order (ZO) optimization which does not require first-order information. We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity as well as as function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
Channel-Directed Gradients for Optimization of Convolutional Neural Networks [50.34913837546743]
We introduce optimization methods for convolutional neural networks that can be used to improve existing gradient-based optimization in terms of generalization error. We show that defining the gradients along the output channel direction leads to a performance boost, while other directions can be detrimental.
arXiv Detail & Related papers (2020-08-25T00:44:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.