General Loss Functions Lead to (Approximate) Interpolation in High
Dimensions
- URL: http://arxiv.org/abs/2303.07475v1
- Date: Mon, 13 Mar 2023 21:23:12 GMT
- Title: General Loss Functions Lead to (Approximate) Interpolation in High
Dimensions
- Authors: Kuo-Wei Lai, Vidya Muthukumar
- Abstract summary: We provide a unified framework to approximately characterize the implicit bias of gradient descent in closed form.
Specifically, we show that the implicit bias is approximated (but not exactly equal to) the minimum-norm in high dimensions.
Our framework also recovers existing exact equivalence results for exponentially-tailed losses across binary and multiclass settings.
- Score: 6.738946307589741
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We provide a unified framework, applicable to a general family of convex
losses and across binary and multiclass settings in the overparameterized
regime, to approximately characterize the implicit bias of gradient descent in
closed form. Specifically, we show that the implicit bias is approximated (but
not exactly equal to) the minimum-norm interpolation in high dimensions, which
arises from training on the squared loss. In contrast to prior work which was
tailored to exponentially-tailed losses and used the intermediate
support-vector-machine formulation, our framework directly builds on the
primal-dual analysis of Ji and Telgarsky (2021), allowing us to provide new
approximate equivalences for general convex losses through a novel sensitivity
analysis. Our framework also recovers existing exact equivalence results for
exponentially-tailed losses across binary and multiclass settings. Finally, we
provide evidence for the tightness of our techniques, which we use to
demonstrate the effect of certain loss functions designed for
out-of-distribution problems on the closed-form solution.
Related papers
- The Implicit Bias of Gradient Descent on Separable Multiclass Data [38.05903703331163]
We employ the framework of Permutation Equivariant and Relative Margin-based (PERM) losses to introduce a multiclass extension of the exponential tail property.
Our proof techniques closely mirror those of the binary case, thus illustrating the power of the PERM framework for bridging the binary-multiclass gap.
arXiv Detail & Related papers (2024-11-02T19:39:21Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function, that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-vary learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linearahead as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Curvature-Independent Last-Iterate Convergence for Games on Riemannian
Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z) - Expressive Losses for Verified Robustness via Convex Combinations [67.54357965665676]
We study the relationship between the over-approximation coefficient and performance profiles across different expressive losses.
We show that, while expressivity is essential, better approximations of the worst-case loss are not necessarily linked to superior robustness-accuracy trade-offs.
arXiv Detail & Related papers (2023-05-23T12:20:29Z) - A Non-Asymptotic Moreau Envelope Theory for High-Dimensional Generalized
Linear Models [33.36787620121057]
We prove a new generalization bound that shows for any class of linear predictors in Gaussian space.
We use our finite-sample bound to directly recover the "optimistic rate" of Zhou et al. (2021)
We show that application of our bound generalization using localized Gaussian width will generally be sharp for empirical risk minimizers.
arXiv Detail & Related papers (2022-10-21T16:16:55Z) - Stability vs Implicit Bias of Gradient Methods on Separable Data and
Beyond [33.593203156666746]
We focus on the generalization properties of unregularized gradient-based learning procedures applied to separable linear classification.
We give an additional unified explanation for this generalization, that we refer to as realizability and self-boundedness.
In some of these cases, our bounds significantly improve upon the existing generalization error bounds in the literature.
arXiv Detail & Related papers (2022-02-27T19:56:36Z) - Generalization Bounds via Convex Analysis [12.411844611718958]
We show that it is possible to replace the mutual information by any strongly convex function of the joint input-output distribution.
Examples include bounds stated in terms of $p$-norm divergences and the Wasserstein-2 distance.
arXiv Detail & Related papers (2022-02-10T12:30:45Z) - Interpolation can hurt robust generalization even when there is no noise [76.3492338989419]
We show that avoiding generalization through ridge regularization can significantly improve generalization even in the absence of noise.
We prove this phenomenon for the robust risk of both linear regression and classification and hence provide the first theoretical result on robust overfitting.
arXiv Detail & Related papers (2021-08-05T23:04:15Z) - Wide flat minima and optimal generalization in classifying
high-dimensional Gaussian mixtures [8.556763944288116]
We show that there exist configurations that achieve the Bayes-optimal generalization error, even in the case of unbalanced clusters.
We also consider the algorithmically relevant case of targeting wide flat minima of the mean squared error loss.
arXiv Detail & Related papers (2020-10-27T01:32:03Z) - Domain Adaptation: Learning Bounds and Algorithms [80.85426994513541]
We introduce a novel distance between distributions, discrepancy distance, that is tailored to adaptation problems with arbitrary loss functions.
We derive novel generalization bounds for domain adaptation for a wide family of loss functions.
We also present a series of novel adaptation bounds for large classes of regularization-based algorithms.
arXiv Detail & Related papers (2009-02-19T18:42:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.