Dimension Free Generalization Bounds for Non Linear Metric Learning
- URL: http://arxiv.org/abs/2102.03802v1
- Date: Sun, 7 Feb 2021 14:47:00 GMT
- Title: Dimension Free Generalization Bounds for Non Linear Metric Learning
- Authors: Mark Kozdoba and Shie Mannor
- Abstract summary: We provide uniform generalization bounds for two regimes -- the sparse regime and a non-sparse regime.
We show that by relying on a different, new property of the solutions, it is still possible to provide dimension-free generalization guarantees.
- Score: 61.193693608166114
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work we study generalization guarantees for the metric learning
problem, where the metric is induced by a neural network type embedding of the
data. Specifically, we provide uniform generalization bounds for two regimes --
the sparse regime, and a non-sparse regime which we term \emph{bounded
amplification}. The sparse regime bounds correspond to situations where
$\ell_1$-type norms of the parameters are small. Similarly to the situation in
classification, solutions satisfying such bounds can be obtained by an
appropriate regularization of the problem. On the other hand, unregularized SGD
optimization of a metric learning loss typically does not produce sparse
solutions. We show that despite this lack of sparsity, by relying on a
different, new property of the solutions, it is still possible to provide
dimension-free generalization guarantees. Consequently, these bounds can
explain generalization in the non-sparse settings encountered in real experiments. We
illustrate the studied phenomena on the MNIST and 20newsgroups datasets.
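To make the setup concrete, here is a minimal sketch of a metric induced by a neural-network embedding, trained by SGD on a pairwise loss. The layer sizes, the contrastive-style loss, and the optional $\ell_1$ penalty are illustrative assumptions, not the architecture or objective used in the paper; the $\ell_1$ term corresponds to the regularization discussed for the sparse regime, and dropping it corresponds to the unregularized SGD setting.

```python
# Illustrative sketch only: a two-layer ReLU embedding f induces the metric
# d(x, x') = ||f(x) - f(x')||_2; layer sizes and the loss are assumptions.
import torch
import torch.nn as nn

embed = nn.Sequential(          # hypothetical embedding network
    nn.Linear(784, 128),        # MNIST-sized inputs, chosen for illustration
    nn.ReLU(),
    nn.Linear(128, 32),
)

def pairwise_loss(x1, x2, same_class, margin=1.0):
    """Generic hinge-style metric learning loss on a batch of pairs."""
    d = torch.norm(embed(x1) - embed(x2), dim=1)                      # induced metric
    pos = same_class * d.pow(2)                                       # pull similar pairs together
    neg = (1.0 - same_class) * torch.clamp(margin - d, min=0).pow(2)  # push dissimilar pairs apart
    return (pos + neg).mean()

def l1_penalty(model, lam=1e-4):
    """Optional l1 regularizer; small l1-type norms correspond to the sparse regime."""
    return lam * sum(p.abs().sum() for p in model.parameters())

opt = torch.optim.SGD(embed.parameters(), lr=0.1)
x1, x2 = torch.randn(64, 784), torch.randn(64, 784)        # stand-in data
same = torch.randint(0, 2, (64,)).float()                  # 1 = same class, 0 = different
loss = pairwise_loss(x1, x2, same) + l1_penalty(embed)     # drop l1_penalty for unregularized SGD
opt.zero_grad()
loss.backward()
opt.step()
```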
Related papers
- Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems.
We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
arXiv Detail & Related papers (2024-10-22T10:19:27Z)
- Slicing Mutual Information Generalization Bounds for Neural Networks [14.48773730230054]
We introduce new, tighter information-theoretic generalization bounds tailored for deep learning algorithms.
Our bounds offer significant computational and statistical advantages over standard MI bounds.
We extend our analysis to algorithms whose parameters do not need to exactly lie on random subspaces.
arXiv Detail & Related papers (2024-06-06T13:15:37Z)
- Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z)
- On the Importance of Gradient Norm in PAC-Bayesian Bounds [92.82627080794491]
We propose a new generalization bound that exploits the contractivity of the log-Sobolev inequalities.
We empirically analyze the effect of this new loss-gradient norm term on different neural architectures.
arXiv Detail & Related papers (2022-10-12T12:49:20Z)
- Avoiding unwanted results in locally linear embedding: A new understanding of regularization [1.0152838128195465]
Locally linear embedding inherently admits some unwanted results when no regularization is used.
It is observed that these unwanted results can be effectively prevented by regularization.
arXiv Detail & Related papers (2021-08-28T17:23:47Z)
- Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central to preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD and that of ordinary least squares (a minimal sketch contrasting the two estimators appears after this list).
arXiv Detail & Related papers (2021-03-23T17:15:53Z)
- Implicit Regularization of Sub-Gradient Method in Robust Matrix Recovery: Don't be Afraid of Outliers [6.320141734801679]
We show that a simple sub-gradient method converges to the true low-rank solution efficiently.
We also build upon a new notion of restricted isometry property, called sign-RIP, to prove the robustness of the method.
arXiv Detail & Related papers (2021-02-05T02:52:00Z)
- Implicit Regularization in ReLU Networks with the Square Loss [56.70360094597169]
We show that it is impossible to characterize the implicit regularization with the square loss by any explicit function of the model parameters.
Our results suggest that a more general framework may be needed to understand implicit regularization for nonlinear predictors.
arXiv Detail & Related papers (2020-12-09T16:48:03Z)
- Failures of model-dependent generalization bounds for least-norm interpolation [39.97534972432276]
We consider bounds on the generalization performance of the least-norm linear regressor.
For a variety of natural joint distributions on training examples, any valid generalization bound must sometimes be very loose.
arXiv Detail & Related papers (2020-10-16T16:30:05Z)
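Two of the entries above, the benign-overfitting result for constant-stepsize SGD and the least-norm interpolation entry, concern estimators that are simple to write down. The sketch below (with made-up dimensions and random data, purely for illustration) contrasts a constant-stepsize SGD iterate for linear regression with the minimum-norm interpolating solution; it shows the objects those bounds are about, not either paper's analysis.

```python
# Generic illustration, not reproduced from either paper: constant-stepsize SGD
# for linear regression versus the least-norm (minimum l2-norm) interpolator.
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 200                                   # overparameterized: d > n, so many interpolators exist
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Constant-stepsize SGD: one sample per step, fixed learning rate throughout.
w_sgd = np.zeros(d)
step = 0.01
for _ in range(5000):
    i = rng.integers(n)
    w_sgd -= step * (X[i] @ w_sgd - y[i]) * X[i]

# Least-norm interpolator: the minimum l2-norm w satisfying X @ w = y.
w_min = np.linalg.pinv(X) @ y

print("train MSE, SGD:         ", np.mean((X @ w_sgd - y) ** 2))
print("train MSE, least-norm:  ", np.mean((X @ w_min - y) ** 2))
print("norms (SGD, least-norm):", np.linalg.norm(w_sgd), np.linalg.norm(w_min))
```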