Learning in High Dimension Always Amounts to Extrapolation
- URL: http://arxiv.org/abs/2110.09485v1
- Date: Mon, 18 Oct 2021 17:32:25 GMT
- Title: Learning in High Dimension Always Amounts to Extrapolation
- Authors: Randall Balestriero, Jerome Pesenti, Yann LeCun
- Abstract summary: Extrapolation occurs when $x$ falls outside of the dataset's convex hull.
Many intuitions and theories rely on the assumption that interpolation happens throughout tasks and datasets.
We argue against those two points and demonstrate that on any high-dimensional ($>$100) dataset, interpolation almost surely never happens.
- Score: 22.220076291384686
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The notion of interpolation and extrapolation is fundamental in various
fields from deep learning to function approximation. Interpolation occurs for a
sample $x$ whenever this sample falls inside or on the boundary of the given
dataset's convex hull. Extrapolation occurs when $x$ falls outside of that
convex hull. One fundamental (mis)conception is that state-of-the-art
algorithms work so well because of their ability to correctly interpolate
training data. A second (mis)conception is that interpolation happens
throughout tasks and datasets; in fact, many intuitions and theories rely on
that assumption. We empirically and theoretically argue against those two
points and demonstrate that on any high-dimensional ($>$100) dataset,
interpolation almost surely never happens. Those results challenge the validity
of our current interpolation/extrapolation definition as an indicator of
generalization performance.
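The convex-hull criterion above is easy to check numerically: $x$ interpolates the dataset exactly when it can be written as a convex combination of the training samples, which is a feasibility linear program over the barycentric weights. The sketch below is only an illustration of that test (not code from the paper), assuming NumPy and SciPy; the Gaussian toy loop mimics the paper's qualitative claim that hull membership becomes vanishingly rare as the dimension grows.

```python
# Minimal sketch of the convex-hull membership test that defines "interpolation"
# in the abstract above. Illustrative only; the LP formulation and the Gaussian
# toy experiment are standard choices, not the authors' released code.
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(x, X):
    """Return True if x lies inside or on the boundary of conv(X).

    X has shape (N, d). We look for weights w >= 0 with sum(w) == 1 and
    X.T @ w == x, i.e. a pure feasibility linear program (zero objective).
    """
    N, d = X.shape
    A_eq = np.vstack([X.T, np.ones((1, N))])   # d rows for X.T @ w = x, one row for sum(w) = 1
    b_eq = np.concatenate([x, [1.0]])
    res = linprog(c=np.zeros(N), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0.0, None)] * N, method="highs")
    return res.status == 0                     # status 0 <=> a feasible w was found

# Toy illustration of the paper's claim: as the dimension grows, a fresh sample
# from the same distribution almost never lands inside the training set's hull.
rng = np.random.default_rng(0)
for d in (2, 10, 50, 100):
    X = rng.standard_normal((500, d))
    hits = sum(in_convex_hull(rng.standard_normal(d), X) for _ in range(50))
    print(f"d={d:3d}: {hits}/50 test samples interpolate")
```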
Related papers
- Inverse Entropic Optimal Transport Solves Semi-supervised Learning via Data Likelihood Maximization [65.8915778873691]
Learning conditional distributions is a central problem in machine learning.
We propose a new learning paradigm that integrates both paired and unpaired data.
Our approach also connects intriguingly with inverse entropic optimal transport (OT).
arXiv Detail & Related papers (2024-10-03T16:12:59Z) - Extracting Manifold Information from Point Clouds [0.0]
A kernel based method is proposed for the construction of signature functions of subsets of $\mathbb{R}^d$.
The analysis of point clouds is the main application.
arXiv Detail & Related papers (2024-03-30T17:21:07Z) - Interplay between depth and width for interpolation in neural ODEs [0.0]
We examine the interplay between their width $p$ and the number of layer transitions $L$.
In the high-dimensional setting, we demonstrate that $p=O(N)$ neurons are likely sufficient to achieve exact control.
arXiv Detail & Related papers (2024-01-18T11:32:50Z) - Gradient-Based Feature Learning under Structured Data [57.76552698981579]
In the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction.
We show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue.
In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent.
arXiv Detail & Related papers (2023-09-07T16:55:50Z) - Teach me how to Interpolate a Myriad of Embeddings [18.711509039868655]
Mixup refers to interpolation-based data augmentation, originally motivated as a way to go beyond empirical risk minimization.
We introduce MultiMix, which interpolates an arbitrary number $n$ of tuples, each of length $m$, with one vector $\lambda$ per tuple.
Our contributions result in significant improvement over state-of-the-art mixup methods on four benchmarks.
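As a rough illustration of the mixing mechanism summarized above, the sketch below interpolates a tuple of $n$ embeddings of length $m$ with a single convex weight vector $\lambda$; the Dirichlet prior, shapes, and single-output form are assumptions for illustration, not the authors' MultiMix implementation.

```python
# Hedged sketch of interpolating a tuple of embeddings with one convex weight
# vector; a simplified stand-in, not the MultiMix code.
import numpy as np

def mix_embeddings(Z, labels, num_classes, alpha=1.0, rng=None):
    """Interpolate n embeddings (Z of shape (n, m)) into one mixed embedding.

    lam is a convex weight vector (lam >= 0, lam.sum() == 1) drawn from a
    Dirichlet(alpha) distribution; labels are mixed with the same weights.
    """
    rng = rng or np.random.default_rng()
    n, m = Z.shape
    lam = rng.dirichlet(alpha * np.ones(n))       # one weight per embedding in the tuple
    z_mixed = lam @ Z                             # (m,) interpolated embedding
    y_mixed = lam @ np.eye(num_classes)[labels]   # matching soft label, shape (num_classes,)
    return z_mixed, y_mixed

# Example: mix a tuple of 8 embeddings of length 128 from a 10-class problem.
rng = np.random.default_rng(0)
Z = rng.standard_normal((8, 128))
labels = rng.integers(0, 10, size=8)
z, y = mix_embeddings(Z, labels, num_classes=10, rng=rng)
```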
arXiv Detail & Related papers (2022-06-29T19:16:48Z) - Benefit of Interpolation in Nearest Neighbor Algorithms [21.79888306754263]
In some studies, it is observed that over-parametrized deep neural networks achieve a small testing error even when the training error is almost zero.
We turn to another way to enforce zero training error (without over-parametrization) through a data interpolation mechanism.
arXiv Detail & Related papers (2022-02-23T22:47:18Z) - A Law of Robustness beyond Isoperimetry [84.33752026418045]
We prove a Lipschitzness lower bound $\Omega(\sqrt{n/p})$ of robustness of interpolating neural network parameters on arbitrary distributions.
We then show the potential benefit of overparametrization for smooth data when $n=\mathrm{poly}(d)$.
We disprove the potential existence of an $O(1)$-Lipschitz robust interpolating function when $n=\exp(\omega(d))$.
arXiv Detail & Related papers (2022-02-23T16:10:23Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - An Online Learning Approach to Interpolation and Extrapolation in Domain Generalization [53.592597682854944]
We recast generalization over sub-groups as an online game between a player minimizing risk and an adversary presenting new test distributions.
We show that ERM is provably minimax-optimal for both tasks.
arXiv Detail & Related papers (2021-02-25T19:06:48Z) - A Random Matrix Analysis of Random Fourier Features: Beyond the Gaussian Kernel, a Precise Phase Transition, and the Corresponding Double Descent [85.77233010209368]
This article characterizes the exact asymptotics of random Fourier feature (RFF) regression, in the realistic setting where the number of data samples $n$, their dimension $p$, and the feature space dimension $N$ are all large and comparable.
This analysis also provides accurate estimates of training and test regression errors for large $n,p,N$.
arXiv Detail & Related papers (2020-06-09T02:05:40Z)
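For readers unfamiliar with the model analyzed in the entry above, the sketch below shows a generic random Fourier feature ridge regression with a Gaussian kernel; the feature map, regularizer, and defaults are standard illustrative choices, not the paper's exact setup or its asymptotic analysis.

```python
# Generic RFF ridge regression sketch (illustrative defaults, not the paper's setup).
import numpy as np

def rff_ridge(X_train, y_train, X_test, N=512, sigma=1.0, lam=1e-2, rng=None):
    """Project inputs onto N random cosine features, then solve ridge regression."""
    rng = rng or np.random.default_rng(0)
    p = X_train.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(p, N))   # random Gaussian projections
    b = rng.uniform(0.0, 2 * np.pi, size=N)          # random phases

    def features(X):
        return np.sqrt(2.0 / N) * np.cos(X @ W + b)  # (n, N) RFF map

    Phi = features(X_train)
    beta = np.linalg.solve(Phi.T @ Phi + lam * np.eye(N), Phi.T @ y_train)
    return features(X_test) @ beta

# Example: fit on toy 1-D data and predict.
rng = np.random.default_rng(1)
X_tr = rng.uniform(-3, 3, size=(200, 1)); y_tr = np.sin(X_tr[:, 0])
X_te = np.linspace(-3, 3, 50).reshape(-1, 1)
y_hat = rff_ridge(X_tr, y_tr, X_te)
```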
This list is automatically generated from the titles and abstracts of the papers on this site.