Ridgeless Interpolation with Shallow ReLU Networks in $1D$ is Nearest
Neighbor Curvature Extrapolation and Provably Generalizes on Lipschitz
Functions
- URL: http://arxiv.org/abs/2109.12960v1
- Date: Mon, 27 Sep 2021 11:32:24 GMT
- Title: Ridgeless Interpolation with Shallow ReLU Networks in $1D$ is Nearest
Neighbor Curvature Extrapolation and Provably Generalizes on Lipschitz
Functions
- Authors: Boris Hanin
- Abstract summary: We prove a precise geometric description of all one-layer ReLU networks $z(x;\theta)$ with a single linear unit.
We refer to them as ridgeless ReLU interpolants.
Our results show that ridgeless ReLU interpolants achieve the best possible generalization for learning $1d$ Lipschitz functions.
- Score: 16.75218291152252
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We prove a precise geometric description of all one layer ReLU networks
$z(x;\theta)$ with a single linear unit and input/output dimensions equal to
one that interpolate a given dataset $\mathcal D=\{(x_i,f(x_i))\}$ and, among
all such interpolants, minimize the $\ell_2$-norm of the neuron weights. Such
networks can intuitively be thought of as those that minimize the mean-squared
error over $\mathcal D$ plus an infinitesimal weight decay penalty. We
therefore refer to them as ridgeless ReLU interpolants. Our description proves
that, to extrapolate values $z(x;\theta)$ for inputs $x\in (x_i,x_{i+1})$ lying
between two consecutive datapoints, a ridgeless ReLU interpolant simply
compares the signs of the discrete estimates for the curvature of $f$ at $x_i$
and $x_{i+1}$ derived from the dataset $\mathcal D$. If the curvature estimates
at $x_i$ and $x_{i+1}$ have different signs, then $z(x;\theta)$ must be linear
on $(x_i,x_{i+1})$. If, in contrast, the curvature estimates at $x_i$ and
$x_{i+1}$ are both positive (resp. negative), then $z(x;\theta)$ is convex
(resp. concave) on $(x_i,x_{i+1})$. Our results show that ridgeless ReLU
interpolants achieve the best possible generalization for learning $1d$
Lipschitz functions, up to universal constants.
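To make the sign rule concrete: at an interior datapoint $x_i$, the discrete curvature estimate is, up to a positive normalization, the second divided difference of the data, i.e. the change in secant slope $\frac{f(x_{i+1})-f(x_i)}{x_{i+1}-x_i}-\frac{f(x_i)-f(x_{i-1})}{x_i-x_{i-1}}$. The sketch below is a minimal illustration of this classification rule, assuming NumPy; the function name and the convention of labeling zero-curvature cases as linear are ours, not the paper's, and boundary intervals are omitted because a curvature estimate needs both neighbors.

```python
import numpy as np

def classify_intervals(xs, ys):
    """Classify each interval (x_i, x_{i+1}) whose endpoints are both
    interior datapoints, using the sign rule from the abstract.

    Illustrative sketch only: the curvature estimate at an interior knot
    x_i is taken to be the sign of the change in secant slope across x_i
    (a second divided difference of the data).
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    slopes = np.diff(ys) / np.diff(xs)      # secant slope on each (x_i, x_{i+1})
    curv = np.sign(np.diff(slopes))         # curv[j]: curvature sign at knot x_{j+1}
    labels = []
    for i in range(1, len(xs) - 2):         # intervals with two interior endpoints
        left, right = curv[i - 1], curv[i]  # curvature signs at x_i and x_{i+1}
        if left > 0 and right > 0:
            labels.append("convex")
        elif left < 0 and right < 0:
            labels.append("concave")
        else:                               # differing (or vanishing) signs => linear
            labels.append("linear")
    return labels

# Example: data sampled from f(x) = x^2, so every curvature estimate is positive.
print(classify_intervals([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))
# -> ['convex', 'convex']
```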
Related papers
- The Communication Complexity of Approximating Matrix Rank [50.6867896228563]
We show that this problem has randomized communication complexity $\Omega(\frac{1}{k}\cdot n^2\log|\mathbb{F}|)$.
As an application, we obtain an $\Omega(\frac{1}{k}\cdot n^2\log|\mathbb{F}|)$ space lower bound for any streaming algorithm with $k$ passes.
arXiv Detail & Related papers (2024-10-26T06:21:42Z) - Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch.
We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work.
We also show an $\Omega(k^{2/p}n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z) - Interplay between depth and width for interpolation in neural ODEs [0.0]
We examine the interplay between the width $p$ and the number of layer transitions $L$ of neural ODEs.
In the high-dimensional setting, we demonstrate that $p=O(N)$ neurons are likely sufficient to achieve exact control.
arXiv Detail & Related papers (2024-01-18T11:32:50Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - $\ell_p$-Regression in the Arbitrary Partition Model of Communication [59.89387020011663]
We consider the randomized communication complexity of the distributed $\ell_p$-regression problem in the coordinator model.
For $p = 2$, i.e., least squares regression, we give the first optimal bound of $\tilde{\Theta}(sd^2 + sd/\epsilon)$ bits.
For $p \in (1,2)$, we obtain an $\tilde{O}(sd^2/\epsilon + sd/\mathrm{poly}(\epsilon))$ upper bound.
arXiv Detail & Related papers (2023-07-11T08:51:53Z) - Optimal Sketching Bounds for Sparse Linear Regression [116.30196615349226]
We study oblivious sketching for $k$-sparse linear regression under various loss functions, such as an $\ell_p$ norm or a broad class of hinge-like loss functions.
We show that for sparse $\ell_2$ regression, there is an oblivious distribution over sketches with $\Theta(k\log(d/k)/\varepsilon^2)$ rows, which is tight up to a constant factor.
We also show that sketching $O(\mu^2 k\log(\mu n d/\varepsilon)/\varepsilon^2)$ ...
arXiv Detail & Related papers (2023-04-05T07:24:19Z) - Learning linear dynamical systems under convex constraints [4.4351901934764975]
We consider the problem of identification of linear dynamical systems from $T$ samples of a single trajectory.
$A^*$ can be reliably estimated for values of $T$ smaller than what is needed in the unconstrained setting.
arXiv Detail & Related papers (2023-03-27T11:49:40Z) - Active Sampling for Linear Regression Beyond the $\ell_2$ Norm [70.49273459706546]
We study active sampling algorithms for linear regression, which aim to query only a small number of entries of a target vector.
We show that this dependence on $d$ is optimal, up to logarithmic factors.
We also provide the first total sensitivity upper bound $O(d^{\max\{1,p/2\}}\log^2 n)$ for loss functions with at most degree-$p$ growth.
arXiv Detail & Related papers (2021-11-09T00:20:01Z) - Minimax Optimal Regression over Sobolev Spaces via Laplacian
Regularization on Neighborhood Graphs [25.597646488273558]
We study the statistical properties of Laplacian smoothing, a graph-based approach to nonparametric regression.
We prove that Laplacian smoothing is manifold-adaptive.
arXiv Detail & Related papers (2021-06-03T01:20:41Z) - The planted matching problem: Sharp threshold and infinite-order phase
transition [25.41713098167692]
We study the problem of reconstructing a perfect matching $M^*$ hidden in a randomly weighted $n\times n$ bipartite graph.
We show that if $\sqrt{d}\,B(\mathcal{P},\mathcal{Q}) \ge 1+\epsilon$ for an arbitrarily small constant $\epsilon>0$, the reconstruction error of any estimator is bounded away from $0$.
arXiv Detail & Related papers (2021-03-17T00:59:33Z) - Robust testing of low-dimensional functions [8.927163098772589]
A recent work of the authors showed that linear $k$-juntas are testable.
Following the surge of interest in noise-tolerant property testing, in this paper we prove a noise-tolerant (or robust) version of this result.
We obtain a fully noise-tolerant tester with query complexity $k^{O(\mathsf{poly}(\log k/\epsilon))}$ for the class of intersections of $k$-halfspaces.
arXiv Detail & Related papers (2020-04-24T10:23:12Z)