Approximating Positive Homogeneous Functions with Scale Invariant Neural
Networks
- URL: http://arxiv.org/abs/2308.02836v1
- Date: Sat, 5 Aug 2023 10:17:04 GMT
- Title: Approximating Positive Homogeneous Functions with Scale Invariant Neural
Networks
- Authors: Stefan Bamberger, Reinhard Heckel, Felix Krahmer
- Abstract summary: We first consider recovery of sparse vectors from few linear measurements.
We then extend our results to a wider class of recovery problems including low-rank matrix recovery and phase retrieval.
Our results shed some light on the seeming contradiction between previous works showing that neural networks for inverse problems typically have very large Lipschitz constants, yet still perform well under adversarial noise.
- Score: 28.2446416597989
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate to what extent it is possible to solve linear inverse problems
with ReLU networks. Due to the scaling invariance arising from the linearity,
an optimal reconstruction function $f$ for such a problem is positive
homogeneous, i.e., satisfies $f(\lambda x) = \lambda f(x)$ for all non-negative
$\lambda$. In a ReLU network, this condition translates to considering
networks without bias terms. We first consider recovery of sparse vectors from
few linear measurements. We prove that ReLU networks with only one hidden
layer cannot recover even $1$-sparse vectors, not even approximately,
regardless of the width of the network. However, with two hidden layers,
approximate recovery with arbitrary precision and arbitrary sparsity level $s$
is possible in a stable way. We then extend our results to a wider class of
recovery problems including low-rank matrix recovery and phase retrieval.
Furthermore, we also consider the approximation of general positive homogeneous
functions with neural networks. Extending previous work, we establish new
results explaining under which conditions such functions can be approximated
with neural networks. Our results also shed some light on the seeming
contradiction between previous works showing that neural networks for inverse
problems typically have very large Lipschitz constants, yet still perform well
even under adversarial noise. Namely, the error bounds in our expressivity
results include a combination of a small constant term and a term that is
linear in the noise level, indicating that robustness issues may occur only for
very small noise levels.
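As a quick illustration of the homogeneity condition discussed in the abstract, the following sketch (not from the paper; layer widths and random weights are arbitrary choices for illustration) checks numerically that a two-layer ReLU network without bias terms satisfies $f(\lambda x) = \lambda f(x)$ for all $\lambda \ge 0$: since $\mathrm{ReLU}(\lambda z) = \lambda\,\mathrm{ReLU}(z)$ for nonnegative $\lambda$, homogeneity propagates through any composition of bias-free layers.
```python
# Minimal sketch (not from the paper): a bias-free ReLU network
# f(x) = W2 @ relu(W1 @ x) is positive homogeneous, i.e. f(lambda*x) = lambda*f(x)
# for all lambda >= 0. Layer sizes and weights are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((32, 8))   # hidden layer weights, no bias term
W2 = rng.standard_normal((4, 32))   # output layer weights, no bias term

def relu(z):
    return np.maximum(z, 0.0)

def f(x):
    # One hidden ReLU layer plus a linear output layer, both without biases;
    # adding any bias term would break positive homogeneity.
    return W2 @ relu(W1 @ x)

x = rng.standard_normal(8)
for lam in [0.0, 0.5, 2.0, 10.0]:
    # relu(lam * z) == lam * relu(z) for lam >= 0, so homogeneity
    # is preserved through every bias-free layer.
    assert np.allclose(f(lam * x), lam * f(x))
print("f(lambda x) == lambda f(x) holds for the sampled nonnegative lambdas")
```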
Related papers
- Implicit Bias of Gradient Descent for Two-layer ReLU and Leaky ReLU
Networks on Nearly-orthogonal Data [66.1211659120882]
The implicit bias towards solutions with favorable properties is believed to be a key reason why neural networks trained by gradient-based optimization can generalize well.
While the implicit bias of gradient flow has been widely studied for homogeneous neural networks (including ReLU and leaky ReLU networks), the implicit bias of gradient descent is currently only understood for smooth neural networks.
arXiv Detail & Related papers (2023-10-29T08:47:48Z) - The Implicit Bias of Minima Stability in Multivariate Shallow ReLU
Networks [53.95175206863992]
We study the type of solutions to which gradient descent converges when used to train a single hidden-layer multivariate ReLU network with the quadratic loss.
We prove that although shallow ReLU networks are universal approximators, stable shallow networks are not.
arXiv Detail & Related papers (2023-06-30T09:17:39Z) - Generalization and Stability of Interpolating Neural Networks with
Minimal Width [37.908159361149835]
We investigate the generalization and optimization of shallow neural networks trained by gradient descent in the interpolating regime.
With $m = \Omega(\log^4(n))$ neurons and $T \approx n$, we bound the test loss by $\tilde{O}(1/)$.
arXiv Detail & Related papers (2023-02-18T05:06:15Z) - The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich
Regimes [75.59720049837459]
We study the transition from infinite-width behavior to this variance-limited regime as a function of sample size $P$ and network width $N$.
We find that finite-size effects can become relevant for very small datasets on the order of $P^* \sim \sqrt{N}$ for regression with ReLU networks.
arXiv Detail & Related papers (2022-12-23T04:48:04Z) - Neural Network Approximation of Continuous Functions in High Dimensions
with Applications to Inverse Problems [6.84380898679299]
Current theory predicts that networks should scale exponentially in the dimension of the problem.
We provide a general method for bounding the complexity required for a neural network to approximate a Hölder (or uniformly) continuous function.
arXiv Detail & Related papers (2022-08-28T22:44:07Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can solve this problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - The Rate of Convergence of Variation-Constrained Deep Neural Networks [35.393855471751756]
We show that a class of variation-constrained neural networks can achieve near-parametric rate $n^{-1/2+\delta}$ for an arbitrarily small constant $\delta$.
The result indicates that the neural function space needed for approximating smooth functions may not be as large as what is often perceived.
arXiv Detail & Related papers (2021-06-22T21:28:00Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Regularization Matters: A Nonparametric Perspective on Overparametrized
Neural Network [20.132432350255087]
Overparametrized neural networks trained by gradient descent (GD) can provably overfit any training data.
This paper studies how well overparametrized neural networks can recover the true target function in the presence of random noises.
arXiv Detail & Related papers (2020-07-06T01:02:23Z)