Globally Injective ReLU Networks
- URL: http://arxiv.org/abs/2006.08464v4
- Date: Fri, 8 Oct 2021 19:33:55 GMT
- Title: Globally Injective ReLU Networks
- Authors: Michael Puthawala, Konik Kothari, Matti Lassas, Ivan Dokmanić,
Maarten de Hoop
- Abstract summary: Injectivity plays an important role in generative models where it enables inference.
We establish sharp characterizations of injectivity of fully-connected and convolutional ReLU layers and networks.
- Score: 20.106755410331576
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Injectivity plays an important role in generative models where it enables
inference; in inverse problems and compressed sensing with generative priors it
is a precursor to well-posedness. We establish sharp characterizations of
injectivity of fully-connected and convolutional ReLU layers and networks.
First, through a layerwise analysis, we show that an expansivity factor of two
is necessary and sufficient for injectivity by constructing appropriate weight
matrices. We show that global injectivity with iid Gaussian matrices, a
commonly used tractable model, requires larger expansivity between 3.4 and
10.5. We also characterize the stability of inverting an injective network via
worst-case Lipschitz constants of the inverse. We then use arguments from
differential topology to study injectivity of deep networks and prove that any
Lipschitz map can be approximated by an injective ReLU network. Finally, using
an argument based on random projections, we show that an end-to-end -- rather
than layerwise -- doubling of the dimension suffices for injectivity. Our
results establish a theoretical basis for the study of nonlinear inverse and
inference problems using neural networks.
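The layerwise sufficiency of an expansivity factor of two can be illustrated with the classical construction in which the weight matrix stacks an injective matrix on top of its negation, so the positive and negative parts of the pre-activation are both retained. The sketch below is an illustrative numerical check under that assumption (it is not code from the paper); the matrix names `B` and `W` and the dimensions are arbitrary choices for the example.

```python
# Minimal sketch: a fully-connected ReLU layer x -> ReLU(W x) with W = [B; -B]
# is injective, because ReLU(Bx) - ReLU(-Bx) = Bx, from which x is recovered
# whenever B is injective. Here B is a random square matrix (full rank a.s.).
import numpy as np

rng = np.random.default_rng(0)

n = 5                                  # input dimension (arbitrary for illustration)
B = rng.standard_normal((n, n))        # almost surely injective
W = np.vstack([B, -B])                 # expansivity factor exactly two

def relu_layer(x):
    return np.maximum(W @ x, 0.0)

def invert_layer(y):
    # Recombine the ReLU(Bx) and ReLU(-Bx) halves, then undo B.
    Bx = y[:n] - y[n:]
    return np.linalg.solve(B, Bx)      # use lstsq/pinv if B is rectangular

x = rng.standard_normal(n)
x_rec = invert_layer(relu_layer(x))
print(np.allclose(x, x_rec))           # True: the layer is injective on R^n
```

With iid Gaussian weights and no such structure, the paper shows that this factor-of-two expansion is no longer enough layerwise, which is what motivates the 3.4-10.5 range quoted in the abstract.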
Related papers
- Complexity of Deciding Injectivity and Surjectivity of ReLU Neural Networks [5.482420806459269]
We prove coNP-completeness of deciding injectivity of a ReLU layer.
We also characterize surjectivity for two-layer ReLU networks with one-dimensional output.
arXiv Detail & Related papers (2024-05-30T08:14:34Z) - A Library of Mirrors: Deep Neural Nets in Low Dimensions are Convex Lasso Models with Reflection Features [54.83898311047626]
We consider neural networks with piecewise linear activations ranging from 2 to an arbitrary but finite number of layers.
We first show that two-layer networks with piecewise linear activations are Lasso models using a discrete dictionary of ramp depths.
arXiv Detail & Related papers (2024-03-02T00:33:45Z) - Differentially Private Non-convex Learning for Multi-layer Neural
Networks [35.24835396398768]
This paper focuses on the problem of Differentially Private Tangent Optimization for (multi-layer) fully connected neural networks with a single output node.
By utilizing recent advances in Neural Tangent Kernel theory, we provide the first excess population risk bound when both the sample size and the width of the network are sufficiently large.
arXiv Detail & Related papers (2023-10-12T15:48:14Z) - Learning Linear Causal Representations from Interventions under General
Nonlinear Mixing [52.66151568785088]
We prove strong identifiability results given unknown single-node interventions without access to the intervention targets.
This is the first instance of causal identifiability from non-paired interventions for deep neural network embeddings.
arXiv Detail & Related papers (2023-06-04T02:32:12Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Bayesian Interpolation with Deep Linear Networks [92.1721532941863]
Characterizing how neural network depth, width, and dataset size jointly impact model quality is a central problem in deep learning theory.
We show that linear networks make provably optimal predictions at infinite depth.
We also show that with data-agnostic priors, Bayesian model evidence in wide linear networks is maximized at infinite depth.
arXiv Detail & Related papers (2022-12-29T20:57:46Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Sparsest Univariate Learning Models Under Lipschitz Constraint [31.28451181040038]
We propose continuous-domain formulations for one-dimensional regression problems.
We control the Lipschitz constant explicitly using a user-defined upper-bound.
We show that both problems admit global minimizers that are continuous and piecewise-linear.
arXiv Detail & Related papers (2021-12-27T07:03:43Z) - On the Role of Optimization in Double Descent: A Least Squares Study [30.44215064390409]
We show an excess risk bound for the gradient descent solution of the least squares objective.
We find that in case of noiseless regression, double descent is explained solely by optimization-related quantities.
We empirically explore if our predictions hold for neural networks.
arXiv Detail & Related papers (2021-07-27T09:13:11Z) - Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks show improved accuracy and significant reduction in memory consumption.
However, they can suffer from ill-posedness and convergence instability.
This paper provides a new framework to design well-posed and robust implicit neural networks.
arXiv Detail & Related papers (2021-06-06T18:05:02Z) - A Unifying View on Implicit Bias in Training Linear Neural Networks [31.65006970108761]
We study the implicit bias of gradient flow (i.e., gradient descent with infinitesimal step size) on linear neural network training.
We propose a tensor formulation of neural networks that includes fully-connected, diagonal, and convolutional networks as special cases.
arXiv Detail & Related papers (2020-10-06T06:08:35Z)