A Lifted Bregman Formulation for the Inversion of Deep Neural Networks
- URL: http://arxiv.org/abs/2303.01965v1
- Date: Wed, 1 Mar 2023 20:30:22 GMT
- Title: A Lifted Bregman Formulation for the Inversion of Deep Neural Networks
- Authors: Xiaoyu Wang, Martin Benning
- Abstract summary: We propose a novel framework for the regularised inversion of deep neural networks.
The framework lifts the parameter space into a higher dimensional space by introducing auxiliary variables.
We present theoretical results and support their practical application with numerical examples.
- Score: 28.03724379169264
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose a novel framework for the regularised inversion of deep neural
networks. The framework is based on the authors' recent work on training
feed-forward neural networks without the differentiation of activation
functions. The framework lifts the parameter space into a higher dimensional
space by introducing auxiliary variables, and penalises these variables with
tailored Bregman distances. We propose a family of variational regularisations
based on these Bregman distances, present theoretical results and support their
practical application with numerical examples. In particular, we present the
first convergence result (to the best of our knowledge) for the regularised
inversion of a single-layer perceptron that only assumes that the solution of
the inverse problem is in the range of the regularisation operator, and that
shows that the regularised inverse provably converges to the true inverse if
measurement errors converge to zero.
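The following Python sketch illustrates the lifted Bregman idea in the simplest case the abstract mentions, a single-layer perceptron y = ReLU(Wx + b). It rests on stated assumptions, not on the paper's code. ReLU is treated as the proximal map of the indicator function of the non-negative orthant. The penalty is taken as B(x, z) = Phi(x) + Phi*(z) - <x, z> with Phi(x) = ½‖x‖² + indicator(x ≥ 0). A plain Tikhonov term (alpha/2)‖x‖² stands in for the regularisation. The helper names `relu`, `bregman_relu` and `invert_perceptron`, the step size 1/L, and the choice of alpha equal to the noise level in the demo are hypothetical illustrations. The sketch shows that the resulting objective is convex in x and that its gradient evaluates ReLU itself but never its derivative.

```python
import numpy as np


def relu(z):
    """ReLU, viewed here as the proximal map of the indicator of {x >= 0}."""
    return np.maximum(z, 0.0)


def bregman_relu(x, z):
    """Assumed lifted Bregman penalty B(x, z) = Phi(x) + Phi*(z) - <x, z>.

    Hypothetical choice: Phi(x) = 0.5*||x||^2 + indicator(x >= 0), whose
    convex conjugate is Phi*(z) = 0.5*||max(z, 0)||^2. B is non-negative
    and equals zero exactly when x == relu(z).
    """
    if np.any(x < 0):
        return np.inf  # x lies outside the domain of Phi
    return 0.5 * x @ x + 0.5 * relu(z) @ relu(z) - x @ z


def invert_perceptron(W, b, y, alpha, steps=5000):
    """Estimate x from y ~ relu(W x + b) by minimising the convex objective

        E(x) = Phi*(W x + b) - <y, W x + b> + (alpha / 2) * ||x||^2,

    which (for y >= 0) equals B(y, W x + b) + (alpha / 2) * ||x||^2 up to the
    constant Phi(y). Its gradient W^T (relu(W x + b) - y) + alpha * x only
    evaluates ReLU and never differentiates it.
    """
    x = np.zeros(W.shape[1])
    # relu is 1-Lipschitz, so the gradient is Lipschitz with constant ||W||_2^2 + alpha.
    lipschitz = np.linalg.norm(W, 2) ** 2 + alpha
    for _ in range(steps):
        grad = W.T @ (relu(W @ x + b) - y) + alpha * x
        x = x - grad / lipschitz  # plain gradient descent with step 1/L
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((50, 5))
    b = 0.1 * rng.standard_normal(50)
    x_true = rng.standard_normal(5)
    y_clean = relu(W @ x_true + b)
    for noise in (1e-1, 1e-2, 1e-3):
        y = y_clean + noise * rng.standard_normal(50)
        # Hypothetical parameter choice: regularisation weight tied to the noise level.
        x_hat = invert_perceptron(W, b, y, alpha=noise)
        print(noise, np.linalg.norm(x_hat - x_true))
```

Running the script prints the reconstruction error for three noise levels. With alpha tied to the noise level, the error is expected to shrink as both decrease. This loosely mirrors the convergence behaviour the abstract describes for vanishing measurement error, although the toy experiment proves nothing by itself.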
Related papers
- Benign Overfitting for Regression with Trained Two-Layer ReLU Networks [14.36840959836957]
We study the least-square regression problem with a two-layer fully-connected neural network, with ReLU activation function, trained by gradient flow.
Our first result is a generalization result, that requires no assumptions on the underlying regression function or the noise other than that they are bounded.
Date: 2024-10-08T16:54:23Z
- Generalization of Scaled Deep ResNets in the Mean-Field Regime [55.77054255101667]
We investigate scaled ResNets in the limit of infinitely deep and wide neural networks.
Our results offer new insights into the generalization ability of deep ResNet beyond the lazy training regime.
Date: 2024-03-14T21:48:00Z
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
Date: 2023-10-20T12:45:12Z
- Regularization, early-stopping and dreaming: a Hopfield-like setup to address generalization and overfitting [0.0]
We look for optimal network parameters by applying a gradient descent over a regularized loss function.
Within this framework, the optimal neuron-interaction matrices correspond to Hebbian kernels revised by a reiterated unlearning protocol.
Date: 2023-08-01T15:04:30Z
- Implicit Regularization for Group Sparsity [33.487964460794764]
We show that gradient descent over the squared regression loss, without any explicit regularization, biases towards solutions with a group sparsity structure.
We analyze the gradient dynamics of the corresponding regression problem in the general noise setting and obtain minimax-optimal error rates.
In the degenerate case of size-one groups, our approach gives rise to a new algorithm for sparse linear regression.
Date: 2023-01-29T20:54:03Z
- Scale-invariant Bayesian Neural Networks with Connectivity Tangent Kernel [30.088226334627375]
We show that flatness and generalization bounds can be changed arbitrarily according to the scale of a parameter.
We propose new prior and posterior distributions invariant to scaling transformations by decomposing the scale and connectivity of parameters.
We empirically demonstrate that our posterior provides effective flatness and calibration measures with low complexity.
Date: 2022-09-30T03:31:13Z
- On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
Date: 2022-05-18T16:57:10Z
- Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent.
We show that SGD is biased towards a simple solution.
We also provide empirical evidence that knots at locations distinct from the data points might occur.
Date: 2021-11-03T15:14:20Z
- Robust lEarned Shrinkage-Thresholding (REST): Robust unrolling for sparse recovery [87.28082715343896]
We consider deep neural networks for solving inverse problems that are robust to forward model mis-specifications.
We design a new robust deep neural network architecture by applying algorithm unfolding techniques to a robust version of the underlying recovery problem.
The proposed REST network is shown to outperform state-of-the-art model-based and data-driven algorithms in both compressive sensing and radar imaging problems.
Date: 2021-10-20T06:15:45Z
- End-to-end reconstruction meets data-driven regularization for inverse problems [2.800608984818919]
We propose an unsupervised approach for learning end-to-end reconstruction operators for ill-posed inverse problems.
The proposed method combines the classical variational framework with iterative unrolling.
We demonstrate with the example of X-ray computed tomography (CT) that our approach outperforms state-of-the-art unsupervised methods.
Date: 2021-06-07T12:05:06Z
- Sampling-free Variational Inference for Neural Networks with Multiplicative Activation Noise [51.080620762639434]
We propose a more efficient parameterization of the posterior approximation for sampling-free variational inference.
Our approach yields competitive results for standard regression problems and scales well to large-scale image classification tasks.
Date: 2021-03-15T16:16:18Z