Stable Recovery of Entangled Weights: Towards Robust Identification of
Deep Neural Networks from Minimal Samples
- URL: http://arxiv.org/abs/2101.07150v1
- Date: Mon, 18 Jan 2021 16:31:19 GMT
- Title: Stable Recovery of Entangled Weights: Towards Robust Identification of
Deep Neural Networks from Minimal Samples
- Authors: Christian Fiedler, Massimo Fornasier, Timo Klock, and Michael
Rauchensteiner
- Abstract summary: We introduce the so-called entangled weights, which compose weights of successive layers intertwined with suitable diagonal and invertible matrices depending on the activation functions and their shifts.
We prove that entangled weights are completely and stably approximated by an efficient and robust algorithm.
In terms of practical impact, our study shows that we can relate input-output information uniquely and stably to network parameters, providing a form of explainability.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we approach the problem of unique and stable identifiability of
generic deep artificial neural networks with pyramidal shape and smooth
activation functions from a finite number of input-output samples. More
specifically we introduce the so-called entangled weights, which compose
weights of successive layers intertwined with suitable diagonal and invertible
matrices depending on the activation functions and their shifts. We prove that
entangled weights are completely and stably approximated by an efficient and
robust algorithm as soon as $\mathcal O(D^2 \times m)$ nonadaptive input-output
samples of the network are collected, where $D$ is the input dimension and $m$
is the number of neurons of the network. Moreover, we empirically observe that
the approach applies to networks with up to $\mathcal O(D \times m_L)$ neurons,
where $m_L$ is the number of output neurons at layer $L$. Provided knowledge of
layer assignments of entangled weights and of remaining scaling and shift
parameters, which may be further heuristically obtained by least squares, the
entangled weights identify the network completely and uniquely. To highlight
the relevance of the theoretical result of stable recovery of entangled
weights, we present numerical experiments, which demonstrate that multilayered
networks with generic weights can be robustly identified and therefore
uniformly approximated by the presented algorithmic pipeline. In contrast,
backpropagation cannot generalize stably in this setting, being always limited
by a relatively large uniform error. In terms of practical impact,
our study shows that we can relate input-output information uniquely and stably
to network parameters, providing a form of explainability. Moreover, our method
paves the way for compression of overparametrized networks and for the training
of minimal complexity networks.
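As a reading aid, the construction sketched in the abstract can be written down
explicitly; the recursion below and the symbols $G_\ell$ are our own notation, since
the abstract only states that suitable diagonal and invertible matrices are
interleaved between the layer weight matrices $W_\ell$:
$$ V_1 = W_1, \qquad V_\ell = W_\ell \, G_{\ell-1} \, V_{\ell-1}, \qquad \ell = 2, \dots, L, $$
where each $G_{\ell-1}$ is diagonal, invertible, and determined by the activation
function and its shifts. Under this reading, $V_\ell$ has one row per neuron of layer
$\ell$ and $D$ columns, so every entangled weight is a vector in the input space that
can in principle be probed through input-output queries.
The sampling regime can likewise be illustrated by a minimal sketch; it is not the
authors' pipeline, and the architecture, the tanh activation, and the sampling
distribution are placeholder choices. It only shows a generic pyramidal network with
a smooth activation queried at $N$ nonadaptive inputs on the order of $D^2 \times m$.
```python
# Sketch only (not the paper's code): a generic pyramidal network with a
# smooth activation, queried at N ~ D^2 * m nonadaptive random inputs.
import numpy as np

rng = np.random.default_rng(0)

D = 20                   # input dimension
widths = [20, 10, 5]     # pyramidal widths: D >= m_1 >= m_2 >= ... >= m_L
m = sum(widths)          # total number of neurons

# Generic (random) weight matrices W_l and shifts b_l for each layer.
layers, prev = [], D
for m_l in widths:
    W = rng.standard_normal((m_l, prev)) / np.sqrt(prev)
    b = 0.1 * rng.standard_normal(m_l)
    layers.append((W, b))
    prev = m_l

def network(x):
    """Evaluate the network with a smooth (tanh) activation at input x."""
    h = x
    for W, b in layers:
        h = np.tanh(W @ h + b)
    return h

# Nonadaptive input-output samples: drawn i.i.d., independently of the network.
N = D**2 * m
X = rng.standard_normal((N, D))
Y = np.stack([network(x) for x in X])
print(X.shape, Y.shape)  # (14000, 20) (14000, 5)
```
The sketch only conveys the scale of data the recovery guarantee asks for; the
entangled-weight recovery algorithm itself is described in the paper, not here.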
Related papers
- Network reconstruction via the minimum description length principle [0.0]
We propose an alternative nonparametric regularization scheme based on hierarchical Bayesian inference and weight quantization.
Our approach follows the minimum description length (MDL) principle, and uncovers the weight distribution that allows for the most compression of the data.
We demonstrate that our scheme yields systematically increased accuracy in the reconstruction of both artificial and empirical networks.
arXiv Detail & Related papers (2024-05-02T05:35:09Z)
- Convergence of Gradient Descent for Recurrent Neural Networks: A Nonasymptotic Analysis [16.893624100273108]
We analyze recurrent neural networks with diagonal hidden-to-hidden weight matrices trained with gradient descent in the supervised learning setting.
We prove that gradient descent can achieve optimality without massive overparameterization.
Our results are based on an explicit characterization of the class of dynamical systems that can be approximated and learned by recurrent neural networks.
arXiv Detail & Related papers (2024-02-19T15:56:43Z)
- SPFQ: A Stochastic Algorithm and Its Error Analysis for Neural Network Quantization [5.982922468400901]
We show that it is possible to achieve error bounds equivalent to those obtained in the order of the weights of a neural layer.
We prove that it is possible to achieve full-network bounds under an infinite alphabet and minimal assumptions on the input data.
arXiv Detail & Related papers (2023-09-20T00:35:16Z)
- Computational Complexity of Learning Neural Networks: Smoothness and Degeneracy [52.40331776572531]
We show that learning depth-$3$ ReLU networks under the Gaussian input distribution is hard even in the smoothed-analysis framework.
Our results are under a well-studied assumption on the existence of local pseudorandom generators.
arXiv Detail & Related papers (2023-02-15T02:00:26Z)
- Finite Sample Identification of Wide Shallow Neural Networks with Biases [12.622813055808411]
The identification of the parameters of the network from finite samples of input-output pairs is often referred to as the teacher-student model.
This paper fills the gap by providing constructive methods and theoretical guarantees of finite sample identification for such wider shallow networks with biases.
arXiv Detail & Related papers (2022-11-08T22:10:32Z)
- Robust Training and Verification of Implicit Neural Networks: A Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
arXiv Detail & Related papers (2022-08-08T03:13:24Z)
- BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weight/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
arXiv Detail & Related papers (2022-07-04T13:25:49Z)
- On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z)
- Stability of Deep Neural Networks via discrete rough paths [0.0]
We provide a priori estimates for the output of Deep Residual Neural Networks in terms of both the input data and the trained network weights.
We interpret residual neural networks as solutions to (rough) difference equations, and analyse them based on recent results on discrete-time signatures and rough path theory.
arXiv Detail & Related papers (2022-01-19T12:40:28Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can solve this separation problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
- Artificial Neural Networks generated by Low Discrepancy Sequences [59.51653996175648]
We generate artificial neural networks as random walks on a dense network graph.
Such networks can be trained sparse from scratch, avoiding the expensive procedure of training a dense network and compressing it afterwards.
We demonstrate that the artificial neural networks generated by low discrepancy sequences can achieve an accuracy within reach of their dense counterparts at a much lower computational complexity.
arXiv Detail & Related papers (2021-03-05T08:45:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.