Understanding and Mitigating Exploding Inverses in Invertible Neural
Networks
- URL: http://arxiv.org/abs/2006.09347v2
- Date: Fri, 24 Dec 2021 17:26:10 GMT
- Title: Understanding and Mitigating Exploding Inverses in Invertible Neural
Networks
- Authors: Jens Behrmann, Paul Vicol, Kuan-Chieh Wang, Roger Grosse,
J\"orn-Henrik Jacobsen
- Abstract summary: Invertible neural networks (INNs) have been used to design generative models, implement memory-saving gradient computation, and solve inverse problems.
In this work, we show that commonly-used INN architectures suffer from exploding inverses and are thus prone to becoming numerically non-invertible.
- Score: 12.158549746821913
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Invertible neural networks (INNs) have been used to design generative models,
implement memory-saving gradient computation, and solve inverse problems. In
this work, we show that commonly-used INN architectures suffer from exploding
inverses and are thus prone to becoming numerically non-invertible. Across a
wide range of INN use-cases, we reveal failures including the non-applicability
of the change-of-variables formula on in- and out-of-distribution (OOD) data,
incorrect gradients for memory-saving backprop, and the inability to sample
from normalizing flow models. We further derive bi-Lipschitz properties of
atomic building blocks of common architectures. These insights into the
stability of INNs then provide ways forward to remedy these failures. For tasks
where local invertibility is sufficient, like memory-saving backprop, we
propose a flexible and efficient regularizer. For problems where global
invertibility is necessary, such as applying normalizing flows on OOD data, we
show the importance of designing stable INN building blocks.
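To make this failure mode concrete, here is a minimal sketch (illustrative code, not from the paper) of a single affine coupling layer whose analytic inverse blows up on out-of-distribution inputs; the toy scale network, float32 precision, and input scales are all assumptions chosen to expose the effect.

```python
import numpy as np

rng = np.random.default_rng(0)

def coupling_forward(x, W, b):
    """Affine coupling layer: keep x1, scale x2 by exp(W x1 + b)."""
    x1, x2 = np.split(x, 2)
    log_s = W @ x1 + b
    return np.concatenate([x1, x2 * np.exp(log_s)])

def coupling_inverse(y, W, b):
    """Analytic inverse: divide the second half by the predicted scale."""
    y1, y2 = np.split(y, 2)
    log_s = W @ y1 + b
    return np.concatenate([y1, y2 * np.exp(-log_s)])

d = 4
W = rng.normal(size=(d // 2, d // 2)).astype(np.float32)
b = np.zeros(d // 2, dtype=np.float32)

# Pushing inputs away from the data distribution inflates log_s, so
# exp(log_s) overflows float32 and the analytic inverse returns inf/nan.
for scale in (1.0, 10.0, 100.0):
    x = (scale * rng.normal(size=d)).astype(np.float32)
    y = coupling_forward(x, W, b)
    err = np.max(np.abs(x - coupling_inverse(y, W, b)))
    print(f"input scale {scale:6.1f} -> reconstruction error {err:.3e}")
```

Bounding the predicted log-scale, e.g. with a squashing nonlinearity or a Lipschitz constraint on the scale network, is one example of the stable building-block design the abstract argues for.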
Related papers
- Enhancing Reliability of Neural Networks at the Edge: Inverted
Normalization with Stochastic Affine Transformations [0.22499166814992438]
We propose a method to inherently enhance the robustness and inference accuracy of BayNNs deployed in in-memory computing architectures.
Empirical results show a graceful degradation in inference accuracy, with an improvement of up to 58.11%.
arXiv Detail & Related papers (2024-01-23T00:27:31Z)
- Free-form Flows: Make Any Architecture a Normalizing Flow [8.163244519983298]
We develop a training procedure that uses an efficient estimator for the gradient of the change-of-variables formula.
This enables any dimension-preserving neural network to serve as a generative model through maximum likelihood training.
We achieve excellent results in molecule generation benchmarks utilizing $E(n)$-equivariant networks.
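The paper's own estimator is not reproduced here; as a hedged illustration of the standard building block it relates to, the sketch below uses the Hutchinson trace estimator, which approximates tr(J_f) from vector-Jacobian products without materializing the Jacobian (all names and shapes are illustrative).

```python
import torch

def hutchinson_trace(f, x, n_samples=100):
    """Estimate tr(J_f(x)) as E[v^T J v] with Rademacher-distributed v,
    using vector-Jacobian products instead of the full Jacobian."""
    x = x.clone().requires_grad_(True)
    y = f(x)
    est = torch.zeros(())
    for _ in range(n_samples):
        v = torch.randint(0, 2, x.shape, dtype=x.dtype) * 2 - 1  # entries +/-1
        vjp, = torch.autograd.grad(y, x, grad_outputs=v, retain_graph=True)
        est = est + (vjp * v).sum()
    return est / n_samples

f = torch.nn.Sequential(torch.nn.Linear(5, 5), torch.nn.Tanh(), torch.nn.Linear(5, 5))
x = torch.randn(5)
print("estimated trace:", hutchinson_trace(f, x).item())
print("exact trace:    ", torch.trace(torch.autograd.functional.jacobian(f, x)).item())
```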
arXiv Detail & Related papers (2023-10-25T13:23:08Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Measurement-Consistent Networks via a Deep Implicit Layer for Solving Inverse Problems [0.0]
End-to-end deep neural networks (DNNs) have become state-of-the-art (SOTA) for solving inverse problems.
These networks are sensitive to minor variations in the training pipeline and often fail to reconstruct small but important details.
We propose a framework that transforms any DNN for inverse problems into a measurement-consistent one.
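As a hedged sketch of the measurement-consistency idea (not the paper's deep implicit layer), the snippet below post-processes an arbitrary reconstruction so that it exactly satisfies a linear measurement model Ax = y via a minimum-norm correction; the operator, sizes, and the stand-in "DNN output" are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def project_onto_measurements(x_hat, A, y):
    """Project x_hat onto {x : A x = y} via a minimum-norm correction."""
    residual = y - A @ x_hat
    return x_hat + np.linalg.pinv(A) @ residual

m, n = 20, 50                                # underdetermined inverse problem
A = rng.normal(size=(m, n))
x_true = rng.normal(size=n)
y = A @ x_true

x_hat = x_true + 0.1 * rng.normal(size=n)    # stand-in for a DNN reconstruction
x_cons = project_onto_measurements(x_hat, A, y)

print("measurement residual before:", np.linalg.norm(A @ x_hat - y))
print("measurement residual after: ", np.linalg.norm(A @ x_cons - y))
```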
arXiv Detail & Related papers (2022-11-06T17:05:04Z)
- Recurrent Bilinear Optimization for Binary Neural Networks [58.972212365275595]
Binary neural networks (BNNs) neglect the intrinsic bilinear relationship between real-valued weights and scale factors.
Our work is the first attempt to optimize BNNs from the bilinear perspective.
We obtain robust RBONNs, which show impressive performance over state-of-the-art BNNs on various models and datasets.
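The recurrent bilinear optimization itself is not reproduced here; the following sketch only illustrates the bilinear coupling the summary refers to, approximating real-valued weights as per-row scale factors times a sign matrix (the classic closed-form fit; shapes are illustrative).

```python
import numpy as np

rng = np.random.default_rng(2)

def binarize(W):
    """Approximate W ~ alpha * sign(W), where alpha minimizes
    ||W - alpha * B||_F per output row, giving alpha = mean(|W|)."""
    B = np.sign(W)
    alpha = np.abs(W).mean(axis=1, keepdims=True)  # per-row scale factor
    return alpha, B

W = rng.normal(size=(8, 16))
alpha, B = binarize(W)
rel_err = np.linalg.norm(W - alpha * B) / np.linalg.norm(W)
print("relative approximation error:", rel_err)
```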
arXiv Detail & Related papers (2022-09-04T06:45:33Z)
- Integrating Random Effects in Deep Neural Networks [4.860671253873579]
We propose to use the mixed models framework to handle correlated data in deep neural networks.
By treating the effects underlying the correlation structure as random effects, mixed models are able to avoid overfitted parameter estimates.
Our approach, which we call LMMNN, is demonstrated to improve performance over natural competitors in various correlation scenarios.
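As a hedged, minimal illustration of the mixed-model idea (not the LMMNN implementation), the sketch below treats cluster-level offsets as Gaussian random effects and recovers them with the standard shrinkage (BLUP) estimate rather than as free per-cluster parameters; all variances and sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# clustered residuals: y = f(x) + b[cluster] + noise, with b ~ N(0, sigma_b^2);
# the fixed part f(x) is assumed already subtracted off
n_clusters, per_cluster = 10, 20
sigma_b, sigma_e = 1.0, 0.5
b_true = rng.normal(0.0, sigma_b, n_clusters)
cluster = np.repeat(np.arange(n_clusters), per_cluster)
resid = b_true[cluster] + rng.normal(0.0, sigma_e, n_clusters * per_cluster)

# BLUP for a random intercept: shrink each cluster mean toward zero
n_j = np.bincount(cluster)
cluster_mean = np.bincount(cluster, weights=resid) / n_j
shrinkage = (n_j * sigma_b**2) / (n_j * sigma_b**2 + sigma_e**2)
b_hat = shrinkage * cluster_mean

print("max error of shrunken estimates:", np.max(np.abs(b_hat - b_true)))
```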
arXiv Detail & Related papers (2022-06-07T14:02:24Z)
- Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
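The paper's reachability analysis for implicit layers is not reproduced here; as a simpler hedged sketch of interval bounds, the snippet below propagates an input box through one affine layer and a monotone activation, the standard interval-bound-propagation step (weights and the perturbation radius are illustrative).

```python
import numpy as np

rng = np.random.default_rng(4)

def interval_affine(l, u, W, b):
    """Tight interval bounds for W x + b over the box x in [l, u]."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    lower = W_pos @ l + W_neg @ u + b
    upper = W_pos @ u + W_neg @ l + b
    return lower, upper

W = rng.normal(size=(3, 5))
b = rng.normal(size=3)
x0 = rng.normal(size=5)
eps = 0.1
l, u = interval_affine(x0 - eps, x0 + eps, W, b)
l, u = np.tanh(l), np.tanh(u)        # monotone activation preserves bounds

# sanity check against random samples from the input box
for _ in range(1000):
    x = x0 + rng.uniform(-eps, eps, size=5)
    y = np.tanh(W @ x + b)
    assert np.all(l - 1e-9 <= y) and np.all(y <= u + 1e-9)
print("all sampled outputs lie inside the propagated interval")
```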
arXiv Detail & Related papers (2022-04-01T03:31:27Z)
- A novel Deep Neural Network architecture for non-linear system identification [78.69776924618505]
We present a novel Deep Neural Network (DNN) architecture for non-linear system identification.
Inspired by fading-memory systems, we introduce an inductive bias (on the architecture) and regularization (on the loss function).
This architecture allows for automatic complexity selection based solely on available data.
arXiv Detail & Related papers (2021-06-06T10:06:07Z)
- Learning to Solve the AC-OPF using Sensitivity-Informed Deep Neural Networks [52.32646357164739]
We propose a deep neural network (DNN) approach to solve the AC optimal power flow (AC-OPF) problem.
The proposed SIDNN is compatible with a broad range of OPF schemes.
It can be seamlessly integrated in other learning-to-OPF schemes.
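A hedged sketch of the general sensitivity-informed idea (illustrative names and shapes, not the paper's SIDNN): alongside the prediction loss, penalize the mismatch between the network's input Jacobian and target sensitivities, e.g. ones exported by an OPF solver.

```python
import torch

net = torch.nn.Sequential(torch.nn.Linear(4, 16), torch.nn.Tanh(), torch.nn.Linear(16, 2))

def sensitivity_informed_loss(x, y_target, S_target, lam=0.1):
    """Prediction loss plus a penalty on the network's input Jacobian."""
    y = net(x)
    J = torch.autograd.functional.jacobian(net, x, create_graph=True)
    pred = torch.nn.functional.mse_loss(y, y_target)
    sens = torch.nn.functional.mse_loss(J, S_target)
    return pred + lam * sens

x = torch.randn(4)                 # stand-in for grid operating conditions
y_target = torch.randn(2)          # stand-in for an OPF solution
S_target = torch.randn(2, 4)       # stand-in for solver-provided sensitivities
loss = sensitivity_informed_loss(x, y_target, S_target)
loss.backward()
print("loss:", loss.item())
```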
arXiv Detail & Related papers (2021-03-27T00:45:23Z)
- Coupled Oscillatory Recurrent Neural Network (coRNN): An accurate and (gradient) stable architecture for learning long time dependencies [15.2292571922932]
We propose a novel architecture for recurrent neural networks.
Our proposed RNN is based on a time-discretization of a system of second-order ordinary differential equations.
Experiments show that the proposed RNN is comparable in performance to the state of the art on a variety of benchmarks.
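To make the time-discretization concrete, here is a hedged sketch of one explicit step for a second-order system of the form y'' = tanh(W y + W_bar y' + V u + b) - gamma*y - eps*y'; the exact scheme, hyperparameters, and shapes here are assumptions, not necessarily the paper's discretization.

```python
import numpy as np

rng = np.random.default_rng(5)

def cornn_step(y, z, u, W, W_bar, V, b, dt=0.01, gamma=1.0, eps=1.0):
    """One explicit step: z is the velocity y', updated first, then y."""
    z_new = z + dt * (np.tanh(W @ y + W_bar @ z + V @ u + b) - gamma * y - eps * z)
    y_new = y + dt * z_new
    return y_new, z_new

h, d_in, T = 8, 3, 100
W, W_bar = rng.normal(size=(h, h)), rng.normal(size=(h, h))
V, b = rng.normal(size=(h, d_in)), np.zeros(h)
y, z = np.zeros(h), np.zeros(h)
for _ in range(T):
    u = rng.normal(size=d_in)      # one input per time step
    y, z = cornn_step(y, z, u, W, W_bar, V, b)
print("hidden state norm after", T, "steps:", np.linalg.norm(y))
```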
arXiv Detail & Related papers (2020-10-02T12:35:04Z)
- Frequentist Uncertainty in Recurrent Neural Networks via Blockwise Influence Functions [121.10450359856242]
Recurrent neural networks (RNNs) are instrumental in modelling sequential and time-series data.
Existing approaches for uncertainty quantification in RNNs are based predominantly on Bayesian methods.
We develop a frequentist alternative that: (a) does not interfere with model training or compromise its accuracy, (b) applies to any RNN architecture, and (c) provides theoretical coverage guarantees on the estimated uncertainty intervals.
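The blockwise influence-function machinery is not reproduced here; as a rough, fully illustrative sketch of the frequentist flavor, the snippet below builds distribution-free intervals from leave-one-block-out residuals, with a trivial mean predictor standing in for refitting an RNN.

```python
import numpy as np

rng = np.random.default_rng(6)

# toy sequence data split into contiguous blocks
y = np.cumsum(rng.normal(size=200))          # a random-walk "time series"
blocks = np.array_split(np.arange(len(y)), 10)

residuals = []
for held_out in blocks:
    train = np.setdiff1d(np.arange(len(y)), held_out)
    pred = y[train].mean()                   # stand-in for refitting the model
    residuals.extend(np.abs(y[held_out] - pred))

half_width = np.quantile(residuals, 0.9)     # 90% interval half-width
point = y.mean()
print(f"interval: [{point - half_width:.2f}, {point + half_width:.2f}]")
```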
arXiv Detail & Related papers (2020-06-20T22:45:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.