Nondeterminism and Instability in Neural Network Optimization
- URL: http://arxiv.org/abs/2103.04514v1
- Date: Mon, 8 Mar 2021 02:28:18 GMT
- Title: Nondeterminism and Instability in Neural Network Optimization
- Authors: Cecilia Summers, Michael J. Dinneen
- Abstract summary: Nondeterminism in neural network optimization produces uncertainty in performance.
We show that all sources of nondeterminism have similar effects on measures of model diversity.
We propose two approaches for reducing the effects of instability on run-to-run variability.
- Score: 7.6146285961466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nondeterminism in neural network optimization produces uncertainty in
performance, making small improvements difficult to discern from run-to-run
variability. While uncertainty can be reduced by training multiple model
copies, doing so is time-consuming, costly, and harms reproducibility. In this
work, we establish an experimental protocol for understanding the effect of
optimization nondeterminism on model diversity, allowing us to isolate the
effects of a variety of sources of nondeterminism. Surprisingly, we find that
all sources of nondeterminism have similar effects on measures of model
diversity. To explain this intriguing fact, we identify the instability of
model training, taken as an end-to-end procedure, as the key determinant. We
show that even one-bit changes in initial parameters result in models
converging to vastly different values. Last, we propose two approaches for
reducing the effects of instability on run-to-run variability.
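The one-bit sensitivity result is easy to probe directly. Below is a minimal PyTorch sketch, not the authors' code: the bit-flip helper and the choice of perturbing the first weight are illustrative assumptions.

```python
import struct

import torch


def flip_lsb(value: float) -> float:
    """Flip the least significant mantissa bit of a 32-bit float."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", bits ^ 1))
    return flipped


def perturb_one_bit(model: torch.nn.Module) -> None:
    """Change a single initial parameter by one bit, in place.

    Which weight is flipped is arbitrary here; per the abstract, even
    this smallest possible change leads training to converge to a
    vastly different model.
    """
    with torch.no_grad():
        first_param = next(model.parameters())
        flat = first_param.view(-1)
        flat[0] = flip_lsb(float(flat[0]))
```

Training two copies that are identical except for this single bit and comparing their test predictions would give a direct measure of the run-to-run variability the paper attributes to instability.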
Related papers
- Variational Voxel Pseudo Image Tracking [127.46919555100543]
Uncertainty estimation is an important task for critical problems, such as robotics and autonomous driving.
We propose a Variational Neural Network-based version of a Voxel Pseudo Image Tracking (VPIT) method for 3D Single Object Tracking.
arXiv Detail & Related papers (2023-02-12T13:34:50Z)
- Modeling Uncertain Feature Representation for Domain Generalization [49.129544670700525]
We show that our method consistently improves the network generalization ability on multiple vision tasks.
Our methods are simple yet effective and can be readily integrated into networks without additional trainable parameters or loss constraints.
arXiv Detail & Related papers (2023-01-16T14:25:02Z)
- Identifying Weight-Variant Latent Causal Models [82.14087963690561]
We find that transitivity plays a key role in impeding the identifiability of latent causal representations.
Under some mild assumptions, we can show that the latent causal representations can be identified up to trivial permutation and scaling.
We propose a novel method, termed Structural caUsAl Variational autoEncoder, which directly learns latent causal representations and causal relationships among them.
arXiv Detail & Related papers (2022-08-30T11:12:59Z)
- How Tempering Fixes Data Augmentation in Bayesian Neural Networks [22.188535244056016]
We show that tempering implicitly reduces the misspecification arising from modeling augmentations as i.i.d. data.
The temperature mimics the role of the effective sample size, reflecting the gain in information provided by the augmentations.
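A toy sketch of this effective-sample-size reading of temperature (the choice T = K below, with K augmented copies per example, is an illustrative assumption, not the paper's exact prescription):

```python
import torch


def tempered_posterior_loss(nll_per_copy: torch.Tensor,
                            log_prior: torch.Tensor,
                            num_augmentations: int) -> torch.Tensor:
    """Negative tempered log-posterior (illustrative sketch).

    Treating K augmented copies of each example as i.i.d. data overcounts
    the dataset by a factor of K; dividing the likelihood term by a
    temperature T = K restores the original effective sample size.
    """
    temperature = float(num_augmentations)  # assumed: T equals K
    return nll_per_copy.sum() / temperature - log_prior
```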
arXiv Detail & Related papers (2022-05-27T11:06:56Z)
- On the Prediction Instability of Graph Neural Networks [2.3605348648054463]
Instability of trained models can affect reproducibility, reliability, and trust in machine learning systems.
We systematically assess the prediction instability of node classification with state-of-the-art Graph Neural Networks (GNNs).
We find that up to one third of the incorrectly classified nodes differ across algorithm runs.
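One simple way to quantify this kind of instability between two runs (the metric below is illustrative, not necessarily the paper's exact definition):

```python
import numpy as np


def misclassified_disagreement(preds_a: np.ndarray,
                               preds_b: np.ndarray,
                               labels: np.ndarray) -> float:
    """Fraction of nodes misclassified by at least one of two runs
    whose predicted class also differs between the runs."""
    wrong = (preds_a != labels) | (preds_b != labels)
    changed = preds_a != preds_b
    return float((wrong & changed).sum() / max(int(wrong.sum()), 1))
```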
arXiv Detail & Related papers (2022-05-20T10:32:59Z)
- Uncertainty Modeling for Out-of-Distribution Generalization [56.957731893992495]
Common methods often treat the feature statistics as deterministic values measured from the learned features.
We argue that these statistics can instead be properly manipulated to improve the generalization ability of deep learning models.
We improve network generalization by modeling the uncertainty of domain shifts with synthesized feature statistics during training.
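A rough sketch of treating feature statistics as distributions rather than point estimates (the tensor shapes and the Gaussian resampling scheme below are assumptions for illustration, not the authors' exact formulation):

```python
import torch


def perturb_feature_stats(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Resample per-channel feature statistics during training.

    x: feature map of shape (N, C, H, W) with batch size N > 1.
    The per-channel mean and std are treated as Gaussian variables whose
    spread is estimated across the batch, and the features are
    re-standardized with freshly drawn statistics.
    """
    mu = x.mean(dim=(2, 3), keepdim=True)            # (N, C, 1, 1)
    sigma = x.std(dim=(2, 3), keepdim=True) + eps
    mu_spread = mu.std(dim=0, keepdim=True)          # uncertainty of the stats
    sigma_spread = sigma.std(dim=0, keepdim=True)
    new_mu = mu + torch.randn_like(mu) * mu_spread
    new_sigma = sigma + torch.randn_like(sigma) * sigma_spread
    return (x - mu) / sigma * new_sigma + new_mu
```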
arXiv Detail & Related papers (2022-02-08T16:09:12Z)
- Estimation of Bivariate Structural Causal Models by Variational Gaussian Process Regression Under Likelihoods Parametrised by Normalising Flows [74.85071867225533]
Causal mechanisms can be described by structural causal models.
One major drawback of state-of-the-art artificial intelligence is its lack of explainability.
arXiv Detail & Related papers (2021-09-06T14:52:58Z)
- Doubly Stochastic Variational Inference for Neural Processes with Hierarchical Latent Variables [37.43541345780632]
We present a new variant of the Neural Process (NP) model that we call the Doubly Stochastic Variational Neural Process (DSVNP).
This model combines a global latent variable with local latent variables for prediction.
We evaluate the model in several experiments, and the results demonstrate competitive prediction performance in multi-output regression and uncertainty estimation in classification.
arXiv Detail & Related papers (2020-08-21T13:32:12Z)
- Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but the precise role of its stochasticity in that success remains unclear.
We show that multiplicative noise, which commonly arises from variance in local rates of convergence, produces heavy-tailed behaviour in the parameters of discrete stochastic recurrences.
A detailed analysis is conducted across key factors, including step size and data, with similar results exhibited on state-of-the-art neural network models.
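The mechanism can be seen in a toy random linear recurrence (a sketch, not the paper's setup): a light-tailed multiplicative factor that contracts on average still yields a heavy-tailed stationary distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# x_{k+1} = a_k * x_k + b_k with lognormal a_k, E[log a_k] = -0.2 < 0,
# so the chain is stable, yet the multiplicative noise in a_k makes the
# stationary distribution of x heavy-tailed.
n_steps, n_chains = 1_000, 50_000
x = np.zeros(n_chains)
for _ in range(n_steps):
    a = np.exp(0.3 * rng.standard_normal(n_chains) - 0.2)
    b = rng.standard_normal(n_chains)
    x = a * x + b

# Excess kurtosis far above 0 signals tails much heavier than Gaussian.
z = (x - x.mean()) / x.std()
print("excess kurtosis:", float((z**4).mean() - 3.0))
```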
arXiv Detail & Related papers (2020-06-11T09:58:01Z)