On the Reproducibility of Neural Network Predictions
- URL: http://arxiv.org/abs/2102.03349v1
- Date: Fri, 5 Feb 2021 18:51:01 GMT
- Title: On the Reproducibility of Neural Network Predictions
- Authors: Srinadh Bhojanapalli, Kimberly Wilber, Andreas Veit, Ankit Singh
Rawat, Seungyeon Kim, Aditya Menon, Sanjiv Kumar
- Abstract summary: We study the problem of churn, identify factors that cause it, and propose two simple means of mitigating it.
We first demonstrate that churn is indeed an issue, even for standard image classification tasks.
We propose using minimum entropy regularizers to increase prediction confidences.
We present empirical results showing the effectiveness of both techniques in reducing churn while improving the accuracy of the underlying model.
- Score: 52.47827424679645
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Standard training techniques for neural networks involve multiple sources of
randomness, e.g., initialization, mini-batch ordering and in some cases data
augmentation. Given that neural networks are heavily over-parameterized in
practice, such randomness can cause "churn": for the same input,
disagreements between the predictions of two models independently trained by
the same algorithm, contributing to the "reproducibility challenges" in modern
machine learning. In this paper, we study this problem of churn, identify
factors that cause it, and propose two simple means of mitigating it. We first
demonstrate that churn is indeed an issue, even for standard image
classification tasks (CIFAR and ImageNet), and study the role of the different
sources of training randomness that cause churn. By analyzing the relationship
between churn and prediction confidences, we pursue an approach with two
components for churn reduction. First, we propose using minimum entropy
regularizers to increase prediction confidences. Second, we present a
novel variant of the co-distillation approach (Anil et al., 2018) to increase
model agreement and reduce churn. We present empirical results showing the
effectiveness of both techniques in reducing churn while improving the accuracy
of the underlying model.
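
To make the abstract's ingredients concrete, here is a minimal PyTorch sketch (not the authors' released code) of the three quantities discussed: a churn metric between two independently trained models, a minimum-entropy-regularized loss, and a co-distillation-style joint training step in the spirit of Anil et al. (2018). The hyperparameters `lam` and `alpha`, the stop-gradient on the partner's logits, and all function names are illustrative assumptions, not details from the paper.

```python
# Hedged sketch under our own assumptions; not the paper's implementation.
import torch
import torch.nn.functional as F

def churn(model_a, model_b, loader, device="cpu"):
    """Fraction of inputs on which the two models' predicted labels disagree."""
    model_a.eval(); model_b.eval()
    disagree, total = 0, 0
    with torch.no_grad():
        for x, _ in loader:
            x = x.to(device)
            pred_a = model_a(x).argmax(dim=1)
            pred_b = model_b(x).argmax(dim=1)
            disagree += (pred_a != pred_b).sum().item()
            total += x.size(0)
    return disagree / total

def min_entropy_loss(logits, targets, lam=0.1):
    """Cross-entropy plus a penalty on prediction entropy: minimizing the
    entropy term pushes prediction confidences up. `lam` is an assumed weight."""
    ce = F.cross_entropy(logits, targets)
    probs = F.softmax(logits, dim=1)
    entropy = -(probs * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    return ce + lam * entropy

def codistillation_step(model_a, model_b, x, y, opt_a, opt_b, alpha=0.5, lam=0.1):
    """One joint step: each model fits the labels (with the entropy penalty)
    and is pulled toward the other model's detached predictive distribution."""
    logits_a, logits_b = model_a(x), model_b(x)
    # Agreement terms: KL(stop_grad(partner) || self), one per model.
    agree_a = F.kl_div(F.log_softmax(logits_a, dim=1),
                       F.softmax(logits_b.detach(), dim=1), reduction="batchmean")
    agree_b = F.kl_div(F.log_softmax(logits_b, dim=1),
                       F.softmax(logits_a.detach(), dim=1), reduction="batchmean")
    loss_a = min_entropy_loss(logits_a, y, lam) + alpha * agree_a
    loss_b = min_entropy_loss(logits_b, y, lam) + alpha * agree_b
    opt_a.zero_grad(); loss_a.backward(); opt_a.step()
    opt_b.zero_grad(); loss_b.backward(); opt_b.step()
    return loss_a.item(), loss_b.item()
```

Detaching the partner's logits keeps each model's update local, which is one common way to implement co-distillation; the paper's exact variant may differ.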
Related papers
- Tilt your Head: Activating the Hidden Spatial-Invariance of Classifiers [0.7704032792820767]
Deep neural networks are applied in more and more areas of everyday life.
They still lack essential abilities, such as robustly dealing with spatially transformed input signals.
We propose a novel technique to emulate a spatially invariant inference process in neural nets.
arXiv Detail & Related papers (2024-05-06T09:47:29Z)
- Wasserstein distributional robustness of neural networks [9.79503506460041]
Deep neural networks are known to be vulnerable to adversarial attacks (AA).
For an image recognition task, this means that a small perturbation of the original can result in the image being misclassified.
We re-cast the problem using techniques of Wasserstein distributionally robust optimization (DRO) and obtain novel contributions.
arXiv Detail & Related papers (2023-06-16T13:41:24Z)
- VCNet: A self-explaining model for realistic counterfactual generation [52.77024349608834]
Counterfactual explanation is a class of methods for locally explaining machine learning decisions.
We present VCNet (Variational Counter Net), a model architecture that combines a predictor and a counterfactual generator.
We show that VCNet can both generate predictions and produce counterfactual explanations without having to solve a separate minimisation problem.
arXiv Detail & Related papers (2022-12-21T08:45:32Z)
- On the effectiveness of partial variance reduction in federated learning with heterogeneous data [27.527995694042506]
We show that the diversity of the final classification layers across clients impedes the performance of the FedAvg algorithm.
Motivated by this, we propose to correct the model by applying variance reduction only to the final layers.
We demonstrate that this significantly outperforms existing benchmarks at a similar or lower communication cost.
arXiv Detail & Related papers (2022-12-05T11:56:35Z)
- MEMO: Test Time Robustness via Adaptation and Augmentation [131.28104376280197]
We study the problem of test time robustification, i.e., using the test input to improve model robustness.
Recent prior works have proposed methods for test time adaptation; however, each introduces additional assumptions.
We propose a simple approach that can be used in any test setting where the model is probabilistic and adaptable.
arXiv Detail & Related papers (2021-10-18T17:55:11Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- Training Generative Adversarial Networks via stochastic Nash games [2.995087247817663]
Generative adversarial networks (GANs) are a class of generative models with two antagonistic neural networks: a generator and a discriminator.
We show convergence to an exact solution as the number of available data samples increases.
We also show convergence of an averaged variant of the stochastic relaxed forward-backward (SRFB) algorithm to a neighborhood of the solution when only a few samples are available.
arXiv Detail & Related papers (2020-10-17T09:07:40Z)
- Regularizing Class-wise Predictions via Self-knowledge Distillation [80.76254453115766]
We propose a new regularization method that penalizes inconsistencies between the predictive distributions of similar samples.
This results in regularizing the dark knowledge (i.e., the knowledge on wrong predictions) of a single network.
Our experimental results on various image classification tasks demonstrate that this simple yet powerful method can significantly improve generalization.
arXiv Detail & Related papers (2020-03-31T06:03:51Z)
- A game-theoretic approach for Generative Adversarial Networks [2.995087247817663]
Generative adversarial networks (GANs) are a class of generative models, known for producing accurate samples.
The main bottleneck for their implementation is that the neural networks are very hard to train.
We propose a relaxed forward-backward algorithm for GANs.
We prove that when the pseudogradient mapping of the game is monotone, the algorithm converges to an exact solution or to a neighbourhood of it.
arXiv Detail & Related papers (2020-03-30T17:14:41Z)
- Hidden Cost of Randomized Smoothing [72.93630656906599]
In this paper, we point out the side effects of current randomized smoothing.
Specifically, we articulate and prove two major points: 1) the decision boundaries of smoothed classifiers will shrink, resulting in disparity in class-wise accuracy; 2) applying noise augmentation in the training process does not necessarily resolve the shrinking issue due to the inconsistent learning objectives.
arXiv Detail & Related papers (2020-03-02T23:37:42Z)
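
For context on the entry above, a minimal sketch of the standard randomized smoothing prediction rule (a majority vote over Gaussian-perturbed copies of the input), whose class-wise side effects the paper analyzes; `sigma` and `n_samples` are illustrative assumptions, not values from the paper.

```python
# Hedged sketch of a smoothed classifier g(x) = argmax_c P(f(x + noise) = c);
# not the paper's code.
import torch

def smoothed_predict(model, x, sigma=0.25, n_samples=100):
    """Majority-vote prediction over Gaussian-perturbed copies of one input x."""
    model.eval()
    with torch.no_grad():
        noise = torch.randn(n_samples, *x.shape, device=x.device) * sigma
        logits = model(x.unsqueeze(0) + noise)  # broadcast x over the noise batch
        votes = logits.argmax(dim=1)
        return torch.bincount(votes).argmax().item()
```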
This list is automatically generated from the titles and abstracts of the papers in this site.