On the Variance of Neural Network Training with respect to Test Sets and Distributions
- URL: http://arxiv.org/abs/2304.01910v4
- Date: Mon, 10 Jun 2024 02:25:33 GMT
- Title: On the Variance of Neural Network Training with respect to Test Sets and Distributions
- Authors: Keller Jordan
- Abstract summary: We show that standard CIFAR-10 and ImageNet trainings have little variance in performance on the underlying test-distributions.
We prove that the variance of neural network trainings on their test-sets is a downstream consequence of the class-calibration property discovered by Jiang et al. (2021).
Our analysis yields a simple formula which accurately predicts variance for the binary classification case.
- Score: 1.994307489466967
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Typical neural network trainings have substantial variance in test-set performance between repeated runs, impeding hyperparameter comparison and training reproducibility. In this work we present the following results towards understanding this variation. (1) Despite having significant variance on their test-sets, we demonstrate that standard CIFAR-10 and ImageNet trainings have little variance in performance on the underlying test-distributions from which their test-sets are sampled. (2) We show that these trainings make approximately independent errors on their test-sets. That is, the event that a trained network makes an error on one particular example does not affect its chances of making errors on other examples, relative to their average rates over repeated runs of training with the same hyperparameters. (3) We prove that the variance of neural network trainings on their test-sets is a downstream consequence of the class-calibration property discovered by Jiang et al. (2021). Our analysis yields a simple formula which accurately predicts variance for the binary classification case. (4) We conduct preliminary studies of data augmentation, learning rate, finetuning instability and distribution-shift through the lens of variance between runs.
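As a rough illustration of results (2) and (3): if each test example i is misclassified independently across runs with some rate p_i, the variance of test-set accuracy is (1/n^2) * sum_i p_i(1 - p_i). The sketch below (not the authors' code; the per-example rates are hypothetical) checks that formula against simulated runs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-example error rates p_i, i.e. how often repeated
# training runs misclassify each of n test examples.
n = 10_000
p = rng.beta(0.5, 5.0, size=n)  # mostly easy examples, a few hard ones

# Simulate many independent "training runs": under the paper's
# approximate-independence finding, each example is missed
# independently with probability p_i.
runs = 2_000
errors = rng.random((runs, n)) < p    # (runs, n) Boolean error matrix
accuracy = 1.0 - errors.mean(axis=1)  # test-set accuracy per run

# Independence formula for the variance of test-set accuracy:
# Var(acc) = (1/n^2) * sum_i p_i * (1 - p_i)
predicted_var = np.sum(p * (1.0 - p)) / n**2

print(f"empirical Var(acc): {accuracy.var():.3e}")
print(f"predicted Var(acc): {predicted_var:.3e}")
```

The two numbers agree closely, which is exactly the sense in which test-set variance is a downstream consequence of per-example error rates rather than of run-to-run differences in true model quality.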
Related papers
- Robust Nonparametric Hypothesis Testing to Understand Variability in Training Neural Networks [5.8490454659691355]
We propose a new measure of closeness between classification models based on the output of the network before thresholding.
Our measure is based on a robust hypothesis-testing framework and can be adapted to other quantities derived from trained models.
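One plausible reading of this setup, sketched below under assumed details (a paired sign-flip permutation test on pre-threshold scores; not necessarily the paper's exact statistic):

```python
import numpy as np

rng = np.random.default_rng(0)

def closeness_pvalue(scores_a, scores_b, n_perm=10_000, rng=rng):
    """Paired sign-flip permutation test on pre-threshold outputs.

    scores_a, scores_b: per-example scores (e.g. logits) from two
    trained models on the same test inputs.
    """
    diff = scores_a - scores_b
    observed = abs(diff.mean())
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diff.size))
    null = np.abs((signs * diff).mean(axis=1))
    return (null >= observed).mean()

# Two hypothetical models: same signal, independent noise.
x = rng.normal(size=500)
model_a = x + 0.1 * rng.normal(size=500)
model_b = x + 0.1 * rng.normal(size=500)
print(f"p-value: {closeness_pvalue(model_a, model_b):.3f}")
```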
arXiv Detail & Related papers (2023-10-01T01:44:35Z)
- Explicit Tradeoffs between Adversarial and Natural Distributional Robustness [48.44639585732391]
In practice, models need both adversarial and natural distributional robustness to be reliable.
In this work, we show that in fact, explicit tradeoffs exist between adversarial and natural distributional robustness.
arXiv Detail & Related papers (2022-09-15T19:58:01Z)
- Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
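For context, a minimal sketch of standard split conformal regression, the exchangeable-data baseline that the paper extends to handle design-induced distribution shift (this sketch does not implement the shift-aware variant; the toy model and data are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def conformal_interval(residuals_cal, y_pred_test, alpha=0.1):
    """Split conformal interval from held-out absolute residuals."""
    n = len(residuals_cal)
    # Finite-sample-corrected quantile of calibration residuals.
    q = np.quantile(residuals_cal, np.ceil((n + 1) * (1 - alpha)) / n)
    return y_pred_test - q, y_pred_test + q

# Toy model: predict y = 2x with noise, calibrate on held-out data.
x_cal, x_test = rng.uniform(0, 1, 500), rng.uniform(0, 1, 100)
y_cal = 2 * x_cal + rng.normal(0, 0.1, 500)
residuals = np.abs(y_cal - 2 * x_cal)  # |y - f(x)| on calibration set
lo, hi = conformal_interval(residuals, 2 * x_test)
print(f"interval half-width: {(hi - lo).mean() / 2:.3f}")
```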
arXiv Detail & Related papers (2022-02-08T02:59:12Z)
- Models of Computational Profiles to Study the Likelihood of DNN Metamorphic Test Cases [9.997379778870695]
We introduce "computational profiles" as vectors of neuron activation levels.
We show that the distributions of computational-profile likelihoods for training and test cases are similar.
In contrast, metamorphic test cases show a prediction likelihood that lies in an extended range with respect to training, tests, and random noise.
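A minimal sketch of the idea, assuming a diagonal-Gaussian likelihood model over activation vectors (the density model and the synthetic activations are assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# "Computational profile" here: a vector of hidden-layer activation
# levels per input (hypothetical stand-in for the paper's profiles).
def log_likelihood(profiles, mean, var):
    # Diagonal-Gaussian log-likelihood, one scalar per profile.
    return -0.5 * np.sum(
        np.log(2 * np.pi * var) + (profiles - mean) ** 2 / var, axis=1
    )

train = rng.normal(0.0, 1.0, size=(1000, 64))       # training activations
test = rng.normal(0.0, 1.0, size=(200, 64))         # in-distribution test
metamorphic = rng.normal(0.5, 1.5, size=(200, 64))  # perturbed inputs

mean, var = train.mean(axis=0), train.var(axis=0) + 1e-6
print(f"train       LL: {log_likelihood(train, mean, var).mean():.1f}")
print(f"test        LL: {log_likelihood(test, mean, var).mean():.1f}")
print(f"metamorphic LL: {log_likelihood(metamorphic, mean, var).mean():.1f}")
```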
arXiv Detail & Related papers (2021-07-28T16:57:44Z)
- Assessing Generalization of SGD via Disagreement [71.17788927037081]
We empirically show that the test error of deep networks can be estimated by training the same architecture on the same training set with a different run of stochastic gradient descent (SGD) and measuring the disagreement rate between the two runs on unlabeled test data.
This finding not only provides a simple empirical measure to directly predict the test error using unlabeled test data, but also establishes a new conceptual connection between generalization and calibration.
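The estimator itself is simple; a sketch with hypothetical predictions (the two prediction arrays are synthetic stand-ins for two real SGD runs):

```python
import numpy as np

rng = np.random.default_rng(0)

def disagreement_rate(preds_a, preds_b):
    """Fraction of unlabeled test inputs on which two independently
    trained runs of the same architecture disagree; the paper's
    finding is that this tracks the (unknown) test error."""
    return np.mean(preds_a != preds_b)

# Hypothetical predicted labels from two SGD runs on 10k inputs.
preds_run1 = rng.integers(0, 10, size=10_000)
preds_run2 = np.where(rng.random(10_000) < 0.9, preds_run1,
                      rng.integers(0, 10, size=10_000))
print(f"estimated test error ~ {disagreement_rate(preds_run1, preds_run2):.3f}")
```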
arXiv Detail & Related papers (2021-06-25T17:53:09Z)
- Optimization Variance: Exploring Generalization Properties of DNNs [83.78477167211315]
The test error of a deep neural network (DNN) often demonstrates double descent.
We propose a novel metric, optimization variance (OV), to measure the diversity of model updates.
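A simplified toy version of an optimization-variance-style quantity (a linear model and a normalized output variance are assumptions here, not the paper's exact OV definition):

```python
import numpy as np

rng = np.random.default_rng(0)

# How much does one SGD update, computed on different minibatches,
# change the model's output on a fixed probe input?
w = rng.normal(size=5)
x_probe = rng.normal(size=5)

outputs = []
for _ in range(100):
    xb = rng.normal(size=(32, 5))                  # random minibatch
    yb = xb @ np.ones(5) + rng.normal(0, 0.1, 32)  # targets
    grad = 2 * xb.T @ (xb @ w - yb) / 32           # MSE gradient
    outputs.append((w - 0.01 * grad) @ x_probe)    # output after one step

outputs = np.asarray(outputs)
ov = outputs.var() / (outputs**2).mean()  # normalized update diversity
print(f"optimization variance (toy): {ov:.4f}")
```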
arXiv Detail & Related papers (2021-06-03T09:34:17Z)
- Bias-Aware Loss for Training Image and Speech Quality Prediction Models from Multiple Datasets [13.132388683797503]
We propose a bias-aware loss function that estimates each dataset's biases during training with a linear function.
We demonstrate the effectiveness of the proposed method by training and validating quality prediction models on synthetic and subjective image and speech quality datasets.
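A sketch of the core idea, with one simplification worth flagging: here the per-dataset linear map is fit by least squares inside the loss rather than learned jointly during training, and the data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

def bias_aware_mse(y_pred, y_true, dataset_ids):
    """MSE after absorbing a per-dataset linear bias y ~ a*y_pred + b.

    Each dataset's subjective scores may be offset and scaled; fit
    that linear map per dataset and penalize only the residual error.
    """
    total, count = 0.0, 0
    for d in np.unique(dataset_ids):
        m = dataset_ids == d
        a, b = np.polyfit(y_pred[m], y_true[m], deg=1)  # linear bias fit
        total += np.sum((a * y_pred[m] + b - y_true[m]) ** 2)
        count += m.sum()
    return total / count

# Two hypothetical datasets rating the same underlying quality scale,
# one shifted and rescaled relative to the other.
q = rng.uniform(1, 5, 400)
ids = np.repeat([0, 1], 200)
y = np.where(ids == 0, q, 0.8 * q + 0.7) + rng.normal(0, 0.1, 400)
print(f"bias-aware MSE: {bias_aware_mse(q, y, ids):.4f}")
print(f"plain MSE:      {np.mean((q - y) ** 2):.4f}")
```

The bias-aware value recovers the noise floor, while the plain MSE is inflated by the second dataset's rating offset.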
arXiv Detail & Related papers (2021-04-20T19:20:11Z)
- Unsupervised neural adaptation model based on optimal transport for spoken language identification [54.96267179988487]
Due to the mismatch between the statistical distributions of acoustic speech in the training and testing sets, the performance of spoken language identification (SLID) can degrade drastically.
We propose an unsupervised neural adaptation model to deal with the distribution mismatch problem for SLID.
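To illustrate the optimal-transport ingredient only, a generic entropy-regularized (Sinkhorn) alignment between source and target embedding sets; the embeddings, cost normalization, and barycentric mapping are assumptions, not the paper's adaptation network:

```python
import numpy as np

rng = np.random.default_rng(0)

def sinkhorn_plan(source, target, eps=0.05, n_iter=200):
    """Entropy-regularized optimal transport plan between two sets of
    embeddings via plain Sinkhorn iterations."""
    cost = ((source[:, None, :] - target[None, :, :]) ** 2).sum(-1)
    cost /= cost.max()  # normalize cost scale for numerical stability
    K = np.exp(-cost / eps)
    a = np.full(len(source), 1.0 / len(source))  # uniform source weights
    b = np.full(len(target), 1.0 / len(target))  # uniform target weights
    v = np.ones(len(target))
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

# Hypothetical utterance embeddings; the target domain is shifted.
src = rng.normal(0.0, 1.0, size=(50, 8))
tgt = rng.normal(0.5, 1.0, size=(60, 8))
plan = sinkhorn_plan(src, tgt)
# Barycentric mapping moves each source embedding toward the target.
adapted = (plan @ tgt) / plan.sum(axis=1, keepdims=True)
print(f"mean gap to target before: {np.linalg.norm(src.mean(0) - tgt.mean(0)):.3f}")
print(f"mean gap to target after:  {np.linalg.norm(adapted.mean(0) - tgt.mean(0)):.3f}")
```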
arXiv Detail & Related papers (2020-12-24T07:37:19Z)
- Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping [62.78338049381917]
Fine-tuning pretrained contextual word embedding models to supervised downstream tasks has become commonplace in natural language processing.
We experiment with four datasets from the GLUE benchmark, fine-tuning BERT hundreds of times on each while varying only the random seeds.
We find substantial performance increases compared to previously reported results, and we quantify how the performance of the best-found model varies as a function of the number of fine-tuning trials.
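A minimal sketch of that best-of-k analysis, assuming a bootstrap over per-seed validation scores (the scores below are synthetic stand-ins, not the paper's numbers):

```python
import numpy as np

rng = np.random.default_rng(0)

def expected_best(scores, k, n_boot=10_000, rng=rng):
    """Expected best validation score over k fine-tuning trials,
    estimated by resampling observed per-seed scores."""
    samples = rng.choice(scores, size=(n_boot, k), replace=True)
    return samples.max(axis=1).mean()

# Hypothetical per-seed validation accuracies from repeated BERT
# fine-tuning runs with different random seeds.
seed_scores = rng.normal(0.90, 0.02, size=100).clip(0, 1)
for k in [1, 5, 10, 25]:
    print(f"best of {k:>2} trials: {expected_best(seed_scores, k):.4f}")
```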
arXiv Detail & Related papers (2020-02-15T02:40:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.