Calibrating the Rigged Lottery: Making All Tickets Reliable
- URL: http://arxiv.org/abs/2302.09369v1
- Date: Sat, 18 Feb 2023 15:53:55 GMT
- Title: Calibrating the Rigged Lottery: Making All Tickets Reliable
- Authors: Bowen Lei, Ruqi Zhang, Dongkuan Xu, Bani Mallick
- Abstract summary: We propose a new sparse training method to produce sparse models with improved confidence calibration.
Our method simultaneously maintains or even improves accuracy with only a slight increase in computation and storage burden.
- Score: 14.353428281239665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although sparse training has been successfully used in various
resource-limited deep learning tasks to save memory, accelerate training, and
reduce inference time, the reliability of the produced sparse models remains
unexplored. Previous research has shown that deep neural networks tend to be
over-confident, and we find that sparse training exacerbates this problem.
Therefore, calibrating the sparse models is crucial for reliable prediction and
decision-making. In this paper, we propose a new sparse training method to
produce sparse models with improved confidence calibration. In contrast to
previous research that uses only one mask to control the sparse topology, our
method utilizes two masks, including a deterministic mask and a random mask.
The former efficiently searches for and activates important weights by exploiting
the magnitudes of weights and gradients, while the latter improves exploration
and finds more appropriate weight values through random updates.
Theoretically, we prove our method can be viewed as a hierarchical variational
approximation of a probabilistic deep Gaussian process. Extensive experiments
on multiple datasets, model architectures, and sparsities show that our method
reduces ECE values by up to 47.8% and simultaneously maintains or even
improves accuracy with only a slight increase in computation and storage
burden.
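As a rough illustration of the dual-mask idea described above, the sketch below builds a deterministic top-k mask from weight and gradient magnitudes and unions it with a small random exploration mask. The function names, scoring rule, and hyperparameters are assumptions for illustration, not the authors' released procedure.

```python
import torch

def deterministic_mask(weight, grad, sparsity):
    """Keep the top-(1 - sparsity) fraction of entries, ranked by a combined
    weight-magnitude / gradient-magnitude score (the exact score is an assumption)."""
    k = int((1.0 - sparsity) * weight.numel())
    score = weight.abs() + grad.abs()
    mask = torch.zeros(weight.numel(), dtype=torch.bool, device=weight.device)
    if k > 0:
        mask[torch.topk(score.flatten(), k).indices] = True
    return mask.view_as(weight)

def random_mask(weight, extra_fraction=0.01):
    """Randomly activate a small extra fraction of entries to encourage exploration."""
    return torch.rand_like(weight) < extra_fraction

def apply_dual_mask(weight, grad, sparsity=0.9, extra_fraction=0.01):
    """Union of the two masks; weights outside the union are zeroed."""
    mask = deterministic_mask(weight, grad, sparsity) | random_mask(weight, extra_fraction)
    return weight * mask, mask
```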
Related papers
- Open-Set Deepfake Detection: A Parameter-Efficient Adaptation Method with Forgery Style Mixture [58.60915132222421]
We introduce an approach that is both general and parameter-efficient for face forgery detection.
We design a forgery-style mixture formulation that augments the diversity of forgery source domains.
We show that the designed model achieves state-of-the-art generalizability with significantly reduced trainable parameters.
arXiv Detail & Related papers (2024-08-23T01:53:36Z) - KAKURENBO: Adaptively Hiding Samples in Deep Neural Network Training [2.8804804517897935]
We propose a method for hiding the least-important samples during the training of deep neural networks.
We adaptively find samples to exclude in a given epoch based on their contribution to the overall learning process.
Our method can reduce total training time by up to 22% while impacting accuracy by only 0.4% compared to the baseline.
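A minimal sketch of the sample-hiding idea above, assuming the lowest-loss samples are the "least important" ones to hide each epoch; the paper's actual importance criterion and schedule may differ.

```python
import numpy as np
from torch.utils.data import DataLoader, Subset

def select_visible_indices(per_sample_loss, hide_fraction=0.2):
    """Hide the lowest-loss (least informative) samples for the next epoch."""
    n_hide = int(hide_fraction * len(per_sample_loss))
    order = np.argsort(per_sample_loss)  # ascending: easiest samples first
    return order[n_hide:]

# usage sketch (train_set and per-sample losses come from the previous epoch):
# visible = select_visible_indices(epoch_losses, hide_fraction=0.2)
# loader = DataLoader(Subset(train_set, visible.tolist()), batch_size=128, shuffle=True)
```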
arXiv Detail & Related papers (2023-10-16T06:19:29Z) - HyperSparse Neural Networks: Shifting Exploration to Exploitation through Adaptive Regularization [18.786142528591355]
Sparse neural networks are a key factor in developing resource-efficient machine learning applications.
We propose the novel and powerful sparse learning method Adaptive Regularized Training (ART) to compress dense into sparse networks.
Our method compresses the pre-trained model knowledge into the weights of highest magnitude.
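A generic sketch of the regularize-then-prune idea described above: a growing L1 penalty concentrates knowledge in a few high-magnitude weights, after which the smallest weights are pruned. This is an illustrative stand-in, not the paper's ART schedule.

```python
import torch

def l1_penalty(model, strength):
    """Add this to the task loss; increasing `strength` over training pushes
    knowledge into a few high-magnitude weights (schedule is an assumption)."""
    return strength * sum(p.abs().sum() for p in model.parameters())

@torch.no_grad()
def magnitude_prune_(model, sparsity=0.9):
    """After regularized training, zero the smallest-magnitude weights in place."""
    for p in model.parameters():
        k = int(sparsity * p.numel())
        if k == 0:
            continue
        threshold = p.abs().flatten().kthvalue(k).values
        p.mul_((p.abs() > threshold).float())
```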
arXiv Detail & Related papers (2023-08-14T14:18:11Z) - Multi-Head Multi-Loss Model Calibration [13.841172927454204]
We introduce a form of simplified ensembling that bypasses the costly training and inference of deep ensembles.
Specifically, each head is trained to minimize a weighted Cross-Entropy loss, but the weights are different among the different branches.
We show that the resulting averaged predictions can achieve excellent calibration without sacrificing accuracy on two challenging datasets.
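A minimal sketch of the multi-head, multi-loss idea: a shared backbone with several heads, each trained with a differently weighted cross-entropy, whose softmax outputs are averaged at prediction time. The class weights here are placeholders, not the weighting scheme used in the paper.

```python
import torch
import torch.nn as nn

class MultiHeadClassifier(nn.Module):
    """Shared backbone with several linear heads; each head minimizes its own
    class-weighted cross-entropy (the weights below are illustrative)."""
    def __init__(self, backbone, feat_dim, num_classes, num_heads=3):
        super().__init__()
        self.backbone = backbone
        self.heads = nn.ModuleList([nn.Linear(feat_dim, num_classes) for _ in range(num_heads)])
        # one weight vector per head; a real scheme would choose these deliberately
        self.class_weights = [torch.rand(num_classes) + 0.5 for _ in range(num_heads)]

    def forward(self, x):
        feats = self.backbone(x)
        return [head(feats) for head in self.heads]

    def loss(self, logits_list, targets):
        return sum(
            nn.functional.cross_entropy(logits, targets, weight=w.to(logits.device))
            for logits, w in zip(logits_list, self.class_weights)
        )

    @torch.no_grad()
    def predict_proba(self, x):
        # average per-head softmax outputs for a better-calibrated prediction
        return torch.stack([logits.softmax(dim=-1) for logits in self.forward(x)]).mean(dim=0)
```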
arXiv Detail & Related papers (2023-03-02T09:32:32Z) - Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight reparameterisation for neural networks that leads to inherently sparse models.
Models trained in this manner exhibit similar performance, but have a distribution with markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
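A minimal sketch of a Powerpropagation-style layer, assuming the reparameterisation takes the form w = v * |v|^(alpha - 1); the layer wrapper and initialisation are illustrative choices.

```python
import torch
import torch.nn as nn

class PowerpropLinear(nn.Module):
    """Linear layer whose effective weight is v * |v|**(alpha - 1), so gradient
    updates are scaled by magnitude and small weights drift toward zero."""
    def __init__(self, in_features, out_features, alpha=2.0):
        super().__init__()
        self.alpha = alpha
        self.v = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        w = self.v * self.v.abs().pow(self.alpha - 1.0)
        return nn.functional.linear(x, w, self.bias)
```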
arXiv Detail & Related papers (2021-10-01T10:03:57Z) - Efficient remedies for outlier detection with variational autoencoders [8.80692072928023]
Likelihoods computed by deep generative models are a candidate metric for outlier detection with unlabeled data.
We show that a theoretically-grounded correction readily ameliorates a key bias in VAE likelihood estimates.
We also show that the variance of likelihoods computed over an ensemble of VAEs enables robust outlier detection.
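A minimal sketch of the ensemble-variance idea: score each input by how much the per-model likelihood estimates disagree. The vae.log_likelihood interface is a hypothetical placeholder, not an API from the paper.

```python
import torch

def ensemble_outlier_scores(vaes, x):
    """Outlier score = variance of per-sample log-likelihood estimates across an
    ensemble of VAEs; `vae.log_likelihood(x)` is a hypothetical interface that
    returns a per-sample estimate (e.g. an importance-weighted ELBO)."""
    with torch.no_grad():
        lls = torch.stack([vae.log_likelihood(x) for vae in vaes])  # (n_models, batch)
    return lls.var(dim=0)  # higher variance -> more likely an outlier
```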
arXiv Detail & Related papers (2021-08-19T16:00:58Z) - Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models.
Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
arXiv Detail & Related papers (2021-06-18T01:03:13Z) - Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z) - Uncertainty Estimation Using a Single Deep Deterministic Neural Network [66.26231423824089]
We propose a method for training a deterministic deep model that can find and reject out-of-distribution data points at test time with a single forward pass.
We scale training in these networks with a novel loss function and centroid updating scheme, and match the accuracy of softmax models.
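A rough sketch of single-forward-pass out-of-distribution scoring via distances to class centroids, with centroids tracked by a running average; the RBF similarity, momentum, and length scale are illustrative assumptions rather than the paper's exact formulation.

```python
import torch

class CentroidOODScorer:
    """One feature-space centroid per class, updated with a running average;
    inputs are scored by RBF similarity to the closest centroid, and a low
    maximum similarity flags a likely out-of-distribution sample."""
    def __init__(self, num_classes, feat_dim, momentum=0.99, length_scale=1.0):
        self.centroids = torch.zeros(num_classes, feat_dim)
        self.momentum = momentum
        self.length_scale = length_scale

    @torch.no_grad()
    def update(self, feats, labels):
        for c in labels.unique():
            batch_mean = feats[labels == c].mean(dim=0)
            self.centroids[c] = self.momentum * self.centroids[c] + (1 - self.momentum) * batch_mean

    @torch.no_grad()
    def score(self, feats):
        d2 = torch.cdist(feats, self.centroids).pow(2)        # squared distances
        sim = torch.exp(-d2 / (2 * self.length_scale ** 2))   # RBF similarity
        return sim.max(dim=1).values                          # low value -> likely OOD
```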
arXiv Detail & Related papers (2020-03-04T12:27:36Z) - Hidden Cost of Randomized Smoothing [72.93630656906599]
In this paper, we point out the side effects of current randomized smoothing.
Specifically, we articulate and prove two major points: 1) the decision boundaries of smoothed classifiers will shrink, resulting in disparity in class-wise accuracy; 2) applying noise augmentation in the training process does not necessarily resolve the shrinking issue due to the inconsistent learning objectives.
arXiv Detail & Related papers (2020-03-02T23:37:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.