Annealing Double-Head: An Architecture for Online Calibration of Deep Neural Networks
- URL: http://arxiv.org/abs/2212.13621v1
- Date: Tue, 27 Dec 2022 21:21:58 GMT
- Title: Annealing Double-Head: An Architecture for Online Calibration of Deep Neural Networks
- Authors: Erdong Guo, David Draper and Maria De Iorio
- Abstract summary: Modern deep neural networks are generally poorly calibrated due to the overestimation of predictive confidence.
We propose Annealing Double-Head, a simple-to-implement but highly effective architecture for calibrating the DNN during training.
We demonstrate that our method achieves state-of-the-art model calibration performance without post-processing.
- Score: 1.1602089225841632
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model calibration, which concerns how well a model's predicted
confidence matches how frequently it predicts correctly, not only plays a vital
part in statistical model design but also has substantial practical
applications, such as optimal decision-making in the real world. However,
modern deep neural networks have been found to be generally poorly calibrated
because they overestimate (or underestimate) their predictive confidence, a
problem closely related to overfitting. In this paper, we propose Annealing
Double-Head, a simple-to-implement but highly effective architecture for
calibrating a DNN during training. Specifically, we construct an additional
calibration head (a shallow neural network that typically has one latent layer)
on top of the last latent layer of the normal model to map the logits to
aligned confidence. Furthermore, we develop a simple annealing technique in
which the calibration head dynamically scales the logits during training,
further improving performance. Under both in-distribution and
distributional-shift conditions, we extensively evaluate our Annealing
Double-Head architecture on multiple pairs of contemporary DNN architectures
and vision and speech datasets. We demonstrate that our method achieves
state-of-the-art calibration performance without post-processing while
providing predictive accuracy comparable to other recently proposed
calibration methods on a range of learning tasks.
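The abstract describes the calibration head only at a high level. Below is a minimal Python sketch of one possible reading, not the authors' implementation: it assumes the head takes the backbone's logits, passes them through a single hidden ReLU layer, and outputs a positive per-sample temperature `T` (via softplus) that divides the logits before the softmax. The class name `CalibrationHead`, the hidden width, the small temperature floor, and the random initialization are all hypothetical; the annealing schedule is not specified in the abstract and is not modeled here.

```python
import math
import random


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)); never negative.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softmax(z: list[float]) -> list[float]:
    m = max(z)  # subtract the largest logit for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class CalibrationHead:
    """Hypothetical shallow head: logits -> one hidden ReLU layer -> temperature T > 0."""

    def __init__(self, num_classes: int, hidden: int = 8, seed: int = 0,
                 min_temperature: float = 1e-3) -> None:
        rng = random.Random(seed)
        # Untrained, randomly initialized weights; in practice these would be learned.
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(num_classes)]
                   for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.gauss(0.0, 0.1) for _ in range(hidden)]
        self.b2 = 0.0
        self.min_temperature = min_temperature

    def temperature(self, logits: list[float]) -> float:
        # Hidden layer: h = relu(W1 z + b1)
        h = [relu(sum(w * z for w, z in zip(row, logits)) + b)
             for row, b in zip(self.w1, self.b1)]
        # Output: T = softplus(w2 . h + b2) + floor, so T stays strictly positive.
        out = sum(w * v for w, v in zip(self.w2, h)) + self.b2
        return softplus(out) + self.min_temperature

    def calibrated_probs(self, logits: list[float]) -> list[float]:
        # Per-sample temperature scaling: divide every logit by the head's T.
        t = self.temperature(logits)
        return softmax([z / t for z in logits])


if __name__ == "__main__":
    head = CalibrationHead(num_classes=3)
    logits = [2.0, 0.5, -1.0]  # example backbone logits
    print(head.temperature(logits), head.calibrated_probs(logits))
```

Because `T` is strictly positive, dividing the logits by it never changes which class has the largest logit, so in this sketch the head adjusts confidence without changing the predicted label. The training objective for the head (for example, negative log-likelihood of the calibrated probabilities) is likewise an assumption here.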
Related papers
- Decoupling Feature Extraction and Classification Layers for Calibrated Neural Networks [3.5284544394841117]
We show that decoupling the training of feature extraction layers and classification layers in over-parametrized DNN architectures significantly improves model calibration.
We illustrate that these methods improve calibration across ViT and WRN architectures on several image classification benchmark datasets.
Date: 2024-05-02T11:36:17Z
- Towards Calibrated Robust Fine-Tuning of Vision-Language Models [97.19901765814431]
This work proposes a robust fine-tuning method that simultaneously improves both OOD accuracy and confidence calibration in vision-language models.
We show that OOD classification and OOD calibration errors share an upper bound consisting of two terms computed on in-distribution (ID) data.
Based on this insight, we design a novel framework that conducts fine-tuning with a constrained multimodal contrastive loss enforcing a larger smallest singular value.
Date: 2023-11-03T05:41:25Z
- HCE: Improving Performance and Efficiency with Heterogeneously Compressed Neural Network Ensemble [22.065904428696353]
Recent ensemble training methods explore different training algorithms or settings on multiple sub-models with the same model architecture.
We propose Heterogeneously Compressed Ensemble (HCE), where we build an efficient ensemble from pruned and quantized variants of a pretrained DNN model.
Date: 2023-01-18T21:47:05Z
- NCTV: Neural Clamping Toolkit and Visualization for Neural Network Calibration [66.22668336495175]
Neural networks whose calibration is neglected will struggle to gain human trust.
We introduce the Neural Clamping Toolkit, the first open-source framework designed to help developers employ state-of-the-art model-agnostic calibrated models.
Date: 2022-11-29T15:03:05Z
- An Adaptive and Stability-Promoting Layerwise Training Approach for Sparse Deep Neural Network Architecture [0.0]
This work presents a two-stage adaptive framework for developing deep neural network (DNN) architectures that generalize well for a given training data set.
In the first stage, a layerwise training approach is adopted where a new layer is added each time and trained independently by freezing parameters in the previous layers.
We introduce an epsilon-delta stability-promoting concept as a desirable property for a learning algorithm and show that employing manifold regularization yields an epsilon-delta stability-promoting algorithm.
Date: 2022-11-13T09:51:16Z
- Meta-Calibration: Learning of Model Calibration Using Differentiable Expected Calibration Error [46.12703434199988]
We introduce a new differentiable surrogate for expected calibration error (DECE) that allows calibration quality to be directly optimised.
We also propose a meta-learning framework that uses DECE to optimise for validation set calibration.
Date: 2021-06-17T15:47:50Z
- On the Dark Side of Calibration for Modern Neural Networks [65.83956184145477]
We show how expected calibration error (ECE) decomposes into predicted confidence and refinement.
We highlight that regularisation-based calibration focuses only on naively reducing a model's confidence.
We find that many calibration approaches, such as label smoothing and mixup, lower the utility of a DNN by degrading its refinement.
Date: 2021-06-17T11:04:14Z
- DAIS: Automatic Channel Pruning via Differentiable Annealing Indicator Search [55.164053971213576]
Convolutional neural networks have achieved great success in computer vision tasks despite their large computational overhead.
Structured (channel) pruning is usually applied to reduce model redundancy while preserving the network structure.
Existing structured pruning methods require hand-crafted rules, which may lead to an enormous pruning search space.
Date: 2020-11-04T07:43:01Z
- Intra Order-preserving Functions for Calibration of Multi-Class Neural Networks [54.23874144090228]
A common approach is to learn a post-hoc calibration function that transforms the output of the original network into calibrated confidence scores.
Previous post-hoc calibration techniques work only with simple calibration functions.
We propose a new neural network architecture that represents a class of intra order-preserving functions.
Date: 2020-03-15T12:57:21Z
- Calibrating Deep Neural Networks using Focal Loss [77.92765139898906]
Miscalibration is a mismatch between a model's confidence and its correctness.
We show that focal loss allows us to learn models that are already very well calibrated.
We show that our approach achieves state-of-the-art calibration without compromising on accuracy in almost all cases.
Date: 2020-02-21T17:35:50Z
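Several entries above (Meta-Calibration, On the Dark Side of Calibration) are framed in terms of expected calibration error (ECE), the standard binned measure of the mismatch between confidence and correctness discussed throughout this page. The following minimal Python sketch computes top-label ECE with equal-width confidence bins. The function name is hypothetical, and the default of 15 bins is a common choice rather than a value taken from any of these papers.

```python
def expected_calibration_error(confidences: list[float], correct: list[bool],
                               num_bins: int = 15) -> float:
    """ECE = sum over bins m of (|B_m| / n) * |acc(B_m) - conf(B_m)|.

    confidences: top-label predicted probability for each example, in [0, 1].
    correct: whether the top-label prediction was right for each example.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")
    conf_sum = [0.0] * num_bins
    acc_sum = [0.0] * num_bins
    count = [0] * num_bins
    for c, ok in zip(confidences, correct):
        # Half-open equal-width bins [k/M, (k+1)/M); confidence 1.0 goes into
        # the last bin. Some implementations use (k/M, (k+1)/M] instead.
        idx = min(int(c * num_bins), num_bins - 1)
        conf_sum[idx] += c
        acc_sum[idx] += 1.0 if ok else 0.0
        count[idx] += 1
    ece = 0.0
    for total_conf, total_acc, k in zip(conf_sum, acc_sum, count):
        if k:  # empty bins contribute nothing
            ece += (k / n) * abs(total_acc / k - total_conf / k)
    return ece
```

For instance, if every prediction has confidence 0.9 but only half are correct, ECE is 0.4 (overconfidence); if every prediction has confidence 0.8 and 8 of 10 are correct, ECE is 0.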