Related papers: Annealing Double-Head: An Architecture for Online Calibration of Deep Neural Networks

Annealing Double-Head: An Architecture for Online Calibration of Deep Neural Networks

URL: http://arxiv.org/abs/2212.13621v1
Date: Tue, 27 Dec 2022 21:21:58 GMT
Title: Annealing Double-Head: An Architecture for Online Calibration of Deep Neural Networks
Authors: Erdong Guo, David Draper and Maria De Iorio
Abstract summary: Modern deep neural networks are generally poorly calibrated due to the overestimation of predictive confidence. We propose Annealing Double-Head, a simple-to-implement but highly effective architecture for calibrating the DNN during training. We demonstrate that our method achieves state-of-the-art model calibration performance without post-processing.
Score: 1.1602089225841632
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Model calibration, which is concerned with how frequently the model predicts correctly, not only plays a vital part in statistical model design, but also has substantial practical applications, such as optimal decision-making in the real world. However, it has been discovered that modern deep neural networks are generally poorly calibrated due to the overestimation (or underestimation) of predictive confidence, which is closely related to overfitting. In this paper, we propose Annealing Double-Head, a simple-to-implement but highly effective architecture for calibrating the DNN during training. To be precise, we construct an additional calibration head-a shallow neural network that typically has one latent layer-on top of the last latent layer in the normal model to map the logits to the aligned confidence. Furthermore, a simple Annealing technique that dynamically scales the logits by calibration head in training procedure is developed to improve its performance. Under both the in-distribution and distributional shift circumstances, we exhaustively evaluate our Annealing Double-Head architecture on multiple pairs of contemporary DNN architectures and vision and speech datasets. We demonstrate that our method achieves state-of-the-art model calibration performance without post-processing while simultaneously providing comparable predictive accuracy in comparison to other recently proposed calibration methods on a range of learning tasks.

Related papers

Balancing Two Classifiers via A Simplex ETF Structure for Model Calibration [34.52946891778497]
Deep neural networks (DNNs) have demonstrated state-of-the-art performance across various domains. They often face calibration issues, particularly in safety-critical applications such as autonomous driving and healthcare. Recent research has started to improve model calibration from the view of the classifier.
arXiv Detail & Related papers (2025-04-14T09:09:01Z)
ConsistentFeature: A Plug-and-Play Component for Neural Network Regularization [0.32885740436059047]
Over- parameterized neural network models often lead to significant performance discrepancies between training and test sets. We introduce a simple perspective on overfitting: models learn different representations in different i.i.d. datasets. We propose an adaptive method, ConsistentFeature, that regularizes the model by constraining feature differences across random subsets of the same training set.
arXiv Detail & Related papers (2024-12-02T13:21:31Z)
Decoupling Feature Extraction and Classification Layers for Calibrated Neural Networks [3.5284544394841117]
We show that decoupling the training of feature extraction layers and classification layers in over-parametrized DNN architectures significantly improves model calibration. We illustrate these methods improve calibration across ViT and WRN architectures for several image classification benchmark datasets.
arXiv Detail & Related papers (2024-05-02T11:36:17Z)
Towards Calibrated Deep Clustering Network [60.71776081164377]
In deep clustering, the estimated confidence for a sample belonging to a particular cluster greatly exceeds its actual prediction accuracy. We propose a novel dual head (calibration head and clustering head) deep clustering model that can effectively calibrate the estimated confidence and the actual accuracy. The proposed calibrated deep clustering model not only surpasses the state-of-the-art deep clustering methods by 5x on average in terms of expected calibration error, but also significantly outperforms them in terms of clustering accuracy.
arXiv Detail & Related papers (2024-03-04T11:23:40Z)
Towards Calibrated Robust Fine-Tuning of Vision-Language Models [97.19901765814431]
This work proposes a robust fine-tuning method that improves both OOD accuracy and confidence calibration simultaneously in vision language models. We show that both OOD classification and OOD calibration errors have a shared upper bound consisting of two terms of ID data. Based on this insight, we design a novel framework that conducts fine-tuning with a constrained multimodal contrastive loss enforcing a larger smallest singular value.
arXiv Detail & Related papers (2023-11-03T05:41:25Z)
HCE: Improving Performance and Efficiency with Heterogeneously Compressed Neural Network Ensemble [22.065904428696353]
Recent ensemble training method explores different training algorithms or settings on multiple sub-models with the same model architecture. We propose Heterogeneously Compressed Ensemble (HCE), where we build an efficient ensemble with the pruned and quantized variants from a pretrained DNN model.
arXiv Detail & Related papers (2023-01-18T21:47:05Z)
NCTV: Neural Clamping Toolkit and Visualization for Neural Network Calibration [66.22668336495175]
A lack of consideration for neural network calibration will not gain trust from humans. We introduce the Neural Clamping Toolkit, the first open-source framework designed to help developers employ state-of-the-art model-agnostic calibrated models.
arXiv Detail & Related papers (2022-11-29T15:03:05Z)
An Adaptive and Stability-Promoting Layerwise Training Approach for Sparse Deep Neural Network Architecture [0.0]
This work presents a two-stage adaptive framework for developing deep neural network (DNN) architectures that generalize well for a given training data set. In the first stage, a layerwise training approach is adopted where a new layer is added each time and trained independently by freezing parameters in the previous layers. We introduce a epsilon-delta stability-promoting concept as a desirable property for a learning algorithm and show that employing manifold regularization yields a epsilon-delta stability-promoting algorithm.
arXiv Detail & Related papers (2022-11-13T09:51:16Z)
Meta-Calibration: Learning of Model Calibration Using Differentiable Expected Calibration Error [46.12703434199988]
We introduce a new differentiable surrogate for expected calibration error (DECE) that allows calibration quality to be directly optimised. We also propose a meta-learning framework that uses DECE to optimise for validation set calibration.
arXiv Detail & Related papers (2021-06-17T15:47:50Z)
On the Dark Side of Calibration for Modern Neural Networks [65.83956184145477]
We show the breakdown of expected calibration error (ECE) into predicted confidence and refinement. We highlight that regularisation based calibration only focuses on naively reducing a model's confidence. We find that many calibration approaches with the likes of label smoothing, mixup etc. lower the utility of a DNN by degrading its refinement.
arXiv Detail & Related papers (2021-06-17T11:04:14Z)
DAIS: Automatic Channel Pruning via Differentiable Annealing Indicator Search [55.164053971213576]
convolutional neural network has achieved great success in fulfilling computer vision tasks despite large computation overhead. Structured (channel) pruning is usually applied to reduce the model redundancy while preserving the network structure. Existing structured pruning methods require hand-crafted rules which may lead to tremendous pruning space.
arXiv Detail & Related papers (2020-11-04T07:43:01Z)
Intra Order-preserving Functions for Calibration of Multi-Class Neural Networks [54.23874144090228]
A common approach is to learn a post-hoc calibration function that transforms the output of the original network into calibrated confidence scores. Previous post-hoc calibration techniques work only with simple calibration functions. We propose a new neural network architecture that represents a class of intra order-preserving functions.
arXiv Detail & Related papers (2020-03-15T12:57:21Z)
Calibrating Deep Neural Networks using Focal Loss [77.92765139898906]
Miscalibration is a mismatch between a model's confidence and its correctness. We show that focal loss allows us to learn models that are already very well calibrated. We show that our approach achieves state-of-the-art calibration without compromising on accuracy in almost all cases.
arXiv Detail & Related papers (2020-02-21T17:35:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.