Related papers: Decoupling Feature Extraction and Classification Layers for Calibrated Neural Networks

URL: http://arxiv.org/abs/2405.01196v3
Date: Mon, 6 May 2024 08:19:20 GMT
Title: Decoupling Feature Extraction and Classification Layers for Calibrated Neural Networks
Authors: Mikkel Jordahn, Pablo M. Olmos
Abstract summary: We show that decoupling the training of feature extraction layers and classification layers in over-parametrized DNN architectures significantly improves model calibration. We illustrate that these methods improve calibration across ViT and WRN architectures for several image classification benchmark datasets.
Score: 3.5284544394841117
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deep Neural Networks (DNN) have shown great promise in many classification applications, yet are widely known to have poorly calibrated predictions when they are over-parametrized. Improving DNN calibration without compromising on model accuracy is of extreme importance and interest in safety-critical applications such as in the health-care sector. In this work, we show that decoupling the training of feature extraction layers and classification layers in over-parametrized DNN architectures such as Wide Residual Networks (WRN) and Visual Transformers (ViT) significantly improves model calibration whilst retaining accuracy, and at a low training cost. In addition, we show that placing a Gaussian prior on the last hidden layer outputs of a DNN, and training the model variationally in the classification training stage, even further improves calibration. We illustrate that these methods improve calibration across ViT and WRN architectures for several image classification benchmark datasets.
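
Illustration: calibration of the kind discussed in this abstract is commonly quantified with the expected calibration error (ECE), a metric that several of the related abstracts on this page also mention. The following is a minimal sketch of one common ECE variant, not the paper's evaluation code. The function name, the choice of 10 equal-width bins, and the toy inputs are assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE (assumed variant): the weighted mean over bins
    of |accuracy - mean confidence|, weighted by the bin's sample share."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Per-bin sample count, summed confidence, and summed correctness.
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    hit_sums = [0.0] * n_bins
    for conf, hit in zip(confidences, correct):
        # Bin k covers [k / n_bins, (k + 1) / n_bins); 1.0 goes to the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        counts[idx] += 1
        conf_sums[idx] += conf
        hit_sums[idx] += 1.0 if hit else 0.0

    ece = 0.0
    for count, conf_sum, hit_sum in zip(counts, conf_sums, hit_sums):
        if count:  # empty bins contribute nothing
            accuracy = hit_sum / count
            mean_conf = conf_sum / count
            ece += (count / n) * abs(accuracy - mean_conf)
    return ece


# Hypothetical toy data: an overconfident model that is 90% confident
# on every prediction but right only half of the time.
toy_confidences = [0.9, 0.9, 0.9, 0.9]
toy_correct = [True, True, False, False]
toy_ece = expected_calibration_error(toy_confidences, toy_correct)
```

On this toy input the gap between confidence (0.9) and accuracy (0.5) gives an ECE of about 0.4; a model whose confidence matches its accuracy in every bin would score 0.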

Related papers

GETS: Ensemble Temperature Scaling for Calibration in Graph Neural Networks [8.505932176266368]
Graph Neural Networks deliver strong classification results but often suffer from poor calibration performance, leading to overconfidence or underconfidence. Existing post hoc methods, such as temperature scaling, fail to effectively utilize graph structures, while current GNN calibration methods often overlook the potential of leveraging diverse input information and model ensembles jointly. In the paper, we propose Graph Ensemble Temperature Scaling, a novel calibration framework that combines input and model ensemble strategies within a Graph Mixture of Experts architecture. GETS improves on SOTA calibration techniques, reducing expected calibration error by 25 percent across 10 GNN benchmark datasets.
arXiv Detail & Related papers (2024-10-12T15:34:41Z)
Cal-DETR: Calibrated Detection Transformer [67.75361289429013]
We propose a mechanism for calibrated detection transformers (Cal-DETR), particularly for Deformable-DETR, UP-DETR and DINO. We develop an uncertainty-guided logit modulation mechanism that leverages the uncertainty to modulate the class logits. Results corroborate the effectiveness of Cal-DETR against the competing train-time methods in calibrating both in-domain and out-domain detections.
arXiv Detail & Related papers (2023-11-06T22:13:10Z)
Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection [58.789823426981044]
We propose a novel auxiliary loss formulation that aims to align the class confidence of bounding boxes with the accurateness of predictions. Our results reveal that our train-time loss surpasses strong calibration baselines in reducing calibration error for both in and out-domain scenarios.
arXiv Detail & Related papers (2023-03-25T08:56:21Z)
ESD: Expected Squared Difference as a Tuning-Free Trainable Calibration Measure [35.996971010199196]
Expected Squared Difference (ESD) is a tuning-free trainable calibration objective. We show that ESD yields the best-calibrated results compared with previous approaches. ESD drastically reduces the computational costs required for calibration during training.
arXiv Detail & Related papers (2023-03-04T18:06:36Z)
Annealing Double-Head: An Architecture for Online Calibration of Deep Neural Networks [1.1602089225841632]
Modern deep neural networks are generally poorly calibrated due to the overestimation of predictive confidence. We propose Annealing Double-Head, a simple-to-implement but highly effective architecture for calibrating the DNN during training. We demonstrate that our method achieves state-of-the-art model calibration performance without post-processing.
arXiv Detail & Related papers (2022-12-27T21:21:58Z)
NCTV: Neural Clamping Toolkit and Visualization for Neural Network Calibration [66.22668336495175]
Neural networks whose calibration is neglected will not earn human trust. We introduce the Neural Clamping Toolkit, the first open-source framework designed to help developers employ state-of-the-art model-agnostic calibrated models.
arXiv Detail & Related papers (2022-11-29T15:03:05Z)
What Makes Graph Neural Networks Miscalibrated? [48.00374886504513]
We conduct a systematic study on the calibration qualities of graph neural networks (GNNs). We identify five factors which influence the calibration of GNNs: general under-confident tendency, diversity of nodewise predictive distributions, distance to training nodes, relative confidence level, and neighborhood similarity. We design a novel calibration method named Graph Attention Temperature Scaling (GATS), which is tailored for calibrating graph neural networks.
arXiv Detail & Related papers (2022-10-12T16:41:42Z)
On Calibration of Graph Neural Networks for Node Classification [29.738179864433445]
Graph neural networks learn entity and edge embeddings for tasks such as node classification and link prediction. These models achieve good performance with respect to accuracy, but the confidence scores associated with the predictions might not be calibrated. We propose a topology-aware calibration method that takes the neighboring nodes into account and yields improved calibration.
arXiv Detail & Related papers (2022-06-03T13:48:10Z)
KNN-BERT: Fine-Tuning Pre-Trained Models with KNN Classifier [61.063988689601416]
Pre-trained models are widely used in fine-tuning downstream tasks with linear classifiers optimized by the cross-entropy loss. These problems can be improved by learning representations that focus on similarities in the same class and contradictions when making predictions. In this paper, we introduce the K-Nearest Neighbors classifier into pre-trained model fine-tuning tasks.
arXiv Detail & Related papers (2021-10-06T06:17:05Z)
On the Dark Side of Calibration for Modern Neural Networks [65.83956184145477]
We show the breakdown of expected calibration error (ECE) into predicted confidence and refinement. We highlight that regularisation-based calibration only focuses on naively reducing a model's confidence. We find that many calibration approaches, such as label smoothing and mixup, lower the utility of a DNN by degrading its refinement.
arXiv Detail & Related papers (2021-06-17T11:04:14Z)
Improved Trainable Calibration Method for Neural Networks on Medical Imaging Classification [17.941506832422192]
Empirically, neural networks are often miscalibrated and overconfident in their predictions. We propose a novel calibration approach that maintains the overall classification accuracy while significantly improving model calibration.
arXiv Detail & Related papers (2020-09-09T01:25:53Z)
On Calibration of Mixup Training for Deep Neural Networks [1.6242924916178283]
We argue and provide empirical evidence that, due to its fundamentals, Mixup does not necessarily improve calibration. Our loss is inspired by Bayes decision theory and introduces a new training framework for designing losses for probabilistic modelling. We provide state-of-the-art accuracy with consistent improvements in calibration performance.
arXiv Detail & Related papers (2020-03-22T16:54:31Z)
Intra Order-preserving Functions for Calibration of Multi-Class Neural Networks [54.23874144090228]
A common approach is to learn a post-hoc calibration function that transforms the output of the original network into calibrated confidence scores. Previous post-hoc calibration techniques work only with simple calibration functions. We propose a new neural network architecture that represents a class of intra order-preserving functions.
arXiv Detail & Related papers (2020-03-15T12:57:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.