Localized Calibration: Metrics and Recalibration
- URL: http://arxiv.org/abs/2102.10809v1
- Date: Mon, 22 Feb 2021 07:22:12 GMT
- Title: Localized Calibration: Metrics and Recalibration
- Authors: Rachel Luo, Aadyot Bhatnagar, Huan Wang, Caiming Xiong, Silvio
Savarese, Yu Bai, Shengjia Zhao, Stefano Ermon
- Abstract summary: We propose a fine-grained calibration metric that spans the gap between fully global and fully individualized calibration.
We then introduce a localized recalibration method, LoRe, that improves the LCE more than existing recalibration methods.
- Score: 133.07044916594361
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Probabilistic classifiers output confidence scores along with their
predictions, and these confidence scores must be well-calibrated (i.e. reflect
the true probability of an event) to be meaningful and useful for downstream
tasks. However, existing metrics for measuring calibration are insufficient.
Commonly used metrics such as the expected calibration error (ECE) only measure
global trends, making them ineffective for measuring the calibration of a
particular sample or subgroup. At the other end of the spectrum, a fully
individualized calibration error is in general intractable to estimate from
finite samples. In this work, we propose the local calibration error (LCE), a
fine-grained calibration metric that spans the gap between fully global and
fully individualized calibration. The LCE leverages learned features to
automatically capture rich subgroups, and it measures the calibration error
around each individual example via a similarity function. We then introduce a
localized recalibration method, LoRe, that improves the LCE more than
existing recalibration methods. Finally, we show that applying our
recalibration method improves decision-making on downstream tasks.
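
To make the distinction in the abstract concrete, here is a minimal numpy sketch contrasting a standard binned ECE with a kernel-weighted local error in the spirit of the LCE. The RBF kernel over learned feature embeddings, the bandwidth parameter, and the simple weighted-residual estimator are illustrative assumptions; the paper's exact estimator may differ.

```python
import numpy as np

def expected_calibration_error(confs, correct, n_bins=15):
    """Global ECE: bin predictions by confidence, then take the
    bin-weighted average of |accuracy - mean confidence|."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confs > lo) & (confs <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confs[mask].mean())
    return ece

def local_calibration_error(features, confs, correct, query, bandwidth=1.0):
    """Calibration error around one example: residuals (correct - conf)
    are averaged with RBF-kernel weights computed in a learned feature space."""
    sq_dists = ((features - query) ** 2).sum(axis=1)
    w = np.exp(-sq_dists / (2.0 * bandwidth ** 2))  # similarity to the query
    w /= w.sum()
    return abs(np.sum(w * (correct - confs)))
```

As the bandwidth grows, the kernel weights flatten and the local estimate collapses toward a global miscalibration measure; small bandwidths concentrate the estimate on the neighborhood of the query, which is what lets a local metric surface poorly calibrated subgroups that the ECE averages away.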
Related papers
- Understanding Model Calibration -- A gentle introduction and visual exploration of calibration and the expected calibration error (ECE) [0.0]
In this blog post we'll take a look at the most commonly used definition of calibration.
We'll then cover some of the drawbacks of this measure and how these surfaced the need for additional notions of calibration.
arXiv Detail & Related papers (2025-01-31T11:18:45Z)
- Combining Priors with Experience: Confidence Calibration Based on Binomial Process Modeling [3.4580564656984736]
Existing confidence calibration methods mostly use statistical techniques to estimate the calibration curve from data.
A new calibration metric ($TCE_{bpm}$), which leverages the estimated calibration curve to estimate the true calibration error (TCE), is designed.
The effectiveness of our calibration method and metric is verified on real-world and simulated data.
arXiv Detail & Related papers (2024-12-14T03:04:05Z)
- Orthogonal Causal Calibration [55.28164682911196]
We prove generic upper bounds on the calibration error of any causal parameter estimate $\theta$ with respect to any loss $\ell$.
We use our bound to analyze the convergence of two sample splitting algorithms for causal calibration.
arXiv Detail & Related papers (2024-06-04T03:35:25Z)
- FedCal: Achieving Local and Global Calibration in Federated Learning via Aggregated Parameterized Scaler [29.93307421620845]
FedCal uses client-specific scalers to achieve both local and global calibration in federated learning.
Experiments demonstrate FedCal significantly outperforms the best-performing baseline, reducing global calibration error by 47.66% on average.
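
As a rough illustration only (the actual scaler parameterization and aggregation rule in FedCal may differ), one way to read "aggregated parameterized scaler" is that each client fits a local recalibration scaler and the server combines the scaler parameters, e.g. weighted by client data size:

```python
import numpy as np

def aggregate_scalers(client_params, client_sizes):
    """Hypothetical server-side aggregation: data-weighted average of
    per-client scaler parameters (e.g. temperatures or Platt weights)."""
    w = np.asarray(client_sizes, dtype=float)
    w /= w.sum()
    return sum(wi * np.asarray(p) for wi, p in zip(w, client_params))
```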
arXiv Detail & Related papers (2024-05-24T11:33:58Z)
- Calibration by Distribution Matching: Trainable Kernel Calibration Metrics [56.629245030893685]
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression.
These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization.
We provide intuitive mechanisms to tailor calibration metrics to a decision task, and to enforce accurate loss estimation and no-regret decisions.
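
As one concrete instance of a differentiable kernel calibration metric (the kernel and estimator below are assumptions, not necessarily this paper's construction), here is an unbiased U-statistic estimate of a squared kernel calibration error for binary classifiers; written with autodiff-friendly array operations, the same expression could serve as a trainable penalty:

```python
import numpy as np

def skce_binary(probs, labels, gamma=1.0):
    """Unbiased estimate of a squared kernel calibration error.

    probs: predicted P(y=1), shape (n,); labels: 0/1, shape (n,).
    Pairs of calibration residuals are weighted by an RBF kernel
    on the predicted probabilities (an illustrative kernel choice).
    """
    r = labels - probs                                      # residuals
    k = np.exp(-gamma * (probs[:, None] - probs[None, :]) ** 2)
    m = (r[:, None] * r[None, :]) * k
    n = len(probs)
    return (m.sum() - np.trace(m)) / (n * (n - 1))          # drop i == j terms
```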
arXiv Detail & Related papers (2023-10-31T06:19:40Z)
- Hidden Heterogeneity: When to Choose Similarity-Based Calibration [12.788224825185633]
Black-box calibration methods are unable to detect subpopulations where calibration could improve prediction accuracy.
The paper proposes a quantitative measure for hidden heterogeneity (HH).
Experiments show that the improvements in calibration achieved by similarity-based calibration methods correlate with the amount of HH present and, given sufficient calibration data, generally exceed the calibration achieved by global methods.
arXiv Detail & Related papers (2022-02-03T20:43:25Z)
- Uncertainty Quantification and Deep Ensembles [79.4957965474334]
We show that deep ensembles do not necessarily lead to improved calibration properties.
We show that standard ensembling methods, when used in conjunction with modern techniques such as mixup regularization, can lead to less calibrated models.
The text examines the interplay between three of the simplest and most commonly used approaches for leveraging deep learning when data is scarce.
arXiv Detail & Related papers (2020-07-17T07:32:24Z)
- Unsupervised Calibration under Covariate Shift [92.02278658443166]
We introduce the problem of calibration under domain shift and propose an importance-sampling-based approach to address it.
We evaluate and discuss the efficacy of our method on both real-world datasets and synthetic datasets.
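
The importance-sampling idea can be sketched in a few lines: reweight each labeled source-domain example by an estimated target-to-source density ratio before fitting a recalibration map. Temperature scaling as the recalibration map and the precomputed ratios `w` are assumptions made for illustration:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def weighted_temperature(logits, labels, w):
    """Fit a temperature T by minimizing the importance-weighted NLL.

    logits: shape (n, num_classes); labels: int class ids, shape (n,);
    w: estimated density ratios p_target(x) / p_source(x), shape (n,).
    """
    def nll(T):
        z = logits / T
        z -= z.max(axis=1, keepdims=True)  # numerical stability
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -np.sum(w * log_probs[np.arange(len(labels)), labels]) / w.sum()
    return minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded").x
```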
arXiv Detail & Related papers (2020-06-29T21:50:07Z)
- Calibration of Neural Networks using Splines [51.42640515410253]
Measuring calibration error amounts to comparing two empirical distributions.
We introduce a binning-free calibration measure inspired by the classical Kolmogorov-Smirnov (KS) statistical test.
Our method consistently outperforms existing methods on KS error as well as other commonly used calibration measures.
arXiv Detail & Related papers (2020-06-23T07:18:05Z)
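
A minimal sketch of a binning-free, KS-style calibration measure in the spirit of the last entry above (a simplified reading: the paper additionally uses splines, which are omitted here). Sort the samples by confidence and take the largest gap between the cumulative confidence and cumulative accuracy curves:

```python
import numpy as np

def ks_calibration_error(confs, correct):
    """Binning-free KS-style calibration error: the maximum absolute gap
    between cumulative confidence and cumulative accuracy, with samples
    sorted by predicted confidence."""
    order = np.argsort(confs)
    n = len(confs)
    cum_conf = np.cumsum(confs[order]) / n
    cum_acc = np.cumsum(correct[order]) / n
    return np.abs(cum_conf - cum_acc).max()
```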