Enabling Calibration In The Zero-Shot Inference of Large Vision-Language Models
- URL: http://arxiv.org/abs/2303.12748v4
- Date: Tue, 18 Apr 2023 18:28:51 GMT
- Title: Enabling Calibration In The Zero-Shot Inference of Large Vision-Language Models
- Authors: Will LeVine, Benjamin Pikus, Pranav Raja, and Fernando Amat Gil
- Abstract summary: We measure calibration across relevant variables like prompt, dataset, and architecture, and find that zero-shot inference with CLIP is miscalibrated.
A single learned temperature generalizes for each specific CLIP model across inference dataset and prompt choice.
- Score: 58.720142291102135
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Calibration of deep learning models is crucial to their trustworthiness and
safe usage, and as such, has been extensively studied in supervised
classification models, with methods crafted to decrease miscalibration.
However, there has yet to be a comprehensive study of the calibration of
vision-language models that are used for zero-shot inference, like CLIP. We
measure calibration across relevant variables like prompt, dataset, and
architecture, and find that zero-shot inference with CLIP is miscalibrated.
Furthermore, we propose a modified version of temperature scaling that is
aligned with the common use cases of CLIP as a zero-shot inference model, and
show that a single learned temperature generalizes for each specific CLIP model
(defined by a chosen pre-training dataset and architecture) across inference
dataset and prompt choice.
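Standard temperature scaling, which the paper's method modifies, learns a single scalar T that divides the logits before the softmax so that confidence matches accuracy on held-out data. A minimal sketch (not the paper's exact modified variant; the toy data, grid-search fitting, and all names are illustrative assumptions):

```python
import numpy as np

def softmax(z):
    # numerically stable row-wise softmax
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, T):
    # negative log-likelihood of the observed label at temperature T
    p = softmax(logits / T)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels):
    # learn a single scalar T by grid search over held-out NLL
    grid = np.linspace(0.05, 10.0, 200)
    return min(grid, key=lambda T: nll(logits, labels, T))

# toy held-out set: sharply peaked (overconfident) logits, 30% noisy labels
rng = np.random.default_rng(0)
true = rng.integers(0, 5, size=500)
logits = np.eye(5)[true] * 8.0 + rng.normal(0, 1, size=(500, 5))
labels = np.where(rng.random(500) < 0.3, rng.integers(0, 5, size=500), true)
T = fit_temperature(logits, labels)  # T > 1 softens the overconfident logits
```

Because the logits are far more confident than the noisy labels warrant, the fitted T comes out above 1, flattening the predictive distribution. The paper's finding is that one such T per CLIP model transfers across inference datasets and prompts.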
Related papers
- Robust Calibration of Large Vision-Language Adapters [17.583536041845402]
This paper addresses the critical issue of miscalibration in CLIP-based model adaptation.
We empirically demonstrate that popular CLIP adaptation approaches, such as Adapters, Prompt Learning, and Test-Time Adaptation, substantially degrade the calibration capabilities of the zero-shot baseline.
Motivated by these observations, we present a simple and model-agnostic solution to mitigate miscalibration, by scaling the logit range of each sample to its zero-shot prediction logits.
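One reading of that per-sample scaling is: rescale each sample's adapted logits so their range (max minus min) equals the range of that sample's zero-shot logits, leaving the predicted class unchanged. A hedged sketch of this interpretation (function name and toy arrays are assumptions, not the paper's code):

```python
import numpy as np

def match_logit_range(adapted, zeroshot):
    # per-sample positive rescaling so the adapted logit range
    # (max - min) matches the zero-shot logit range; since the
    # scale factor is positive, the argmax is preserved
    a_rng = adapted.max(axis=1, keepdims=True) - adapted.min(axis=1, keepdims=True)
    z_rng = zeroshot.max(axis=1, keepdims=True) - zeroshot.min(axis=1, keepdims=True)
    return adapted * (z_rng / np.maximum(a_rng, 1e-12))

# toy logits: an adapter has stretched the logit range, inflating confidence
adapted = np.array([[12.0, 2.0, 0.0], [9.0, 3.0, 6.0]])
zeroshot = np.array([[4.0, 1.0, 0.0], [3.0, 1.0, 2.0]])
scaled = match_logit_range(adapted, zeroshot)
```

The appeal of such a scheme is that it is model-agnostic: it needs only the zero-shot logits as a calibration reference, with no extra training.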
arXiv Detail & Related papers (2024-07-18T15:27:56Z)
- Calibration of Continual Learning Models [18.547902778976084]
Continual Learning (CL) focuses on maximizing the predictive performance of a model across a non-stationary stream of data.
Model calibration is an active research topic in machine learning, yet to be properly investigated in CL.
We provide the first empirical study of the behavior of calibration approaches in CL, showing that CL strategies do not inherently learn calibrated models.
arXiv Detail & Related papers (2024-04-11T14:59:49Z)
- Variable Importance Matching for Causal Inference [73.25504313552516]
We describe a general framework called Model-to-Match for constructing accurate, interpretable matched groups.
Model-to-Match uses variable importance measurements to construct a distance metric.
We operationalize the Model-to-Match framework with LASSO.
arXiv Detail & Related papers (2023-02-23T00:43:03Z)
- CLIPood: Generalizing CLIP to Out-of-Distributions [73.86353105017076]
Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but the further adaptation of CLIP on downstream tasks undesirably degrades OOD performances.
We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on unseen test data.
Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
arXiv Detail & Related papers (2023-02-02T04:27:54Z)
- On Calibrating Semantic Segmentation Models: Analyses and An Algorithm [51.85289816613351]
We study the problem of semantic segmentation calibration.
Model capacity, crop size, multi-scale testing, and prediction correctness all affect calibration.
We propose a simple, unifying, and effective approach, namely selective scaling.
arXiv Detail & Related papers (2022-12-22T22:05:16Z)
- On the calibration of underrepresented classes in LiDAR-based semantic segmentation [7.100396757261104]
This work focuses on a class-wise evaluation of several models' confidence performance for LiDAR-based semantic segmentation.
We compare the calibration abilities of three semantic segmentation models with different architectural concepts.
By identifying and describing the dependency between a class's predictive performance and its calibration quality, we aim to facilitate model selection and refinement for safety-critical applications.
arXiv Detail & Related papers (2022-10-13T07:49:24Z)
- Variable-Based Calibration for Machine Learning Classifiers [11.9995808096481]
We introduce the notion of variable-based calibration to characterize calibration properties of a model.
We find that models with near-perfect expected calibration error can exhibit significant miscalibration as a function of features of the data.
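The "expected calibration error" (ECE) referenced there is the standard bin-based metric: predictions are bucketed by confidence, and the per-bin gap between mean confidence and accuracy is averaged, weighted by bin size. A minimal sketch (function name and toy data are illustrative assumptions):

```python
import numpy as np

def ece(confidences, correct, n_bins=10):
    # expected calibration error: size-weighted |accuracy - confidence| per bin
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(confidences)
    err = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            err += (mask.sum() / n) * abs(correct[mask].mean() - confidences[mask].mean())
    return err

# calibrated case: 75% confidence, 6 of 8 correct -> zero gap
cal = ece(np.full(8, 0.75), np.array([1.0] * 6 + [0.0] * 2))
# overconfident case: 85% confidence, only 4 of 8 correct -> gap of 0.35
miscal = ece(np.full(8, 0.85), np.array([1.0] * 4 + [0.0] * 4))
```

The cited paper's point is that a low aggregate ECE like `cal` can still hide large miscalibration when the same gap is computed conditional on feature values rather than confidence bins.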
arXiv Detail & Related papers (2022-09-30T00:49:31Z)
- Modular Conformal Calibration [80.33410096908872]
We introduce Modular Conformal Calibration (MCC), a versatile class of algorithms for recalibration in regression.
This framework allows one to transform any regression model into a calibrated probabilistic model.
We conduct an empirical study of MCC on 17 regression datasets.
arXiv Detail & Related papers (2022-06-23T03:25:23Z)
- A Gating Model for Bias Calibration in Generalized Zero-shot Learning [18.32369721322249]
Generalized zero-shot learning (GZSL) aims at training a model that can generalize to unseen class data by only using auxiliary information.
One of the main challenges in GZSL is a biased model prediction toward seen classes caused by overfitting on only available seen class data during training.
We propose a two-stream autoencoder-based gating model for GZSL.
arXiv Detail & Related papers (2022-03-08T16:41:06Z)
- On Model Calibration for Long-Tailed Object Detection and Instance Segmentation [56.82077636126353]
We propose NorCal, Normalized Calibration for long-tailed object detection and instance segmentation.
We show that separately handling the background class and normalizing the scores over classes for each proposal are keys to achieving superior performance.
arXiv Detail & Related papers (2021-07-05T17:57:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.