Towards Calibrated Robust Fine-Tuning of Vision-Language Models
- URL: http://arxiv.org/abs/2311.01723v7
- Date: Thu, 07 Nov 2024 06:25:25 GMT
- Title: Towards Calibrated Robust Fine-Tuning of Vision-Language Models
- Authors: Changdae Oh, Hyesu Lim, Mijoo Kim, Dongyoon Han, Sangdoo Yun, Jaegul Choo, Alexander Hauptmann, Zhi-Qi Cheng, Kyungwoo Song,
- Abstract summary: This work proposes a robust fine-tuning method that improves both OOD accuracy and confidence calibration simultaneously in vision language models.
We show that both OOD classification and OOD calibration errors have a shared upper bound consisting of two terms of ID data.
Based on this insight, we design a novel framework that conducts fine-tuning with a constrained multimodal contrastive loss enforcing a larger smallest singular value.
- Score: 97.19901765814431
- License:
- Abstract: Improving out-of-distribution (OOD) generalization during in-distribution (ID) adaptation is a primary goal of robust fine-tuning of zero-shot models beyond naive fine-tuning. However, despite decent OOD generalization performance from recent robust fine-tuning methods, confidence calibration for reliable model output has not been fully addressed. This work proposes a robust fine-tuning method that improves both OOD accuracy and confidence calibration simultaneously in vision language models. Firstly, we show that both OOD classification and OOD calibration errors have a shared upper bound consisting of two terms of ID data: 1) ID calibration error and 2) the smallest singular value of the ID input covariance matrix. Based on this insight, we design a novel framework that conducts fine-tuning with a constrained multimodal contrastive loss enforcing a larger smallest singular value, which is further guided by the self-distillation of a moving-averaged model to achieve calibrated prediction as well. Starting from empirical evidence supporting our theoretical statements, we provide extensive experimental results on ImageNet distribution shift benchmarks that demonstrate the effectiveness of our theorem and its practical implementation.
Related papers
- Robust Fine-tuning of Zero-shot Models via Variance Reduction [56.360865951192324]
When fine-tuning zero-shot models, our desideratum is for the fine-tuned model to excel in both in-distribution (ID) and out-of-distribution (OOD)
We propose a sample-wise ensembling technique that can simultaneously attain the best ID and OOD accuracy without the trade-offs.
arXiv Detail & Related papers (2024-11-11T13:13:39Z) - Model Reprogramming Outperforms Fine-tuning on Out-of-distribution Data in Text-Image Encoders [56.47577824219207]
In this paper, we unveil the hidden costs associated with intrusive fine-tuning techniques.
We introduce a new model reprogramming approach for fine-tuning, which we name Reprogrammer.
Our empirical evidence reveals that Reprogrammer is less intrusive and yields superior downstream models.
arXiv Detail & Related papers (2024-03-16T04:19:48Z) - Calibrating Large Language Models with Sample Consistency [76.23956851098598]
We explore the potential of deriving confidence from the distribution of multiple randomly sampled model generations, via three measures of consistency.
Results show that consistency-based calibration methods outperform existing post-hoc approaches.
We offer practical guidance on choosing suitable consistency metrics for calibration, tailored to the characteristics of various LMs.
arXiv Detail & Related papers (2024-02-21T16:15:20Z) - Preserving Knowledge Invariance: Rethinking Robustness Evaluation of
Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
By further elaborating the robustness metric, a model is judged to be robust if its performance is consistently accurate on the overall cliques.
arXiv Detail & Related papers (2023-05-23T12:05:09Z) - Annealing Double-Head: An Architecture for Online Calibration of Deep
Neural Networks [1.1602089225841632]
Modern deep neural networks are generally poorly calibrated due to the overestimation of predictive confidence.
We propose Annealing Double-Head, a simple-to-implement but highly effective architecture for calibrating the DNN during training.
We demonstrate that our method achieves state-of-the-art model calibration performance without post-processing.
arXiv Detail & Related papers (2022-12-27T21:21:58Z) - Calibrated ensembles can mitigate accuracy tradeoffs under distribution
shift [108.30303219703845]
We find that ID-calibrated ensembles outperforms prior state-of-the-art (based on self-training) on both ID and OOD accuracy.
We analyze this method in stylized settings, and identify two important conditions for ensembles to perform well both ID and OOD.
arXiv Detail & Related papers (2022-07-18T23:14:44Z) - Uncertainty-sensitive Activity Recognition: a Reliability Benchmark and
the CARING Models [37.60817779613977]
We present the first study of how welthe confidence values of modern action recognition architectures indeed reflect the probability of the correct outcome.
We introduce a new approach which learns to transform the model output into realistic confidence estimates through an additional calibration network.
arXiv Detail & Related papers (2021-01-02T15:41:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.