Selective "Selective Prediction": Reducing Unnecessary Abstention in Vision-Language Reasoning
- URL: http://arxiv.org/abs/2402.15610v2
- Date: Wed, 12 Jun 2024 21:09:39 GMT
- Title: Selective "Selective Prediction": Reducing Unnecessary Abstention in Vision-Language Reasoning
- Authors: Tejas Srinivasan, Jack Hessel, Tanmay Gupta, Bill Yuchen Lin, Yejin Choi, Jesse Thomason, Khyathi Raghavi Chandu
- Abstract summary: We introduce ReCoVERR, an inference-time algorithm to reduce the over-abstention of a selective vision-language system.
ReCoVERR tries to find relevant clues in an image that provide additional evidence for the prediction.
- Score: 67.82016092549284
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Selective prediction minimizes incorrect predictions from vision-language models (VLMs) by allowing them to abstain from answering when uncertain. However, when deploying a vision-language system with low tolerance for inaccurate predictions, selective prediction may be over-cautious and abstain too frequently, even on many correct predictions. We introduce ReCoVERR, an inference-time algorithm to reduce the over-abstention of a selective vision-language system without increasing the error rate of the system's predictions. When the VLM makes a low-confidence prediction, instead of abstaining, ReCoVERR tries to find relevant clues in the image that provide additional evidence for the prediction. ReCoVERR uses an LLM to pose related questions to the VLM, collects high-confidence evidence, and if enough evidence confirms the prediction the system makes a prediction instead of abstaining. ReCoVERR enables three VLMs (BLIP2, InstructBLIP, and LLaVA-1.5) to answer up to 20% more questions on the VQAv2 and A-OKVQA tasks without decreasing system accuracy, thus improving overall system reliability. Our code is available at https://github.com/tejas1995/ReCoVERR.
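The decision loop described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names (`ask_llm_for_questions`, `vlm_verify`) and the threshold values are hypothetical stand-ins.

```python
def recoverr(vlm_answer, vlm_confidence, ask_llm_for_questions, vlm_verify,
             conf_threshold=0.8, evidence_threshold=2):
    """Sketch of a ReCoVERR-style decision loop (illustrative names, not the
    authors' API). Returns the answer, or None to abstain."""
    if vlm_confidence >= conf_threshold:
        return vlm_answer  # already confident: answer directly
    evidence = 0
    for question in ask_llm_for_questions(vlm_answer):
        clue_supports, clue_conf = vlm_verify(question)
        # only count clues the VLM is itself confident about
        if clue_conf >= conf_threshold and clue_supports:
            evidence += 1
        if evidence >= evidence_threshold:
            return vlm_answer  # enough corroborating evidence: answer
    return None  # not enough evidence: abstain
```

In this sketch, abstention is only overridden when multiple high-confidence clues corroborate the original low-confidence prediction, which is how the system avoids raising its error rate.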
Related papers
- To Trust or Not to Trust: On Calibration in ML-based Resource Allocation for Wireless Networks [11.718895971015339]
This paper studies the calibration performance of an ML-based outage predictor within a single-user, multi-resource allocation framework. We first establish key theoretical properties of this system's outage probability (OP) under perfect calibration. We show that as the number of resources grows, the OP of a perfectly calibrated predictor approaches the expected output conditioned on it being below the classification threshold.
arXiv Detail & Related papers (2025-07-23T13:23:43Z) - ViLU: Learning Vision-Language Uncertainties for Failure Prediction [28.439422629957424]
We introduce ViLU, a new Vision-Language Uncertainty quantification framework. ViLU constructs an uncertainty-aware multi-modal representation by integrating the visual embedding, the predicted textual embedding, and an image-conditioned textual representation via cross-attention. Our proposed approach is well-suited for post-hoc settings, where only vision and text embeddings are available without direct access to the model itself.
arXiv Detail & Related papers (2025-07-10T10:41:13Z) - From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment [51.3011761744484]
Multi-modal large language models can only process a finite number of frames in a single inference.
We propose multiple predictions through visual context sampling, followed by a scoring mechanism to select the final prediction.
Experiments show that this approach covers the correct answer for a high percentage of long video questions.
arXiv Detail & Related papers (2025-03-26T11:53:03Z) - Correct after Answer: Enhancing Multi-Span Question Answering with Post-Processing Method [11.794628063040108]
Multi-Span Question Answering (MSQA) requires models to extract one or multiple answer spans from a given context to answer a question.
We propose Answering-Classifying-Correcting (ACC) framework, which employs a post-processing strategy to handle incorrect predictions.
arXiv Detail & Related papers (2024-10-22T08:04:32Z) - Eliciting Uncertainty in Chain-of-Thought to Mitigate Bias against Forecasting Harmful User Behaviors [29.892041865029803]
Conversation forecasting tasks a model with predicting the outcome of an unfolding conversation.
It can be applied in social media moderation to predict harmful user behaviors before they occur.
This paper explores to what extent model uncertainty can be used as a tool to mitigate potential biases.
arXiv Detail & Related papers (2024-10-17T15:07:53Z) - Calibrated Large Language Models for Binary Question Answering [49.1574468325115]
A well-calibrated model should produce probabilities that accurately reflect the likelihood of its predictions being correct.
We propose a novel approach that utilizes the inductive Venn--Abers predictor (IVAP) to calibrate the probabilities associated with the output tokens corresponding to the binary labels.
arXiv Detail & Related papers (2024-07-01T09:31:03Z) - Conformal Language Modeling [61.94417935386489]
We propose a novel approach to conformal prediction for generative language models (LMs).
Standard conformal prediction produces prediction sets with rigorous, statistical guarantees.
We demonstrate the promise of our approach on multiple tasks in open-domain question answering, text summarization, and radiology report generation.
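For context, the standard conformal prediction mentioned above can be sketched in a few lines. This is the generic split-conformal recipe for classification, not the paper's LM-specific extension:

```python
import math

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split-conformal prediction sets for classification.
    Nonconformity score = 1 - model probability of the true label;
    sets cover the true label with probability >= 1 - alpha."""
    n = len(cal_probs)
    scores = sorted(1 - p[y] for p, y in zip(cal_probs, cal_labels))
    # finite-sample-corrected (1 - alpha) quantile of calibration scores
    k = math.ceil((n + 1) * (1 - alpha)) - 1
    qhat = scores[min(k, n - 1)]
    # a test label enters the set if its score is within the quantile
    return [{c for c, p in enumerate(probs) if 1 - p <= qhat}
            for probs in test_probs]
```

The paper's contribution is extending this set-valued guarantee to free-form LM generations, where the label space is not a small fixed set of classes.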
arXiv Detail & Related papers (2023-06-16T21:55:08Z) - Conformal Prediction Regions for Time Series using Linear Complementarity Programming [25.094249285804224]
We propose an optimization-based method for reducing conservatism to enable long horizon planning and verification.
We show that this problem can be cast as a mixed integer linear complementarity program (MILCP), which we then relax into a linear complementarity program (LCP).
arXiv Detail & Related papers (2023-04-03T15:32:38Z) - Uncertainty Quantification with Pre-trained Language Models: A
Large-Scale Empirical Analysis [120.9545643534454]
It is crucial for the pipeline to minimize the calibration error, especially in safety-critical applications.
There are various considerations behind the pipeline: (1) the choice of PLM, (2) the size of the PLM, (3) the choice of uncertainty quantifier, (4) the choice of fine-tuning loss, and many more.
In response, we recommend the following: (1) use ELECTRA for PLM encoding, (2) use larger PLMs if possible, (3) use Temp Scaling as the uncertainty quantifier, and (4) use Focal Loss for fine-tuning.
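Recommendation (3), temperature scaling, is a standard post-hoc calibration step: divide the logits by a scalar temperature T before the softmax, where T is normally fit on a held-out validation set. A minimal sketch:

```python
import math

def temperature_scale(logits, T):
    """Temperature-scaled softmax: T > 1 softens (less confident),
    T < 1 sharpens (more confident); T = 1 is the unscaled softmax."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# A temperature above 1 reduces the model's peak confidence
probs = temperature_scale([2.0, 0.5, -1.0], T=2.0)
```

Because dividing by T preserves the ordering of the logits, temperature scaling changes the confidence values without changing which class is predicted, which is why it is a popular calibration-only fix.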
arXiv Detail & Related papers (2022-10-10T14:16:01Z) - Learning to Predict Trustworthiness with Steep Slope Loss [69.40817968905495]
We study the problem of predicting trustworthiness on real-world large-scale datasets.
We observe that the trustworthiness predictors trained with prior-art loss functions are prone to view both correct predictions and incorrect predictions to be trustworthy.
We propose a novel steep slope loss to separate the features w.r.t. correct predictions from the ones w.r.t. incorrect predictions by two slide-like curves that oppose each other.
arXiv Detail & Related papers (2021-09-30T19:19:09Z) - Aligned Contrastive Predictive Coding [10.521845940927163]
We investigate the possibility of forcing a self-supervised model trained using a contrastive predictive loss to extract slowly varying latent representations.
Rather than producing individual predictions for each of the future representations, the model emits a sequence of predictions shorter than that of the upcoming representations to which they will be aligned.
arXiv Detail & Related papers (2021-04-24T13:07:22Z) - Controlled abstention neural networks for identifying skillful
predictions for regression problems [0.0]
We introduce a novel loss function, termed "abstention loss", that allows neural networks to identify forecasts of opportunity for regression problems.
The abstention loss is applied during training to preferentially learn from the more confident samples.
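As a rough illustration of the idea (not the paper's exact formulation), an abstention-style loss can trade prediction error against a fixed abstention penalty, so that training pays less for hard samples it declines and learns preferentially from confident ones:

```python
def abstention_loss(pred, target, abstain_prob, penalty=0.3):
    """Generic abstention-style loss sketch (illustrative, not the paper's
    exact form): the network pays the squared error on the fraction of the
    sample it commits to, and a fixed penalty on the fraction it abstains."""
    err = (pred - target) ** 2
    return (1 - abstain_prob) * err + abstain_prob * penalty
```

With this shape, abstaining is only cheaper than committing when the expected error exceeds the penalty, which pushes the network to abstain exactly on the low-skill samples.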
arXiv Detail & Related papers (2021-04-16T17:16:32Z) - AutoCP: Automated Pipelines for Accurate Prediction Intervals [84.16181066107984]
This paper proposes an AutoML framework called Automatic Machine Learning for Conformal Prediction (AutoCP).
Unlike the familiar AutoML frameworks that attempt to select the best prediction model, AutoCP constructs prediction intervals that achieve the user-specified target coverage rate.
We tested AutoCP on a variety of datasets and found that it significantly outperforms benchmark algorithms.
arXiv Detail & Related papers (2020-06-24T23:13:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.