From Wrong To Right: A Recursive Approach Towards Vision-Language
Explanation
- URL: http://arxiv.org/abs/2311.12391v1
- Date: Tue, 21 Nov 2023 07:02:32 GMT
- Title: From Wrong To Right: A Recursive Approach Towards Vision-Language
Explanation
- Authors: Jiaxin Ge, Sanjay Subramanian, Trevor Darrell, Boyi Li
- Abstract summary: We present ReVisE: a $\textbf{Re}$cursive $\textbf{Vis}$ual $\textbf{E}$xplanation algorithm.
Our method iteratively computes visual features (conditioned on the text input), an answer, and an explanation.
We find that this multi-step approach guides the model to correct its own answers and outperforms single-step explanation generation.
- Score: 60.746079839840895
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Addressing the challenge of adapting pre-trained vision-language models for
generating insightful explanations for visual reasoning tasks with limited
annotations, we present ReVisE: a $\textbf{Re}$cursive $\textbf{Vis}$ual
$\textbf{E}$xplanation algorithm. Our method iteratively computes visual
features (conditioned on the text input), an answer, and an explanation, to
improve the explanation quality step by step until the answer converges. We
find that this multi-step approach guides the model to correct its own answers
and outperforms single-step explanation generation. Furthermore, explanations
generated by ReVisE also serve as valuable annotations for few-shot
self-training. Our approach outperforms previous methods across 10 metrics while
utilizing merely 5% of the human-annotated explanations, demonstrating up to a
4.2- and 1.3-point increase in BLEU-1 score on the VCR and VQA-X datasets
respectively and underscoring the efficacy and data-efficiency of our method.
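The recursive procedure described in the abstract can be pictured as a simple loop. The sketch below only illustrates that control flow; `encode_visual`, `generate_answer`, and `generate_explanation` are hypothetical stand-ins for model components, not the authors' actual implementation.

```python
# Illustrative sketch of the recursive explanation loop described above.
# `model` is a hypothetical vision-language model wrapper; function and
# argument names are assumptions, not the authors' actual API.

def revise_loop(model, image, question, max_steps=5):
    """Iteratively refine (answer, explanation) until the answer converges."""
    text_context = question
    prev_answer = None
    answer, explanation = None, None

    for _ in range(max_steps):
        # 1. Re-compute visual features conditioned on the current text input.
        visual_feats = model.encode_visual(image, text_context)

        # 2. Generate an answer from the conditioned visual features.
        answer = model.generate_answer(visual_feats, question)

        # 3. Generate an explanation for that answer.
        explanation = model.generate_explanation(visual_feats, question, answer)

        # Stop once the answer no longer changes between iterations.
        if answer == prev_answer:
            break
        prev_answer = answer

        # Feed the latest explanation back in as additional text conditioning.
        text_context = f"{question} {explanation}"

    return answer, explanation
```

Explanations produced by such a loop could then be collected as pseudo-annotations for the few-shot self-training mentioned in the abstract.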
Related papers
- Beyond Coarse-Grained Matching in Video-Text Retrieval [50.799697216533914]
We introduce a new approach for fine-grained evaluation.
Our approach can be applied to existing datasets by automatically generating hard negative test captions.
Experiments on our fine-grained evaluations demonstrate that this approach enhances a model's ability to understand fine-grained differences.
arXiv Detail & Related papers (2024-10-16T09:42:29Z) - Intrinsic Subgraph Generation for Interpretable Graph based Visual Question Answering [27.193336817953142]
We introduce an interpretable approach for graph-based Visual Question Answering (VQA)
Our model is designed to intrinsically produce a subgraph during the question-answering process as its explanation.
We compare these generated subgraphs against established post-hoc explainability methods for graph neural networks, and perform a human evaluation.
arXiv Detail & Related papers (2024-03-26T12:29:18Z) - Silkie: Preference Distillation for Large Visual Language Models [56.10697821410489]
This paper explores preference distillation for large vision language models (LVLMs)
We first build a vision-language feedback dataset utilizing AI annotation.
We adopt GPT-4V to assess the generated outputs regarding helpfulness, visual faithfulness, and ethical considerations.
The resulting model, Silkie, achieves 6.9% and 9.5% relative improvements on the MME benchmark in perception and cognition capabilities, respectively.
arXiv Detail & Related papers (2023-12-17T09:44:27Z) - LLM4Vis: Explainable Visualization Recommendation using ChatGPT [21.875548217393927]
We propose a novel ChatGPT-based approach to perform visualization recommendation and return human-like explanations.
Our approach involves feature description, demonstration example selection, explanation generation, demonstration example construction, and inference steps.
arXiv Detail & Related papers (2023-10-11T16:51:46Z) - Dynamic Clue Bottlenecks: Towards Interpretable-by-Design Visual Question Answering [58.64831511644917]
We introduce an interpretable by design model that factors model decisions into intermediate human-legible explanations.
We show that our inherently interpretable system improves by 4.64% over a comparable black-box system on reasoning-focused questions.
arXiv Detail & Related papers (2023-05-24T08:33:15Z) - Explanation Selection Using Unlabeled Data for Chain-of-Thought
Prompting [80.9896041501715]
Explanations that have not been "tuned" for a task, such as off-the-shelf explanations written by nonexperts, may lead to mediocre performance.
This paper tackles the problem of how to optimize explanation-infused prompts in a blackbox fashion.
arXiv Detail & Related papers (2023-02-09T18:02:34Z) - Explanation Regeneration via Information Bottleneck [29.92996769997743]
We develop an information bottleneck method EIB to produce refined explanations that are sufficient and concise.
Our approach regenerates the free-text explanation by polishing the single-pass output from the pretrained language model.
arXiv Detail & Related papers (2022-12-19T16:41:19Z) - Inducing Semantic Grouping of Latent Concepts for Explanations: An
Ante-Hoc Approach [18.170504027784183]
We show that exploiting latent concepts and properly modifying different parts of the model can yield better explanations as well as superior predictive performance.
We also propose using two different self-supervision techniques to extract meaningful concepts related to the type of self-supervision considered.
arXiv Detail & Related papers (2021-08-25T07:09:57Z) - Explain and Predict, and then Predict Again [6.865156063241553]
We propose ExPred, which uses multi-task learning in the explanation generation phase to effectively trade off explanation and prediction losses.
We conduct an extensive evaluation of our approach on three diverse language datasets.
arXiv Detail & Related papers (2021-01-11T19:36:52Z) - Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial
Explanations of Their Behavior in Natural Language? [86.60613602337246]
We introduce a leakage-adjusted simulatability (LAS) metric for evaluating NL explanations.
LAS measures how well explanations help an observer predict a model's output, while controlling for how explanations can directly leak the output.
We frame explanation generation as a multi-agent game and optimize explanations for simulatability while penalizing label leakage.
arXiv Detail & Related papers (2020-10-08T16:59:07Z)
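To make the leakage-adjusted simulatability (LAS) idea above concrete, here is a rough sketch of how such a metric could be computed from simulator predictions. The grouping into leaking and non-leaking examples follows the summary, but the field names and averaging scheme are illustrative assumptions rather than the paper's precise definition.

```python
# Rough sketch of a leakage-adjusted simulatability (LAS) style computation.
# Inputs are per-example records from an observer ("simulator") model;
# field names and the averaging scheme are illustrative assumptions.

from statistics import mean

def las_score(records):
    """records: list of dicts with boolean fields
         correct_with_expl    - simulator matched the model output given the explanation
         correct_without_expl - simulator matched the model output without it
         leaks_label          - the explanation alone reveals the model output
    """
    def avg_gain(group):
        if not group:
            return 0.0
        with_e = mean(1.0 if r["correct_with_expl"] else 0.0 for r in group)
        without_e = mean(1.0 if r["correct_without_expl"] else 0.0 for r in group)
        return with_e - without_e

    leaking = [r for r in records if r["leaks_label"]]
    non_leaking = [r for r in records if not r["leaks_label"]]

    # Average the simulatability gain within each leakage group so that
    # explanations cannot score well purely by restating the answer.
    return 0.5 * (avg_gain(leaking) + avg_gain(non_leaking))
```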