S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical
Learning
- URL: http://arxiv.org/abs/2309.02155v1
- Date: Tue, 5 Sep 2023 11:47:51 GMT
- Title: S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical
Learning
- Authors: Wei Suo, Mengyang Sun, Weisong Liu, Yiqi Gao, Peng Wang, Yanning
Zhang, Qi Wu
- Abstract summary: The VQA Natural Language Explanation (VQA-NLE) task aims to explain the decision-making process of VQA models in natural language.
We propose a new Semi-Supervised VQA-NLE via Self-Critical Learning (S3C) framework.
S3C evaluates candidate explanations with answering rewards to improve the logical consistency between answers and rationales.
- Score: 46.787034512390434
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The VQA Natural Language Explanation (VQA-NLE) task aims to explain the
decision-making process of VQA models in natural language. Unlike traditional
attention or gradient analysis, free-text rationales are easier to understand
and help gain users' trust. Existing methods mostly use post-hoc or
self-rationalization models to obtain a plausible explanation. However, these
frameworks are bottlenecked by two challenges: 1) the generated rationales do
not faithfully reflect the reasoning process and suffer from logical
inconsistency; 2) human-annotated explanations are expensive and
time-consuming to collect. In this paper, we propose a new Semi-Supervised
VQA-NLE via Self-Critical Learning (S3C) framework, which evaluates candidate
explanations with answering rewards to improve the logical consistency between
answers and rationales. Within a semi-supervised learning framework, S3C can
benefit from a large number of samples that lack human-annotated
explanations. Extensive automatic metrics and human evaluations both
demonstrate the effectiveness of our method. Moreover, the framework achieves
new state-of-the-art performance on two VQA-NLE datasets.
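The abstract does not give implementation details, but the general self-critical idea it names can be illustrated with a short sketch: candidate explanations are sampled, each is scored by how well it helps a frozen VQA answering head recover the ground-truth answer (the "answering reward"), a greedily decoded explanation serves as the critic/baseline, and high-reward self-generated explanations can be kept as pseudo-labels for samples without human rationales. All function and variable names below are hypothetical placeholders for this reading, not the authors' API.

```python
# Minimal sketch of self-critical learning with an "answering reward"
# (assumed reading of the abstract; names are hypothetical).
import torch

def answering_reward(answer_logprob: torch.Tensor) -> torch.Tensor:
    """Reward a candidate explanation by how likely a frozen VQA head is to
    produce the ground-truth answer when conditioned on that explanation."""
    return answer_logprob.exp()  # probability assigned to the correct answer

def self_critical_loss(sampled_reward: torch.Tensor,
                       greedy_reward: torch.Tensor,
                       sampled_logprob: torch.Tensor) -> torch.Tensor:
    """REINFORCE with the greedily decoded explanation as the baseline:
    sampled explanations that help answering more than the greedy one
    get reinforced, the rest get suppressed."""
    advantage = (sampled_reward - greedy_reward).detach()
    return -(advantage * sampled_logprob).mean()

def select_pseudo_explanations(rewards: torch.Tensor,
                               explanations: list[str],
                               threshold: float = 0.5) -> list[str]:
    """Semi-supervised step (assumed): keep self-generated explanations whose
    answering reward is high enough to act as pseudo-labels for samples
    that lack human-written rationales."""
    keep = rewards >= threshold
    return [e for e, k in zip(explanations, keep.tolist()) if k]

# Toy usage with dummy numbers (no real VQA model involved):
sampled_r = torch.tensor([0.7, 0.2, 0.9])   # answer prob. given each sampled explanation
greedy_r = torch.tensor([0.6, 0.6, 0.6])    # answer prob. given the greedy explanation
logp = torch.tensor([-1.2, -0.8, -0.5], requires_grad=True)  # log p(explanation)
loss = self_critical_loss(sampled_r, greedy_r, logp)
loss.backward()  # gradients favor explanations that beat the greedy baseline
```

The answering reward couples the explanation to the answer, which is one way to encourage the logical consistency the abstract describes; the threshold-based filtering is only a stand-in for whatever selection criterion the paper actually uses.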
Related papers
- Towards More Faithful Natural Language Explanation Using Multi-Level
Contrastive Learning in VQA [7.141288053123662]
Natural language explanation in visual question answering (VQA-NLE) aims to explain the decision-making process of models by generating natural language sentences, increasing users' trust in black-box systems.
Existing post-hoc explanations are not always aligned with human logical inference, suffering from three issues: 1) deductive unsatisfiability, where the generated explanations do not logically lead to the answer; 2) factual inconsistency, where the model falsifies its counterfactual explanation for answers without considering the facts in images; and 3) semantic perturbation insensitivity, where the model cannot recognize the semantic changes caused by small perturbations.
arXiv Detail & Related papers (2023-12-21T05:51:55Z) - Dynamic Clue Bottlenecks: Towards Interpretable-by-Design Visual Question Answering [58.64831511644917]
We introduce an interpretable-by-design model that factors model decisions into intermediate, human-legible explanations.
We show that our inherently interpretable system improves by 4.64% over a comparable black-box system on reasoning-focused questions.
arXiv Detail & Related papers (2023-05-24T08:33:15Z) - ZARA: Improving Few-Shot Self-Rationalization for Small Language Models [29.755148112827502]
We present a novel approach, Zero-shot Augmentation of Rationale-Answer pairs (ZARA), to automatically construct pseudo-parallel data for self-training.
ZARA achieves SOTA performance on the FEB benchmark for both task accuracy and the explanation metric.
arXiv Detail & Related papers (2023-05-12T10:07:12Z) - Explanations from Large Language Models Make Small Reasoners Better [61.991772773700006]
We show that our method can consistently and significantly outperform finetuning baselines across different settings.
As a side benefit, human evaluation shows that our method can generate high-quality explanations to justify its predictions.
arXiv Detail & Related papers (2022-10-13T04:50:02Z) - Learn to Explain: Multimodal Reasoning via Thought Chains for Science
Question Answering [124.16250115608604]
We present Science Question Answering (SQA), a new benchmark that consists of 21k multimodal multiple choice questions with a diverse set of science topics and annotations of their answers with corresponding lectures and explanations.
We show that generating explanations as chains of thought improves the question answering performance by 1.20% in few-shot GPT-3 and 3.99% in fine-tuned UnifiedQA.
Our analysis further shows that language models, similar to humans, benefit from explanations to learn from less data, achieving the same performance with just 40% of the data.
arXiv Detail & Related papers (2022-09-20T07:04:24Z) - The Unreliability of Explanations in Few-Shot In-Context Learning [50.77996380021221]
We focus on two NLP tasks that involve reasoning over text, namely question answering and natural language inference.
We show that explanations judged as good by humans, i.e., those that are logically consistent with the input, usually indicate more accurate predictions.
We present a framework for calibrating model predictions based on the reliability of the explanations.
arXiv Detail & Related papers (2022-05-06T17:57:58Z) - elBERto: Self-supervised Commonsense Learning for Question Answering [131.51059870970616]
We propose a Self-supervised Bidirectional Representation Learning of Commonsense framework, which is compatible with off-the-shelf QA model architectures.
The framework comprises five self-supervised tasks to force the model to fully exploit the additional training signals from contexts containing rich commonsense.
elBERto achieves substantial improvements on out-of-paragraph and no-effect questions where simple lexical similarity comparison does not help.
arXiv Detail & Related papers (2022-03-17T16:23:45Z)