Visual Question Answering with Prior Class Semantics
- URL: http://arxiv.org/abs/2005.01239v1
- Date: Mon, 4 May 2020 02:46:31 GMT
- Title: Visual Question Answering with Prior Class Semantics
- Authors: Violetta Shevchenko, Damien Teney, Anthony Dick, Anton van den Hengel
- Abstract summary: We show how to exploit additional information pertaining to the semantics of candidate answers.
We extend the answer prediction process with a regression objective in a semantic space.
Our method brings improvements in consistency and accuracy over a range of question types.
- Score: 50.845003775809836
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel mechanism to embed prior knowledge in a model for visual
question answering. The open-set nature of the task is at odds with the
ubiquitous approach of training a fixed classifier. We show how to exploit
additional information pertaining to the semantics of candidate answers. We
extend the answer prediction process with a regression objective in a semantic
space, in which we project candidate answers using prior knowledge derived from
word embeddings. We perform an extensive study of learned representations with
the GQA dataset, revealing that important semantic information is captured in
the relations between embeddings in the answer space. Our method brings
improvements in consistency and accuracy over a range of question types.
Experiments with novel answers, unseen during training, indicate the method's
potential for open-set prediction.
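The abstract describes two steps: adding a regression objective in a word-embedding space to the usual answer classifier, and then matching predictions against candidate answers in that space. Below is a minimal Python sketch of that idea under stated assumptions. It is not the authors' implementation. The cosine-distance regression loss, the trade-off `weight`, the helper names (`combined_loss`, `predict_open_set`), the answer vocabulary `TRAIN_ANSWERS`, and the 3-dimensional `TOY_EMBEDDINGS` are all hypothetical. They stand in for a real model's outputs and for pretrained word embeddings such as GloVe.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical toy "word embeddings" (3-d for readability); a real system would
# load pretrained vectors. "kitten" plays the role of an answer unseen in training.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "cat": [1.0, 0.1, 0.0],
    "dog": [0.9, 0.2, 0.0],
    "red": [0.0, 0.0, 1.0],
    "kitten": [1.0, 0.05, 0.0],
}
TRAIN_ANSWERS = ["cat", "dog", "red"]  # fixed classifier vocabulary (assumed)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Standard classification loss over the fixed answer vocabulary."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def combined_loss(
    logits: Sequence[float],
    predicted_vec: Sequence[float],
    target: int,
    weight: float = 1.0,  # assumed trade-off between the two terms
) -> float:
    """Classification loss plus an (assumed) cosine-distance regression term
    pulling the predicted vector toward the target answer's word embedding."""
    ce = softmax_cross_entropy(logits, target)
    target_vec = TOY_EMBEDDINGS[TRAIN_ANSWERS[target]]
    regression = 1.0 - cosine(predicted_vec, target_vec)
    return ce + weight * regression


def predict_open_set(
    predicted_vec: Sequence[float], candidates: Dict[str, List[float]]
) -> str:
    """Return the candidate whose embedding is closest (by cosine) to the
    predicted vector; candidates may include answers never seen in training."""
    return max(candidates, key=lambda a: cosine(predicted_vec, candidates[a]))


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0]
    print(combined_loss(logits, TOY_EMBEDDINGS["cat"], target=0))
    print(predict_open_set([1.0, 0.0, 0.0], TOY_EMBEDDINGS))  # kitten
```

In this sketch, prediction is a nearest-neighbour search over embeddings rather than an argmax over a fixed set of logits. As a result, an answer such as `kitten`, which never appeared as a training class, can still be returned. This illustrates the kind of open-set behaviour the abstract alludes to.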
Related papers
- Towards Reliable and Factual Response Generation: Detecting Unanswerable
Questions in Information-Seeking Conversations [16.99952884041096]
Generative AI models face the challenge of hallucinations that can undermine users' trust in such systems.
We approach the problem of conversational information seeking as a two-step process, where relevant passages in a corpus are identified first and then summarized into a final system response.
Specifically, our proposed method employs a sentence-level classifier to detect if the answer is present, then aggregates these predictions on the passage level, and eventually across the top-ranked passages to arrive at a final answerability estimate.
Date: 2024-01-21T10:15:36Z
- Open-Set Knowledge-Based Visual Question Answering with Inference Paths [79.55742631375063]
The purpose of Knowledge-Based Visual Question Answering (KB-VQA) is to provide a correct answer to the question with the aid of external knowledge bases.
We propose a new retriever-ranker paradigm for KB-VQA, Graph pATH rankER (GATHER for brevity).
Specifically, it consists of graph construction, pruning, and path-level ranking, which not only retrieve accurate answers but also provide inference paths that explain the reasoning process.
Date: 2023-10-12T09:12:50Z
- Answering Ambiguous Questions via Iterative Prompting [84.3426020642704]
In open-domain question answering, due to the ambiguity of questions, multiple plausible answers may exist.
One approach is to directly predict all valid answers, but this can struggle with balancing relevance and diversity.
We present AmbigPrompt to address the imperfections of existing approaches to answering ambiguous questions.
Date: 2023-07-08T04:32:17Z
- Weakly Supervised Visual Question Answer Generation [2.7605547688813172]
We present a weakly supervised method that synthetically generates question-answer pairs procedurally from visual information and captions.
We perform an exhaustive experimental analysis on the VQA dataset and find that our model significantly outperforms state-of-the-art (SOTA) methods in BLEU score.
Date: 2023-06-11T08:46:42Z
- Fine-Grained Visual Entailment [51.66881737644983]
We propose an extension of the visual entailment task, where the goal is to predict the logical relationship of fine-grained knowledge elements within a piece of text to an image.
Unlike prior work, our method is inherently explainable and makes logical predictions at different levels of granularity.
We evaluate our method on a new dataset of manually annotated knowledge elements and show that our method achieves 68.18% accuracy at this challenging task.
Date: 2022-03-29T16:09:38Z
- Coarse-to-Fine Reasoning for Visual Question Answering [18.535633096397397]
We present a new reasoning framework to fill the gap between visual features and semantic clues in the Visual Question Answering (VQA) task.
Our method first extracts the features and predicates from the image and question.
We then propose a new reasoning framework that jointly learns these features and predicates in a coarse-to-fine manner.
Date: 2021-10-06T06:29:52Z
- Cooperative Learning of Zero-Shot Machine Reading Comprehension [9.868221447090855]
We propose a cooperative, self-play learning model for question generation and answering.
We can train question generation and answering models on any textual corpora without annotation.
Our model outperforms the state-of-the-art pretrained language models on standard question answering benchmarks.
Date: 2021-03-12T18:22:28Z
- Improving Commonsense Question Answering by Graph-based Iterative Retrieval over Multiple Knowledge Sources [26.256653692882715]
How to effectively incorporate commonsense knowledge into question answering systems is still under exploration.
We propose a novel question-answering method by integrating ConceptNet, Wikipedia, and the Cambridge Dictionary.
We use a pre-trained language model to encode the question, retrieved knowledge and choices, and propose an answer choice-aware attention mechanism.
Date: 2020-11-05T08:50:43Z
- Predicting What You Already Know Helps: Provable Self-Supervised Learning [60.27658820909876]
Self-supervised representation learning solves auxiliary prediction tasks (known as pretext tasks) without requiring labeled data.
We show a mechanism exploiting the statistical connections between certain reconstruction-based pretext tasks that guarantee learning a good representation.
We prove that the linear layer yields a small approximation error even for a complex ground-truth function class.
Date: 2020-08-03T17:56:13Z
- A Revised Generative Evaluation of Visual Dialogue [80.17353102854405]
We propose a revised evaluation scheme for the VisDial dataset.
We measure consensus between answers generated by the model and a set of relevant answers.
We release these sets and code for the revised evaluation scheme as DenseVisDial.
Date: 2020-04-20T13:26:45Z