Generate then Select: Open-ended Visual Question Answering Guided by
World Knowledge
- URL: http://arxiv.org/abs/2305.18842v1
- Date: Tue, 30 May 2023 08:34:13 GMT
- Title: Generate then Select: Open-ended Visual Question Answering Guided by
World Knowledge
- Authors: Xingyu Fu and Sheng Zhang and Gukyeong Kwon and Pramuditha Perera and
Henghui Zhu and Yuhao Zhang and Alexander Hanbo Li and William Yang Wang and
Zhiguo Wang and Vittorio Castelli and Patrick Ng and Dan Roth and Bing Xiang
- Abstract summary: The open-ended Visual Question Answering (VQA) task requires AI models to jointly reason over visual and natural language inputs.
Pre-trained Language Models (PLM) such as GPT-3 have been applied to the task and shown to be powerful world knowledge sources.
We propose RASO: a new VQA pipeline that deploys a generate-then-select strategy guided by world knowledge.
- Score: 155.81786738036578
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The open-ended Visual Question Answering (VQA) task requires AI models to
jointly reason over visual and natural language inputs using world knowledge.
Recently, pre-trained Language Models (PLM) such as GPT-3 have been applied to
the task and shown to be powerful world knowledge sources. However, these
methods suffer from low knowledge coverage caused by PLM bias (the tendency to
generate certain tokens over others regardless of prompt changes) and from a
high dependency on PLM quality: only models using GPT-3 achieve the best
results.
To address the aforementioned challenges, we propose RASO: a new VQA pipeline
that deploys a generate-then-select strategy guided by world knowledge for the
first time. Rather than following the de facto standard of training a
multi-modal model that directly generates the VQA answer, RASO first uses a
PLM to generate all plausible candidate answers and then trains a lightweight
answer selection model to pick the correct one. As shown in our analysis,
RASO expands the knowledge
coverage from in-domain training data by a large margin. We provide extensive
experimentation and show the effectiveness of our pipeline by advancing the
state-of-the-art by 4.1% on OK-VQA, without additional computation cost. Code
and models are released at http://cogcomp.org/page/publication_view/1010
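
As a rough illustration of the generate-then-select idea described above, the sketch below wires a candidate-generation step to a lightweight selection step. The PLM and the selector are passed in as placeholder callables, and the prompt format and function names are assumptions made for illustration, not the released implementation.

```python
# Minimal sketch of a generate-then-select VQA pipeline.
# The PLM client, caption source, and selector are illustrative placeholders.
from typing import Callable, List


def generate_candidates(plm: Callable[[str], str], caption: str,
                        question: str, n_samples: int = 8) -> List[str]:
    """Sample the PLM several times and keep the distinct candidate answers."""
    prompt = (f"Context: {caption}\n"
              f"Question: {question}\n"
              f"Short answer:")
    candidates: List[str] = []
    for _ in range(n_samples):
        answer = plm(prompt).strip().lower()
        if answer and answer not in candidates:
            candidates.append(answer)
    return candidates


def select_answer(selector: Callable[[str, str, str], float], caption: str,
                  question: str, candidates: List[str]) -> str:
    """Lightweight selection: return the candidate the selector scores highest."""
    return max(candidates, key=lambda ans: selector(caption, question, ans))
```

Because every distinct PLM sample becomes a candidate, the selector chooses from a wider answer pool than an in-domain generator alone would produce, which is the knowledge-coverage expansion the abstract refers to.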
Related papers
- Genetic Auto-prompt Learning for Pre-trained Code Intelligence Language Models [54.58108387797138]
We investigate the effectiveness of prompt learning in code intelligence tasks.
Existing automatic prompt design methods have very limited applicability to code intelligence tasks.
We propose Genetic Auto Prompt (GenAP) which utilizes an elaborate genetic algorithm to automatically design prompts.
arXiv Detail & Related papers (2024-03-20T13:37:00Z)
- ReasoningLM: Enabling Structural Subgraph Reasoning in Pre-trained Language Models for Question Answering over Knowledge Graph [142.42275983201978]
We propose a subgraph-aware self-attention mechanism to imitate the GNN for performing structured reasoning.
We also adopt an adaptation tuning strategy to adapt the model parameters using 20,000 subgraphs with synthesized questions.
Experiments show that ReasoningLM surpasses state-of-the-art models by a large margin, even with fewer updated parameters and less training data.
arXiv Detail & Related papers (2023-12-30T07:18:54Z)
- Improving Zero-shot Visual Question Answering via Large Language Models with Reasoning Question Prompts [22.669502403623166]
We present Reasoning Question Prompts for VQA tasks, which can further activate the potential of Large Language Models.
We generate self-contained questions as reasoning question prompts via an unsupervised question editing module.
Each reasoning question prompt clearly indicates the intent of the original question.
Then, the candidate answers, together with their confidence scores, are fed into LLMs as answer heuristics.
arXiv Detail & Related papers (2023-11-15T15:40:46Z)
- A Simple Baseline for Knowledge-Based Visual Question Answering [78.00758742784532]
This paper is on the problem of Knowledge-Based Visual Question Answering (KB-VQA)
Our main contribution in this paper is to propose a much simpler and readily reproducible pipeline.
Contrary to recent approaches, our method is training-free, does not require access to external databases or APIs, and achieves state-of-the-art accuracy on the OK-VQA and A-OK-VQA datasets.
arXiv Detail & Related papers (2023-10-20T15:08:17Z)
- KEPR: Knowledge Enhancement and Plausibility Ranking for Generative Commonsense Question Answering [11.537283115693432]
We propose a Knowledge Enhancement and Plausibility Ranking approach grounded on the Generate-Then-Rank pipeline architecture.
Specifically, we expand questions in terms of Wiktionary commonsense knowledge of keywords, and reformulate them with normalized patterns.
We develop an ELECTRA-based answer ranking model, where logistic regression is conducted during training with the aim of approximating different levels of plausibility.
arXiv Detail & Related papers (2023-05-15T04:58:37Z)
- Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering [30.858737348472626]
Knowledge-based visual question answering (VQA) requires external knowledge beyond the image to answer the question.
Recent works have resorted to using a powerful large language model (LLM) as an implicit knowledge engine to acquire the necessary knowledge for answering.
We present a conceptually simple, flexible, and general framework designed to prompt LLM with answers for knowledge-based VQA.
arXiv Detail & Related papers (2023-03-03T13:05:15Z)
- Self-Prompting Large Language Models for Zero-Shot Open-Domain QA [67.08732962244301]
Open-Domain Question Answering (ODQA) aims to answer questions without explicitly providing background documents.
This task becomes notably challenging in a zero-shot setting where no data is available to train tailored retrieval-reader models.
We propose a Self-Prompting framework to explicitly utilize the massive knowledge encoded in the parameters of Large Language Models.
arXiv Detail & Related papers (2022-12-16T18:23:43Z)
- TSGP: Two-Stage Generative Prompting for Unsupervised Commonsense Question Answering [4.965306353273393]
Unsupervised commonsense question answering requires mining effective commonsense knowledge without relying on labeled task data.
We propose a two-stage prompt-based unsupervised commonsense question answering framework (TSGP)
Experimental results and analysis on three different commonsense reasoning tasks, CommonsenseQA, OpenBookQA, and SocialIQA, demonstrate that TSGP significantly improves the reasoning ability of language models in unsupervised settings.
arXiv Detail & Related papers (2022-11-24T10:19:24Z)
- Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as rewards.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
arXiv Detail & Related papers (2021-06-14T18:48:40Z)
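
The (Soft) Q-Learning entry above builds on the standard soft value function and softmax policy over the token vocabulary. The sketch below shows only those textbook quantities as a reference point; it is a generic illustration, not the paper's training objective.

```python
# Generic soft Q-learning quantities when actions are vocabulary tokens.
import numpy as np


def soft_value(q_values: np.ndarray, temperature: float = 1.0) -> float:
    """V(s) = tau * log sum_a exp(Q(s, a) / tau), computed stably."""
    scaled = q_values / temperature
    m = scaled.max()  # subtract the max before exponentiating
    return temperature * (m + np.log(np.exp(scaled - m).sum()))


def soft_policy(q_values: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """pi(a|s) proportional to exp(Q(s, a) / tau) over the token vocabulary."""
    scaled = q_values / temperature
    scaled -= scaled.max()
    probs = np.exp(scaled)
    return probs / probs.sum()


# Example: Q-values over a toy 4-token vocabulary for the current prefix.
q = np.array([2.0, 1.0, 0.5, -1.0])
print(soft_value(q), soft_policy(q))
```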