Generate then Select: Open-ended Visual Question Answering Guided by
World Knowledge
- URL: http://arxiv.org/abs/2305.18842v1
- Date: Tue, 30 May 2023 08:34:13 GMT
- Title: Generate then Select: Open-ended Visual Question Answering Guided by
World Knowledge
- Authors: Xingyu Fu and Sheng Zhang and Gukyeong Kwon and Pramuditha Perera and
Henghui Zhu and Yuhao Zhang and Alexander Hanbo Li and William Yang Wang and
Zhiguo Wang and Vittorio Castelli and Patrick Ng and Dan Roth and Bing Xiang
- Abstract summary: The open-ended Visual Question Answering (VQA) task requires AI models to jointly reason over visual and natural language inputs.
Pre-trained Language Models (PLM) such as GPT-3 have been applied to the task and shown to be powerful world knowledge sources.
We propose RASO: a new VQA pipeline that deploys a generate-then-select strategy guided by world knowledge.
- Score: 155.81786738036578
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The open-ended Visual Question Answering (VQA) task requires AI models to
jointly reason over visual and natural language inputs using world knowledge.
Recently, pre-trained Language Models (PLM) such as GPT-3 have been applied to
the task and shown to be powerful world knowledge sources. However, these
methods suffer from low knowledge coverage caused by PLM bias (the tendency to
generate certain tokens over others regardless of prompt changes) and from a
high dependency on PLM quality: only models using GPT-3 achieve the best
results.
To address the aforementioned challenges, we propose RASO: a new VQA pipeline
that deploys a generate-then-select strategy guided by world knowledge for the
first time. Rather than following the de facto standard of training a
multi-modal model that directly generates the VQA answer, RASO first uses a
PLM to generate all plausible candidate answers and then trains a lightweight
answer selection model to pick the correct one. As shown in our analysis,
RASO expands the knowledge
coverage from in-domain training data by a large margin. We provide extensive
experimentation and show the effectiveness of our pipeline by advancing the
state-of-the-art by 4.1% on OK-VQA, without additional computation cost. Code
and models are released at http://cogcomp.org/page/publication_view/1010
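
As a rough illustration of the generate-then-select idea described above, the sketch below wires a candidate-generation step to a lightweight selection step. The PLM and the selector are passed in as placeholder callables, and the prompt format and function names are assumptions made for illustration, not the released implementation.

```python
# Minimal sketch of a generate-then-select VQA pipeline.
# The PLM client, caption source, and selector are illustrative placeholders.
from typing import Callable, List


def generate_candidates(plm: Callable[[str], str], caption: str,
                        question: str, n_samples: int = 8) -> List[str]:
    """Sample the PLM several times and keep the distinct candidate answers."""
    prompt = (f"Context: {caption}\n"
              f"Question: {question}\n"
              f"Short answer:")
    candidates: List[str] = []
    for _ in range(n_samples):
        answer = plm(prompt).strip().lower()
        if answer and answer not in candidates:
            candidates.append(answer)
    return candidates


def select_answer(selector: Callable[[str, str, str], float], caption: str,
                  question: str, candidates: List[str]) -> str:
    """Lightweight selection: return the candidate the selector scores highest."""
    return max(candidates, key=lambda ans: selector(caption, question, ans))
```

Because every distinct PLM sample becomes a candidate, the selector chooses from a wider answer pool than an in-domain generator alone would produce, which is the knowledge-coverage expansion the abstract refers to.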
Related papers
- Genetic Auto-prompt Learning for Pre-trained Code Intelligence Language Models [54.58108387797138]
We investigate the effectiveness of prompt learning in code intelligence tasks.
Existing automatic prompt design methods have very limited applicability to code intelligence tasks.
We propose Genetic Auto Prompt (GenAP) which utilizes an elaborate genetic algorithm to automatically design prompts.
arXiv Detail & Related papers (2024-03-20T13:37:00Z)
- ReasoningLM: Enabling Structural Subgraph Reasoning in Pre-trained Language Models for Question Answering over Knowledge Graph [142.42275983201978]
We propose a subgraph-aware self-attention mechanism to imitate the GNN for performing structured reasoning.
We also adopt an adaptation tuning strategy to adapt the model parameters using 20,000 subgraphs with synthesized questions.
Experiments show that ReasoningLM surpasses state-of-the-art models by a large margin, even with fewer updated parameters and less training data.
arXiv Detail & Related papers (2023-12-30T07:18:54Z)
- Improving Zero-shot Visual Question Answering via Large Language Models with Reasoning Question Prompts [22.669502403623166]
We present Reasoning Question Prompts for VQA tasks, which can further activate the potential of Large Language Models.
We generate self-contained questions as reasoning question prompts via an unsupervised question editing module.
Each reasoning question prompt clearly indicates the intent of the original question.
Then, the candidate answers, together with their confidence scores, are fed into LLMs as answer heuristics.
arXiv Detail & Related papers (2023-11-15T15:40:46Z)
- A Simple Baseline for Knowledge-Based Visual Question Answering [78.00758742784532]
This paper is on the problem of Knowledge-Based Visual Question Answering (KB-VQA)
Our main contribution in this paper is to propose a much simpler and readily reproducible pipeline.
Contrary to recent approaches, our method is training-free, does not require access to external databases or APIs, and achieves state-of-the-art accuracy on the OK-VQA and A-OK-VQA datasets.
arXiv Detail & Related papers (2023-10-20T15:08:17Z)
- KEPR: Knowledge Enhancement and Plausibility Ranking for Generative Commonsense Question Answering [11.537283115693432]
We propose a Knowledge Enhancement and Plausibility Ranking approach grounded on the Generate-Then-Rank pipeline architecture.
Specifically, we expand questions in terms of Wiktionary commonsense knowledge of keywords, and reformulate them with normalized patterns.
We develop an ELECTRA-based answer ranking model, where logistic regression is conducted during training with the aim of approximating different levels of plausibility.
arXiv Detail & Related papers (2023-05-15T04:58:37Z)
- Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering [30.858737348472626]
Knowledge-based visual question answering (VQA) requires external knowledge beyond the image to answer the question.
Recent works have resorted to using a powerful large language model (LLM) as an implicit knowledge engine to acquire the necessary knowledge for answering.
We present a conceptually simple, flexible, and general framework designed to prompt LLM with answers for knowledge-based VQA.
arXiv Detail & Related papers (2023-03-03T13:05:15Z)
- Self-Prompting Large Language Models for Zero-Shot Open-Domain QA [67.08732962244301]
Open-Domain Question Answering (ODQA) aims to answer questions without explicitly providing background documents.
This task becomes notably challenging in a zero-shot setting where no data is available to train tailored retrieval-reader models.
We propose a Self-Prompting framework to explicitly utilize the massive knowledge encoded in the parameters of Large Language Models.
arXiv Detail & Related papers (2022-12-16T18:23:43Z)
- TSGP: Two-Stage Generative Prompting for Unsupervised Commonsense Question Answering [4.965306353273393]
Unsupervised commonsense question answering requires mining effective commonsense knowledge without relying on labeled task data.
We propose a two-stage prompt-based unsupervised commonsense question answering framework (TSGP)
Experimental results and analysis on three different commonsense reasoning tasks, CommonsenseQA, OpenBookQA, and SocialIQA, demonstrate that TSGP significantly improves the reasoning ability of language models in unsupervised settings.
arXiv Detail & Related papers (2022-11-24T10:19:24Z)
- Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as rewards.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
arXiv Detail & Related papers (2021-06-14T18:48:40Z)
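
The (Soft) Q-Learning entry above builds on the standard soft value function and softmax policy over the token vocabulary. The sketch below shows only those textbook quantities as a reference point; it is a generic illustration, not the paper's training objective.

```python
# Generic soft Q-learning quantities when actions are vocabulary tokens.
import numpy as np


def soft_value(q_values: np.ndarray, temperature: float = 1.0) -> float:
    """V(s) = tau * log sum_a exp(Q(s, a) / tau), computed stably."""
    scaled = q_values / temperature
    m = scaled.max()  # subtract the max before exponentiating
    return temperature * (m + np.log(np.exp(scaled - m).sum()))


def soft_policy(q_values: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """pi(a|s) proportional to exp(Q(s, a) / tau) over the token vocabulary."""
    scaled = q_values / temperature
    scaled -= scaled.max()
    probs = np.exp(scaled)
    return probs / probs.sum()


# Example: Q-values over a toy 4-token vocabulary for the current prefix.
q = np.array([2.0, 1.0, 0.5, -1.0])
print(soft_value(q), soft_policy(q))
```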