Embodied Question Answering via Multi-LLM Systems
- URL: http://arxiv.org/abs/2406.10918v3
- Date: Tue, 25 Jun 2024 10:50:09 GMT
- Title: Embodied Question Answering via Multi-LLM Systems
- Authors: Bhrij Patel, Vishnu Sashank Dorbala, Dinesh Manocha, Amrit Singh Bedi
- Abstract summary: Embodied Question Answering (EQA) is an important problem, which involves an agent exploring the environment to answer user queries.
We consider EQA in a multi-agent framework in which multiple large language model (LLM)-based agents independently answer queries about a household environment.
- Score: 55.581423861790945
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embodied Question Answering (EQA) is an important problem, which involves an agent exploring the environment to answer user queries. In the existing literature, EQA has exclusively been studied in single-agent scenarios, where exploration can be time-consuming and costly. In this work, we consider EQA in a multi-agent framework in which multiple large language model (LLM)-based agents independently answer queries about a household environment. To generate one answer for each query, we use the individual responses to train a Central Answer Model (CAM) that aggregates responses into a robust answer. Using CAM, we observe a $50\%$ higher EQA accuracy when compared against aggregation methods for LLM ensembles, such as voting schemes and debates. CAM does not require any form of agent communication, which spares it the associated costs. We ablate CAM with various nonlinear (neural network, random forest, decision tree, XGBoost) and linear (logistic regression classifier, SVM) algorithms. Finally, we present a feature importance analysis for CAM via permutation feature importance (PFI), quantifying CAM's reliance on each independent agent and query context.
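The aggregation idea in the abstract can be illustrated with a small sketch. It is not the paper's CAM: the learned aggregator here is a simple accuracy-weighted vote standing in for the classifiers the abstract lists, the queries are assumed to be yes/no (1/0), and the data, function names, and agent count are all hypothetical. It compares a majority-vote baseline with the learned aggregator and then computes a permutation-style importance per agent.

```python
import math
import random

def majority_vote(row):
    """Baseline: answer 1 (yes) only if more than half of the agents say yes."""
    return 1 if sum(row) * 2 > len(row) else 0

def fit_weights(rows, labels):
    """Learn one weight per agent: the log-odds of its (smoothed) accuracy."""
    weights = []
    for j in range(len(rows[0])):
        correct = sum(1 for r, y in zip(rows, labels) if r[j] == y)
        # Laplace smoothing keeps the log-odds finite for a perfect agent.
        acc = (correct + 1) / (len(rows) + 2)
        weights.append(math.log(acc / (1 - acc)))
    return weights

def weighted_answer(weights, row):
    """Each agent contributes +w for a yes and -w for a no."""
    score = sum(w if a == 1 else -w for w, a in zip(weights, row))
    return 1 if score > 0 else 0

def accuracy(predict, rows, labels):
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(predict, rows, labels, agent, seed=0):
    """Accuracy drop after shuffling one agent's answers across queries."""
    rng = random.Random(seed)
    column = [r[agent] for r in rows]
    rng.shuffle(column)
    # Rows are tuples, so rebuild each one with the shuffled value in place.
    shuffled = [r[:agent] + (v,) + r[agent + 1:] for r, v in zip(rows, column)]
    return accuracy(predict, rows, labels) - accuracy(predict, shuffled, labels)

# Hypothetical data: each row holds three agents' yes/no answers to one query.
rows = [(1, 0, 0), (0, 1, 1), (1, 1, 0), (0, 0, 1), (1, 1, 1), (0, 0, 0)]
labels = [1, 0, 1, 0, 1, 0]

# Illustration only: weights are fit and scored on the same six queries.
weights = fit_weights(rows, labels)

def predict(row):
    return weighted_answer(weights, row)

print("majority vote accuracy:", accuracy(majority_vote, rows, labels))
print("weighted vote accuracy:", accuracy(predict, rows, labels))
for agent in range(3):
    drop = permutation_importance(predict, rows, labels, agent)
    print("agent", agent, "importance:", drop)
```

In this toy data the first agent is always right and the other two are unreliable, so a learned weighting can recover answers that a plain majority gets wrong. A real evaluation would fit the aggregator on training queries and score it on held-out ones.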
Related papers
- Multi-Agents Based on Large Language Models for Knowledge-based Visual Question Answering [6.6897007888321465]
We propose a voting framework for knowledge-based Visual Question Answering.
We design three agents that simulate different levels of staff in a team, and assign the available tools according to the levels.
Experiments on OK-VQA and A-OKVQA show that our approach outperforms other baselines by 2.2 and 1.0, respectively.
arXiv Detail & Related papers (2024-12-24T11:24:56Z) - Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent [102.31558123570437]
Multimodal Retrieval Augmented Generation (mRAG) plays an important role in mitigating the "hallucination" issue inherent in multimodal large language models (MLLMs).
We propose the first self-adaptive planning agent for multimodal retrieval, OmniSearch.
arXiv Detail & Related papers (2024-11-05T09:27:21Z) - ELOQ: Resources for Enhancing LLM Detection of Out-of-Scope Questions [52.33835101586687]
Large Language Models (LLMs) are widely used in Conversational AI systems to generate responses to user inquiries.
We propose a guided hallucination-based method to efficiently generate a diverse set of out-of-scope questions from a given document corpus.
arXiv Detail & Related papers (2024-10-18T16:11:29Z) - S-EQA: Tackling Situational Queries in Embodied Question Answering [48.43453390717167]
We present and tackle the problem of Embodied Question Answering with Situational Queries (S-EQA) in a household environment.
We first introduce a novel Prompt-Generate-Evaluate scheme that wraps around an LLM's output to create a dataset of unique situational queries and corresponding consensus object information.
We report an improved accuracy of 15.31% while using queries framed from the generated object consensus for Visual Question Answering (VQA) over directly answering situational ones.
arXiv Detail & Related papers (2024-05-08T00:45:20Z) - Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering [55.295699268654545]
We propose a novel Chain-of-Discussion framework to leverage the synergy among open-source Large Language Models.
Our experiments show that discussions among multiple LLMs play a vital role in enhancing the quality of answers.
arXiv Detail & Related papers (2024-02-26T05:31:34Z) - Large Language Model based Multi-Agents: A Survey of Progress and Challenges [44.92286030322281]
Large Language Models (LLMs) have achieved remarkable success across a wide array of tasks.
Recently, based on the development of using one LLM as a single planning or decision-making agent, LLM-based multi-agent systems have achieved considerable progress in complex problem-solving and world simulation.
arXiv Detail & Related papers (2024-01-21T23:36:14Z) - Enhancing Answer Selection in Community Question Answering with Pre-trained and Large Language Models [0.9065034043031668]
We first propose the Question-Answer cross attention networks (QAN) with pre-trained models for answer selection.
We then utilize a large language model (LLM) to perform answer selection with knowledge augmentation.
Experiments show that the QAN model achieves state-of-the-art performance on two datasets, SemEval2015 and SemEval2017.
arXiv Detail & Related papers (2023-11-29T10:24:50Z) - Improving Zero-shot Visual Question Answering via Large Language Models with Reasoning Question Prompts [22.669502403623166]
We present Reasoning Question Prompts for VQA tasks, which can further activate the potential of Large Language Models.
We generate self-contained questions as reasoning question prompts via an unsupervised question edition module.
Each reasoning question prompt clearly indicates the intent of the original question.
Then, the candidate answers associated with their confidence scores acting as answer heuristics are fed into LLMs.
arXiv Detail & Related papers (2023-11-15T15:40:46Z) - Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models [68.37431984231338]
Large language models (LLMs) have shown impressive results across a variety of tasks while requiring little or no direct supervision.
We believe the ability of an LLM to attribute the text that it generates is likely to be crucial for both system developers and users in this setting.
arXiv Detail & Related papers (2022-12-15T18:45:29Z) - Mixture of Experts for Biomedical Question Answering [34.92691831878302]
We propose a Mixture-of-Expert (MoE) based question answering method called MoEBQA.
MoEBQA decouples the computation for different types of questions by sparse routing.
We evaluate MoEBQA on three Biomedical Question Answering (BQA) datasets constructed based on real examinations.
arXiv Detail & Related papers (2022-04-15T14:11:40Z) - HeteroQA: Learning towards Question-and-Answering through Multiple Information Sources via Heterogeneous Graph Modeling [50.39787601462344]
Community Question Answering (CQA) is a well-defined task that can be used in many scenarios, such as E-Commerce and online user community for special interests.
Most of the CQA methods only incorporate articles or Wikipedia to extract knowledge and answer the user's question.
We propose a question-aware heterogeneous graph transformer to incorporate the multiple information sources (MIS) in the user community to automatically generate the answer.
arXiv Detail & Related papers (2021-12-27T10:16:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.