Embodied Question Answering via Multi-LLM Systems
- URL: http://arxiv.org/abs/2406.10918v3
- Date: Tue, 25 Jun 2024 10:50:09 GMT
- Title: Embodied Question Answering via Multi-LLM Systems
- Authors: Bhrij Patel, Vishnu Sashank Dorbala, Dinesh Manocha, Amrit Singh Bedi
- Abstract summary: Embodied Question Answering (EQA) is an important problem in which an agent explores an environment to answer user queries.
We consider EQA in a multi-agent framework involving multiple large language model (LLM) based agents independently answering queries about a household environment.
- Score: 55.581423861790945
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embodied Question Answering (EQA) is an important problem in which an agent explores an environment to answer user queries. In the existing literature, EQA has exclusively been studied in single-agent scenarios, where exploration can be time-consuming and costly. In this work, we consider EQA in a multi-agent framework involving multiple large language model (LLM) based agents independently answering queries about a household environment. To generate one answer for each query, we use the individual responses to train a Central Answer Model (CAM) that aggregates responses into a robust answer. Using CAM, we observe $50\%$ higher EQA accuracy compared against aggregation methods for LLM ensembles, such as voting schemes and debates. CAM does not require any form of agent communication, avoiding the associated costs. We ablate CAM with various nonlinear (neural network, random forest, decision tree, XGBoost) and linear (logistic regression classifier, SVM) algorithms. Finally, we present a feature importance analysis for CAM via permutation feature importance (PFI), quantifying CAM's reliance on each independent agent and query context.
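The CAM idea described in the abstract can be sketched as follows: a supervised classifier is trained on the independent agents' answers and predicts one aggregated answer per query, and PFI then measures how much the classifier relies on each agent. This is a minimal illustrative sketch with synthetic data; the feature encoding, per-agent error rates, and use of a random forest are assumptions for illustration, not the authors' exact setup.

```python
# Sketch: train a Central Answer Model (CAM) over independent agent answers,
# then score each agent's contribution with permutation feature importance.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n_queries = 500

# Ground-truth binary answers (e.g. yes/no EQA queries) and three agents'
# independent, noisy responses; agent 0 is most reliable in this toy setup.
y = rng.integers(0, 2, n_queries)
error_rates = [0.1, 0.3, 0.45]  # assumed per-agent error rates
X = np.column_stack(
    [np.where(rng.random(n_queries) < p, 1 - y, y) for p in error_rates]
)

# CAM: a nonlinear aggregator trained on the agents' answers; note that the
# agents never communicate with each other.
cam = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# PFI: accuracy drop when one agent's answers are shuffled, quantifying the
# CAM's reliance on that agent.
pfi = permutation_importance(cam, X, y, n_repeats=10, random_state=0)
for i, imp in enumerate(pfi.importances_mean):
    print(f"agent {i}: importance {imp:.3f}")
```

In this sketch the most reliable agent should receive the largest importance score, mirroring the paper's use of PFI to quantify per-agent reliance.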
Related papers
- RAG-QA Arena: Evaluating Domain Robustness for Long-form Retrieval Augmented Question Answering [61.19126689470398]
Long-form RobustQA (LFRQA) is a new dataset covering 26K queries and large corpora across seven different domains.
We show via experiments that RAG-QA Arena and human judgments on answer quality are highly correlated.
Only 41.3% of the most competitive LLM's answers are preferred to LFRQA's answers, demonstrating RAG-QA Arena as a challenging evaluation platform for future research.
arXiv Detail & Related papers (2024-07-19T03:02:51Z) - S-EQA: Tackling Situational Queries in Embodied Question Answering [48.43453390717167]
We present and tackle the problem of Embodied Question Answering with Situational Queries (S-EQA) in a household environment.
We first introduce a novel Prompt-Generate-Evaluate scheme that wraps around an LLM's output to create a dataset of unique situational queries.
We validate the generated dataset via a large scale user-study conducted on M-Turk, and introduce it as S-EQA, the first dataset tackling EQA with situational queries.
arXiv Detail & Related papers (2024-05-08T00:45:20Z) - Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based
Question Answering [62.14682452663157]
We propose a novel Chain-of-Discussion framework to leverage the synergy among open-source Large Language Models.
Our experiments show that discussions among multiple LLMs play a vital role in enhancing the quality of answers.
arXiv Detail & Related papers (2024-02-26T05:31:34Z) - An Empirical Comparison of LM-based Question and Answer Generation
Methods [79.31199020420827]
Question and answer generation (QAG) consists of generating a set of question-answer pairs given a context.
In this paper, we establish baselines with three different QAG methodologies that leverage sequence-to-sequence language model (LM) fine-tuning.
Experiments show that an end-to-end QAG model, which is computationally light at both training and inference times, is generally robust and outperforms other more convoluted approaches.
arXiv Detail & Related papers (2023-05-26T14:59:53Z) - Federated Prompting and Chain-of-Thought Reasoning for Improving LLMs
Answering [13.735277588793997]
We investigate how to enhance answer precision in frequently asked questions posed by distributed users using cloud-based Large Language Models (LLMs).
Our study focuses on a typical situation where users ask similar queries that involve identical mathematical reasoning steps and problem-solving procedures.
We propose to improve the distributed synonymous questions using Self-Consistency (SC) and Chain-of-Thought (CoT) techniques.
arXiv Detail & Related papers (2023-04-27T01:48:03Z) - Answering Questions by Meta-Reasoning over Multiple Chains of Thought [56.74935116310892]
We introduce Multi-Chain Reasoning (MCR), an approach which prompts large language models to meta-reason over multiple chains of thought.
MCR examines different reasoning chains, mixes information between them and selects the most relevant facts in generating an explanation and predicting the answer.
arXiv Detail & Related papers (2023-04-25T17:27:37Z) - Activity report analysis with automatic single or multispan answer
extraction [0.21485350418225244]
We create a new smart home environment dataset comprised of questions paired with single-span or multi-span answers depending on the question and context queried.
Our experiments show that the proposed model outperforms state-of-the-art QA models on our dataset.
arXiv Detail & Related papers (2022-09-09T06:33:29Z) - OneStop QAMaker: Extract Question-Answer Pairs from Text in a One-Stop
Approach [11.057028572260064]
We propose a model named OneStop to generate QA pairs from documents in a one-stop approach.
Specifically, each question and its corresponding answer span are extracted simultaneously.
OneStop is much more efficient to train and deploy in industrial scenarios, since it involves only one model to solve the complex QA generation task.
arXiv Detail & Related papers (2021-02-24T08:45:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.