S-EQA: Tackling Situational Queries in Embodied Question Answering
- URL: http://arxiv.org/abs/2405.04732v2
- Date: Fri, 25 Oct 2024 20:43:33 GMT
- Title: S-EQA: Tackling Situational Queries in Embodied Question Answering
- Authors: Vishnu Sashank Dorbala, Prasoon Goyal, Robinson Piramuthu, Michael Johnston, Reza Ghanadhan, Dinesh Manocha
- Abstract summary: We present and tackle the problem of Embodied Question Answering with Situational Queries (S-EQA) in a household environment.
We first introduce a novel Prompt-Generate-Evaluate scheme that wraps around an LLM's output to create a dataset of unique situational queries and corresponding consensus object information.
We report a 15.31% improvement in accuracy when using queries framed from the generated object consensus for Visual Question Answering (VQA) rather than answering situational queries directly.
- Abstract: We present and tackle the problem of Embodied Question Answering (EQA) with Situational Queries (S-EQA) in a household environment. Unlike prior EQA work tackling simple queries that directly reference target objects and properties ("What is the color of the car?"), situational queries (such as "Is the house ready for sleeptime?") are more challenging, requiring the agent to identify multiple objects (Doors: Closed, Lights: Off, etc.) and reach a consensus on their states for an answer. Towards this objective, we first introduce a novel Prompt-Generate-Evaluate (PGE) scheme that wraps around an LLM's output to create a dataset of unique situational queries and corresponding consensus object information. PGE maintains uniqueness among the generated queries using semantic similarity in a feedback loop. We annotate the generated data with ground-truth answers via a large-scale user study conducted on M-Turk, and with a high answerability rate of 97.26%, establish that LLMs are good at generating situational data. However, using the same LLM to answer the queries gives a low success rate of 46.2%, indicating that while LLMs are good at generating query data, they are poor at answering them. We pair images from the VirtualHome simulator with the S-EQA queries to establish an evaluation benchmark via Visual Question Answering (VQA). We report a 15.31% improvement in accuracy when using queries framed from the generated object consensus for VQA rather than answering situational queries directly, indicating that such simplification is necessary for improved performance. To the best of our knowledge, this is the first work to introduce EQA in the context of situational queries while also using a generative approach for query creation. Through this work, we aim to foster research on improving the real-world usability of embodied agents in household environments.
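The semantic-similarity feedback loop that keeps PGE's generated queries unique lends itself to a short illustration. Below is a minimal sketch, assuming the `sentence-transformers` package; the embedding model and the 0.8 similarity threshold are illustrative choices, not values taken from the paper.

```python
from sentence_transformers import SentenceTransformer, util

SIM_THRESHOLD = 0.8  # illustrative; the paper does not specify a value
encoder = SentenceTransformer("all-MiniLM-L6-v2")

def accept_query(candidate: str, accepted: list[str]) -> bool:
    """Keep a generated situational query only if it is semantically
    distinct from every query accepted so far (PGE-style feedback)."""
    if not accepted:
        return True
    sims = util.cos_sim(
        encoder.encode(candidate, convert_to_tensor=True),
        encoder.encode(accepted, convert_to_tensor=True),
    )
    return sims.max().item() < SIM_THRESHOLD

accepted: list[str] = []
for q in ["Is the house ready for sleeptime?",
          "Is the home prepared for bedtime?",   # near-duplicate of the first
          "Is the kitchen ready for breakfast?"]:
    if accept_query(q, accepted):
        accepted.append(q)
print(accepted)  # the near-duplicate should be filtered out
```

Presumably, in the full PGE pipeline the rejection signal feeds back into prompting so the LLM is steered toward novel queries; here a rejected candidate is simply discarded.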
Related papers
- Trust but Verify: Programmatic VLM Evaluation in the Wild [62.14071929143684]
Programmatic VLM Evaluation (PROVE) is a new benchmarking paradigm for evaluating VLM responses to open-ended queries.
We benchmark the helpfulness-truthfulness trade-offs of a range of VLMs on PROVE, finding that very few are in fact able to achieve a good balance between the two.
arXiv Detail & Related papers (2024-10-17T01:19:18Z)
- DEXTER: A Benchmark for open-domain Complex Question Answering using LLMs [3.24692739098077]
Open-domain complex Question Answering (QA) is a difficult task with challenges in evidence retrieval and reasoning.
We evaluate state-of-the-art pre-trained dense and sparse retrieval models in an open-domain setting.
We observe that late-interaction models and, surprisingly, lexical models like BM25 perform well compared to other pre-trained dense retrieval models.
arXiv Detail & Related papers (2024-06-24T22:09:50Z)
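For reference, the surprisingly strong lexical baseline is easy to reproduce in miniature. A sketch using the `rank_bm25` package; the toy corpus and whitespace tokenization are illustrative, not DEXTER's actual setup.

```python
from rank_bm25 import BM25Okapi

# Toy corpus; DEXTER evaluates over open-domain evidence collections.
corpus = [
    "The Eiffel Tower is located in Paris, France.",
    "BM25 is a classic lexical ranking function.",
    "Dense retrievers embed queries and passages into vectors.",
]
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

query = "where is the eiffel tower located".split()
scores = bm25.get_scores(query)  # one relevance score per document
best = max(range(len(corpus)), key=lambda i: scores[i])
print(corpus[best])
```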
- Multi-LLM QA with Embodied Exploration [55.581423861790945]
We investigate the use of Multi-Embodied LLM Explorers (MELE) for question-answering in an unknown environment.
Multiple LLM-based agents independently explore and then answer queries about a household environment.
We analyze different aggregation methods to generate a single, final answer for each query.
arXiv Detail & Related papers (2024-06-16T12:46:40Z)
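Majority voting is one natural way to aggregate the independent agents' answers (the paper compares several aggregation methods, not reproduced here). A minimal sketch with stubbed agent outputs:

```python
from collections import Counter

def majority_vote(answers: list[str]) -> str:
    """Aggregate independent agents' answers; ties go to the first-seen answer."""
    normalized = [a.strip().lower() for a in answers]
    return Counter(normalized).most_common(1)[0][0]

# Stubbed outputs from three independent LLM explorers for one query.
print(majority_vote(["Yes", "yes", "No"]))  # -> "yes"
```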
- Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity [59.57065228857247]
Retrieval-augmented Large Language Models (LLMs) have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA).
We propose a novel adaptive QA framework that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs based on query complexity.
We validate our model on a set of open-domain QA datasets, covering multiple query complexities, and show that our approach enhances the overall efficiency and accuracy of QA systems.
arXiv Detail & Related papers (2024-03-21T13:52:30Z)
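A minimal sketch of complexity-based routing in the spirit of Adaptive-RAG. The keyword heuristic stands in for the paper's trained complexity classifier, and `llm`/`retrieve` are hypothetical callables supplied by the caller.

```python
from typing import Callable

def classify_complexity(query: str) -> str:
    """Hypothetical heuristic standing in for a trained complexity classifier."""
    q = query.lower()
    if " and " in q or "compare" in q:
        return "multi-step"
    if any(w in q for w in ("who", "when", "where")):
        return "single-step"
    return "no-retrieval"

def answer(query: str,
           llm: Callable[[str], str],
           retrieve: Callable[[str], str]) -> str:
    strategy = classify_complexity(query)
    if strategy == "no-retrieval":
        return llm(query)                               # parametric knowledge only
    if strategy == "single-step":
        return llm(f"{retrieve(query)}\n\nQ: {query}")  # one retrieval round
    # Multi-step: iterative retrieve-then-read (one extra hop shown).
    draft = llm(f"{retrieve(query)}\n\nQ: {query}")
    return llm(f"{retrieve(draft)}\n\nQ: {query}")
```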
- Allies: Prompting Large Language Model with Beam Search [107.38790111856761]
In this work, we propose a novel method called ALLIES.
Given an input query, ALLIES leverages LLMs to iteratively generate new queries related to the original query.
By iteratively refining and expanding the scope of the original query, ALLIES captures and utilizes hidden knowledge that may not be directly obtainable through retrieval.
arXiv Detail & Related papers (2023-05-24T06:16:44Z)
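A minimal sketch of beam-search-style iterative query expansion in the spirit of ALLIES; `generate_related` and `score` are hypothetical stand-ins for the paper's LLM-based query generation and scoring.

```python
from typing import Callable

def allies_expand(query: str,
                  generate_related: Callable[[str], list[str]],  # LLM proposes follow-up queries
                  score: Callable[[str], float],                 # estimated usefulness of a query
                  beam_width: int = 3,
                  rounds: int = 2) -> list[str]:
    """Iteratively expand a query, keeping only the top-scoring candidates per round."""
    beam = [query]
    for _ in range(rounds):
        candidates = list(beam)
        for q in beam:
            candidates.extend(generate_related(q))
        beam = sorted(set(candidates), key=score, reverse=True)[:beam_width]
    return beam
```

The surviving queries can then drive retrieval, surfacing evidence the original query alone would miss.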
- Federated Prompting and Chain-of-Thought Reasoning for Improving LLMs Answering [13.735277588793997]
We investigate how to enhance answer precision in frequently asked questions posed by distributed users using cloud-based Large Language Models (LLMs).
Our study focuses on typical situations where users ask similar queries that involve identical mathematical reasoning steps and problem-solving procedures.
We propose to improve answers to these distributed synonymous questions using Self-Consistency (SC) and Chain-of-Thought (CoT) techniques.
arXiv Detail & Related papers (2023-04-27T01:48:03Z)
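The Self-Consistency component is straightforward to sketch: sample several chain-of-thought generations for the same question and majority-vote the extracted final answers. `sample_cot` and `extract_answer` are hypothetical callables.

```python
from collections import Counter
from typing import Callable

def self_consistency(question: str,
                     sample_cot: Callable[[str], str],      # one sampled reasoning chain
                     extract_answer: Callable[[str], str],  # pull the final answer out of it
                     n_samples: int = 5) -> str:
    """Sample multiple reasoning paths and return the most common final answer."""
    answers = [extract_answer(sample_cot(question)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```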
- CQE in OWL 2 QL: A "Longest Honeymoon" Approach (extended version) [13.169982133542266]
We study a dynamic Controlled Query Evaluation (CQE) method: we propose to alter the answer to the current query based on the evaluation of previous ones.
We aim at a system that, besides being able to protect confidential data, is maximally cooperative.
arXiv Detail & Related papers (2022-07-22T15:51:15Z)
- Generating Diverse and Consistent QA pairs from Contexts with Information-Maximizing Hierarchical Conditional VAEs [62.71505254770827]
We propose a hierarchical conditional variational autoencoder (HCVAE) for generating QA pairs given unstructured texts as contexts.
Our model obtains impressive performance gains over all baselines on both tasks, using only a fraction of data for training.
arXiv Detail & Related papers (2020-05-28T08:26:06Z)
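To make the underlying objective concrete, here is a minimal conditional VAE in PyTorch over toy continuous features. The actual HCVAE is hierarchical over question and answer latents and adds an information-maximizing term; neither is reproduced in this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalVAE(nn.Module):
    """Toy CVAE: generate x conditioned on context c (stand-in for QA-pair generation)."""
    def __init__(self, x_dim: int, c_dim: int, z_dim: int = 32, h_dim: int = 128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + c_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim + c_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def forward(self, x, c):
        h = self.enc(torch.cat([x, c], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.dec(torch.cat([z, c], dim=-1)), mu, logvar

def elbo_loss(x, x_hat, mu, logvar):
    """Negative ELBO: reconstruction error plus KL to the standard-normal prior."""
    recon = F.mse_loss(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

model = ConditionalVAE(x_dim=64, c_dim=16)
x, c = torch.randn(8, 64), torch.randn(8, 16)
x_hat, mu, logvar = model(x, c)
print(elbo_loss(x, x_hat, mu, logvar).item())
```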