Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering
- URL: http://arxiv.org/abs/2503.11117v2
- Date: Wed, 19 Mar 2025 06:56:19 GMT
- Title: Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering
- Authors: Kaixuan Jiang, Yang Liu, Weixing Chen, Jingzhou Luo, Ziliang Chen, Ling Pan, Guanbin Li, Liang Lin,
- Abstract summary: Embodied Question Answering requires agents to dynamically explore 3D environments, actively gather visual information, and perform multi-step reasoning to answer questions.<n>Existing datasets often introduce biases or prior knowledge, leading to disembodied reasoning.<n>We construct the largest dataset designed specifically to evaluate both exploration and reasoning capabilities.
- Score: 87.76784654371312
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embodied Question Answering (EQA) is a challenging task in embodied intelligence that requires agents to dynamically explore 3D environments, actively gather visual information, and perform multi-step reasoning to answer questions. However, current EQA approaches suffer from critical limitations in exploration efficiency, dataset design, and evaluation metrics. Moreover, existing datasets often introduce biases or prior knowledge, leading to disembodied reasoning, while frontier-based exploration strategies struggle in cluttered environments and fail to ensure fine-grained exploration of task-relevant areas. To address these challenges, we construct the EXPloration-awaRe Embodied queStion anSwering Benchmark (EXPRESS-Bench), the largest dataset designed specifically to evaluate both exploration and reasoning capabilities. EXPRESS-Bench consists of 777 exploration trajectories and 2,044 question-trajectory pairs. To improve exploration efficiency, we propose Fine-EQA, a hybrid exploration model that integrates frontier-based and goal-oriented navigation to guide agents toward task-relevant regions more effectively. Additionally, we introduce a novel evaluation metric, Exploration-Answer Consistency (EAC), which ensures faithful assessment by measuring the alignment between answer grounding and exploration reliability. Extensive experimental comparisons with state-of-the-art EQA models demonstrate the effectiveness of our EXPRESS-Bench in advancing embodied exploration and question reasoning.
Related papers
- An Information-Geometric Approach to Artificial Curiosity [49.1574468325115]
We show that intrinsic rewards should depend on the agent's information about the environment, remaining to the representation of the information.
We show that invariance under congruent Markov morphisms and the agent-environment interaction, uniquely constrains intrinsic rewards to concave functions of the reciprocal occupancy.
This framework provides important constraints to the engineering of intrinsic reward while integrating foundational exploration methods into a single, cohesive model.
arXiv Detail & Related papers (2025-04-08T18:04:15Z) - Divide-and-Conquer: Tree-structured Strategy with Answer Distribution Estimator for Goal-Oriented Visual Dialogue [30.126882554391837]
Tree-Structured Strategy with Answer Distribution Estimator (TSADE)
We propose a Tree-Structured Strategy with Answer Distribution Estimator (TSADE) which guides the question generation by excluding half of the current candidate objects in each round.
We experimentally demonstrate that our method can enable the agents to achieve high task-oriented accuracy with fewer repeating questions and rounds compared to traditional ergodic question generation approaches.
arXiv Detail & Related papers (2025-02-09T08:16:09Z) - EfficientEQA: An Efficient Approach for Open Vocabulary Embodied Question Answering [21.114403949257934]
Embodied Question Answering (EQA) is an essential yet challenging task for robotic home assistants.
Recent studies have shown that large vision-language models (VLMs) can be effectively utilized for EQA, but existing works either focus on video-based question answering or rely on closed-form choice sets.
We propose a novel framework called EfficientEQA for open-vocabulary EQA, which enables efficient exploration and accurate answering.
arXiv Detail & Related papers (2024-10-26T19:48:47Z) - Random Latent Exploration for Deep Reinforcement Learning [71.88709402926415]
We introduce Random Latent Exploration (RLE), a simple yet effective exploration strategy in reinforcement learning (RL)<n>On average, RLE outperforms noise-based methods, which perturb the agent's actions, and bonus-based exploration, which rewards the agent for attempting novel behaviors.<n>RLE is as simple as noise-based methods, as it avoids complex bonus calculations but retains the deep exploration benefits of bonus-based methods.
arXiv Detail & Related papers (2024-07-18T17:55:22Z) - DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents [49.74065769505137]
We introduce DISCOVERYWORLD, the first virtual environment for developing and benchmarking an agent's ability to perform complete cycles of novel scientific discovery.
It includes 120 different challenge tasks spanning eight topics each with three levels of difficulty and several parametric variations.
We find that strong baseline agents, that perform well in prior published environments, struggle on most DISCOVERYWORLD tasks.
arXiv Detail & Related papers (2024-06-10T20:08:44Z) - WESE: Weak Exploration to Strong Exploitation for LLM Agents [95.6720931773781]
This paper proposes a novel approach, Weak Exploration to Strong Exploitation (WESE) to enhance LLM agents in solving open-world interactive tasks.
WESE involves decoupling the exploration and exploitation process, employing a cost-effective weak agent to perform exploration tasks for global knowledge.
A knowledge graph-based strategy is then introduced to store the acquired knowledge and extract task-relevant knowledge, enhancing the stronger agent in success rate and efficiency for the exploitation task.
arXiv Detail & Related papers (2024-04-11T03:31:54Z) - Collaborative Knowledge Infusion for Low-resource Stance Detection [83.88515573352795]
Target-related knowledge is often needed to assist stance detection models.
We propose a collaborative knowledge infusion approach for low-resource stance detection tasks.
arXiv Detail & Related papers (2024-03-28T08:32:14Z) - Explore until Confident: Efficient Exploration for Embodied Question Answering [32.27111287314288]
We leverage the strong semantic reasoning capabilities of large vision-language models to efficiently explore and answer questions.
We propose a method that first builds a semantic map of the scene based on depth information and via visual prompting of a VLM.
Next, we use conformal prediction to calibrate the VLM's question answering confidence, allowing the robot to know when to stop exploration.
arXiv Detail & Related papers (2024-03-23T22:04:03Z) - ELBA: Learning by Asking for Embodied Visual Navigation and Task Completion [11.068806371839424]
We propose an Embodied Learning-By-Asking (ELBA) model that learns when and what questions to ask to dynamically acquire additional information for completing the task.<n> Experimental results show that the proposed method achieves improved task performance compared to baseline models without question-answering capabilities.
arXiv Detail & Related papers (2023-02-09T18:59:41Z) - Batch Exploration with Examples for Scalable Robotic Reinforcement
Learning [63.552788688544254]
Batch Exploration with Examples (BEE) explores relevant regions of the state-space guided by a modest number of human provided images of important states.
BEE is able to tackle challenging vision-based manipulation tasks both in simulation and on a real Franka robot.
arXiv Detail & Related papers (2020-10-22T17:49:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.