HAPFI: History-Aware Planning based on Fused Information
- URL: http://arxiv.org/abs/2407.16533v1
- Date: Tue, 23 Jul 2024 14:46:07 GMT
- Title: HAPFI: History-Aware Planning based on Fused Information
- Authors: Sujin Jeon, Suyeon Shin, Byoung-Tak Zhang
- Abstract summary: Embodied Instruction Following (EIF) is the task of planning a long sequence of sub-goals given high-level natural language instructions.
We argue that an agent must consider its past, i.e., historical data, when making decisions at each step.
- Score: 18.141893873543037
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embodied Instruction Following (EIF) is the task of planning a long sequence of sub-goals given high-level natural language instructions, such as "Rinse a slice of lettuce and place on the white table next to the fork". To successfully execute these long-horizon tasks, we argue that an agent must consider its past, i.e., historical data, when making decisions at each step. Nevertheless, recent EIF approaches often neglect the knowledge contained in historical data and do not effectively utilize information across modalities. To this end, we propose History-Aware Planning based on Fused Information (HAPFI), which leverages the historical data from diverse modalities that agents collect while interacting with the environment. Specifically, HAPFI integrates multiple modalities, including historical RGB observations, bounding boxes, sub-goals, and high-level instructions, by fusing them with our Mutually Attentive Fusion method. Through experiments with diverse comparisons, we show that an agent utilizing historical multi-modal information surpasses all compared methods that neglect historical data in terms of action planning capability, enabling it to generate well-informed action plans for the next step. Moreover, we provide qualitative evidence highlighting the significance of leveraging historical multi-modal data, particularly in scenarios where the agent encounters intermediate failures, showcasing its robust re-planning capabilities.
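The abstract names the inputs that HAPFI fuses but does not spell out how Mutually Attentive Fusion is computed. The sketch below is therefore a hypothetical, generic cross-attention fusion, not the authors' architecture. It assumes each modality (instruction, past sub-goals, past RGB frames, past bounding boxes) has already been encoded into equal-length vectors, one per token or history step. The function names (`softmax`, `cross_attend`, `mean_pool`, `mutual_fusion`), the residual update, the mean pooling, and the sorted-name concatenation order are all illustrative assumptions, and the toy embeddings are made up.

```python
import math
from typing import Dict, List

Vector = List[float]
Tokens = List[Vector]  # one vector per token / history step


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: Tokens, context: Tokens) -> Tokens:
    """Scaled dot-product attention: each query attends over `context`.
    Keys and values are the context vectors themselves (no learned projections)."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in context]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, context)) for i in range(d)])
    return out


def mean_pool(seq: Tokens) -> Vector:
    # Average over the time/token axis.
    return [sum(col) / len(seq) for col in zip(*seq)]


def mutual_fusion(modalities: Dict[str, Tokens]) -> Vector:
    """Hypothetical fusion: every modality attends to the tokens of all *other*
    modalities, adds the result back residually, is mean-pooled over time,
    and the pooled vectors are concatenated in sorted-name order."""
    if len(modalities) < 2:
        raise ValueError("need at least two modalities to attend across")
    fused: Vector = []
    for name in sorted(modalities):
        own = modalities[name]
        others = [v for other, seq in modalities.items() if other != name for v in seq]
        attended = cross_attend(own, others)
        updated = [[a + b for a, b in zip(x, y)] for x, y in zip(own, attended)]
        fused.extend(mean_pool(updated))
    return fused


# Usage with toy 2-d embeddings (illustrative values only).
history = {
    "instruction": [[1.0, 0.0]],              # high-level instruction
    "subgoals": [[0.0, 1.0], [1.0, 1.0]],     # two past sub-goals
    "rgb": [[0.5, 0.5], [0.2, 0.8]],          # two past RGB frames
    "bboxes": [[0.1, 0.0], [0.0, 0.3]],       # two past bounding-box embeddings
}
features = mutual_fusion(history)
print(len(features))  # 4 modalities x 2 dims -> 8
```

Letting each modality attend only to the others is one simple reading of "mutually attentive"; in a history-aware planner, a vector like `features` would then feed a head that predicts the next sub-goal, which is omitted here.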
Related papers
- Bridging Past and Future: End-to-End Autonomous Driving with Historical Prediction and Planning [12.182364509599228]
End-to-end autonomous driving unifies tasks in a differentiable framework, enabling planning-oriented optimization and attracting growing attention.
We propose BridgeAD, which reformulates motion and planning queries as multi-step queries to differentiate the queries for each future time step.
This design enables the effective use of historical prediction and planning by applying them to the appropriate parts of the end-to-end system based on the time steps, which improves both perception and motion planning.
(2025-03-18T11:57:31Z)
- LHPF: Look back the History and Plan for the Future in Autonomous Driving [10.855426442780516]
This paper introduces LHPF, an imitation learning planner that integrates historical planning information.
Our approach employs a historical intention aggregation module that pools historical planning intentions.
Experiments using both real-world and synthetic data demonstrate that LHPF not only surpasses existing advanced learning-based planners in planning performance but also marks the first instance of a purely learning-based planner outperforming the expert.
(2024-11-26T09:30:26Z)
- Spatial Reasoning and Planning for Deep Embodied Agents [2.7195102129095003]
This thesis explores the development of data-driven techniques for spatial reasoning and planning tasks.
It focuses on enhancing learning efficiency, interpretability, and transferability across novel scenarios.
(2024-09-28T23:05:56Z)
- Long-horizon Embodied Planning with Implicit Logical Inference and Hallucination Mitigation [7.668848364013772]
We present ReLEP, a novel framework for Real-time Long-horizon Embodied Planning.
ReLEP can complete a wide range of long-horizon tasks without in-context examples by learning implicit logical inference through fine-tuning.
(2024-09-24T01:47:23Z)
- P-RAG: Progressive Retrieval Augmented Generation For Planning on Embodied Everyday Task [94.08478298711789]
Embodied Everyday Task is a popular task in the embodied AI community.
Natural language instructions often lack explicit task planning.
Extensive training is required to equip models with knowledge of the task environment.
(2024-09-17T15:29:34Z)
- TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities [46.91749457402889]
Task-oriented dialogue (TOD) systems aim to efficiently handle task-oriented conversations, including information collection.
Using TOD systems accurately, efficiently, and effectively for information collection remains a critical and challenging problem.
Recent studies have demonstrated that Large Language Models (LLMs) excel in dialogue, instruction generation, and reasoning.
(2024-07-31T15:38:15Z)
- Retrieve-Plan-Generation: An Iterative Planning and Answering Framework for Knowledge-Intensive LLM Generation [47.22520829950929]
We propose the Retrieve-Plan-Generation (RPG) framework for large language models (LLMs).
RPG generates plan tokens to guide subsequent generation in the plan stage.
In the answer stage, the model selects relevant fine-grained paragraphs based on the plan and uses them for further answer generation.
(2024-06-21T08:45:52Z)
- FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents [64.1759086221016]
We present FlowBench, the first benchmark for workflow-guided planning.
FlowBench covers 51 different scenarios from 6 domains, with knowledge presented in diverse formats.
Results indicate that current LLM agents need considerable improvements for satisfactory planning.
(2024-06-21T06:13:00Z)
- Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs [59.76268575344119]
We introduce a novel framework for enhancing the planning capabilities of large language models (LLMs) by using planning data derived from knowledge graphs (KGs).
LLMs fine-tuned with KG data have improved planning capabilities, better equipping them to handle complex QA tasks that involve retrieval.
(2024-06-20T13:07:38Z)
- ActiveAD: Planning-Oriented Active Learning for End-to-End Autonomous Driving [96.92499034935466]
End-to-end differentiable learning for autonomous driving has recently become a prominent paradigm.
One main bottleneck lies in its voracious appetite for high-quality labeled data.
We propose a planning-oriented active learning method that progressively annotates a portion of the collected raw data.
(2024-03-05T11:39:07Z)
- Exploring the Limits of Historical Information for Temporal Knowledge Graph Extrapolation [59.417443739208146]
We propose a new event forecasting model based on a novel training framework of historical contrastive learning.
The proposed model, CENET, learns both historical and non-historical dependencies to identify the most probable entities.
We evaluate our proposed model on five benchmark graphs.
(2023-08-29T03:26:38Z)
- Reinforcement Learning with History-Dependent Dynamic Contexts [29.8131459650617]
We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel reinforcement learning framework for history-dependent environments.
We consider special cases of the model, with a focus on logistic DCMDPs, which break the exponential dependence on history length by leveraging aggregation functions to determine context transitions.
Motivated by our theoretical results, we introduce a practical model-based algorithm for logistic DCMDPs that plans in a latent space and uses optimism over history-dependent features.
(2023-02-04T01:58:21Z)
- Detecting Ongoing Events Using Contextual Word and Sentence Embeddings [110.83289076967895]
This paper introduces the Ongoing Event Detection (OED) task.
The goal is to detect only mentions of ongoing events, as opposed to historical, future, hypothetical, or other forms of events that are neither fresh nor current.
Any application that needs to extract structured information about ongoing events from unstructured texts can take advantage of an OED system.
(2020-07-02T20:44:05Z)