EPD: Long-term Memory Extraction, Context-awared Planning and Multi-iteration Decision @ EgoPlan Challenge ICML 2024
- URL: http://arxiv.org/abs/2407.19510v1
- Date: Sun, 28 Jul 2024 15:14:07 GMT
- Title: EPD: Long-term Memory Extraction, Context-awared Planning and Multi-iteration Decision @ EgoPlan Challenge ICML 2024
- Authors: Letian Shi, Qi Lv, Xiang Deng, Liqiang Nie
- Abstract summary: We introduce a novel planning framework which comprises three stages: long-term memory Extraction, context-awared Planning, and multi-iteration Decision, named EPD.
EPD achieves a planning accuracy of 53.85% over 1,584 egocentric task planning questions.
- Score: 50.89751993430737
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this technical report, we present our solution for the EgoPlan Challenge in ICML 2024. To address the real-world egocentric task planning problem, we introduce EPD, a novel planning framework comprising three stages: long-term memory Extraction, context-awared Planning, and multi-iteration Decision. Given the task goal, task progress, and current observation, the extraction model first extracts task-relevant memory information from the progress video, transforming the complex long video into summarized memory information. The planning model then combines the context of the memory information with fine-grained visual information from the current observation to predict the next action. Finally, through multi-iteration decision-making, the decision model comprehensively understands the task situation and current state to make the most realistic planning decision. On the EgoPlan-Test set, EPD achieves a planning accuracy of 53.85% over 1,584 egocentric task planning questions. We have made all code available at https://github.com/Kkskkkskr/EPD.
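To make the three-stage flow concrete, below is a minimal Python sketch of the pipeline described in the abstract. The function names, the `PlanningQuery` structure, and the self-consistency-style refinement loop are illustrative assumptions, not the authors' actual interfaces; the released code at https://github.com/Kkskkkskr/EPD is the authoritative implementation.

```python
# A minimal sketch of the three-stage EPD pipeline described in the abstract.
# All interfaces here (extract_memory, plan_next_action, refine_decision) are
# hypothetical placeholders, not the authors' API.

from dataclasses import dataclass
from typing import List


@dataclass
class PlanningQuery:
    task_goal: str                 # natural-language description of the overall task
    progress_video: str            # path to the egocentric progress video
    current_observation: str       # path to the current frame / observation
    candidate_actions: List[str]   # candidate next actions to choose from


def extract_memory(progress_video: str, task_goal: str) -> str:
    """Stage 1: summarize the long progress video into task-relevant memory."""
    raise NotImplementedError  # e.g., a video-language model prompted with the goal


def plan_next_action(memory: str, observation: str, goal: str,
                     candidates: List[str]) -> str:
    """Stage 2: combine memory context with fine-grained visual cues to pick an action."""
    raise NotImplementedError  # e.g., an MLLM scoring each candidate action


def refine_decision(initial_choice: str, memory: str, observation: str, goal: str,
                    candidates: List[str], num_iterations: int = 3) -> str:
    """Stage 3: multi-iteration decision-making to settle on the final answer.

    Assumption: modeled here as a simple self-consistency loop that re-queries
    the planner and stops once the answer is stable.
    """
    choice = initial_choice
    for _ in range(num_iterations):
        new_choice = plan_next_action(memory, observation, goal, candidates)
        if new_choice == choice:
            break
        choice = new_choice
    return choice


def epd_pipeline(query: PlanningQuery) -> str:
    memory = extract_memory(query.progress_video, query.task_goal)
    first_pass = plan_next_action(memory, query.current_observation,
                                  query.task_goal, query.candidate_actions)
    return refine_decision(first_pass, memory, query.current_observation,
                           query.task_goal, query.candidate_actions)
```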
Related papers
- ReLEP: A Novel Framework for Real-world Long-horizon Embodied Planning [7.668848364013772]
We present ReLEP, a framework for Real-world Long-horizon Embodied Planning.
At its core lies a fine-tuned large vision language model that formulates plans as sequences of skill functions.
ReLEP can accomplish a wide range of daily tasks and outperforms other state-of-the-art baseline methods.
arXiv Detail & Related papers (2024-09-24T01:47:23Z)
- Neural MP: A Generalist Neural Motion Planner [75.82675575009077]
We seek to do the same by applying data-driven learning at scale to the problem of motion planning.
Our approach builds a large number of complex scenes in simulation, collects expert data from a motion planner, then distills it into a reactive generalist policy.
We perform a thorough evaluation of our method on 64 motion planning tasks across four diverse environments.
arXiv Detail & Related papers (2024-09-09T17:59:45Z)
- PDDLEGO: Iterative Planning in Textual Environments [56.12148805913657]
Planning in textual environments has been shown to be a long-standing challenge even for current models.
We propose PDDLEGO, which iteratively constructs a planning representation that can lead to a partial plan for a given sub-goal.
We show that plans produced by few-shot PDDLEGO are 43% more efficient than generating plans end-to-end on the Coin Collector simulation.
arXiv Detail & Related papers (2024-05-30T08:01:20Z)
- Socratic Planner: Inquiry-Based Zero-Shot Planning for Embodied Instruction Following [17.608330952846075]
Embodied Instruction Following (EIF) is the task of executing natural language instructions by navigating and interacting with objects in 3D environments.
One of the primary challenges in EIF is compositional task planning, which is often addressed with supervised or in-context learning with labeled data.
We introduce the Socratic Planner, the first zero-shot planning method that infers plans without any training data.
arXiv Detail & Related papers (2024-04-21T08:10:20Z)
- AutoGPT+P: Affordance-based Task Planning with Large Language Models [6.848986296339031]
AutoGPT+P is a system that combines an affordance-based scene representation with a planning system.
Our approach achieves a success rate of 98%, surpassing the 81% success rate of the current state-of-the-art LLM-based planning method SayCan.
arXiv Detail & Related papers (2024-02-16T16:00:50Z)
- EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning [84.6451394629312]
We introduce EgoPlan-Bench, a benchmark to evaluate the planning abilities of MLLMs in real-world scenarios.
We show that EgoPlan-Bench poses significant challenges, highlighting a substantial scope for improvement in MLLMs to achieve human-level task planning.
We also present EgoPlan-IT, a specialized instruction-tuning dataset that effectively enhances model performance on EgoPlan-Bench.
arXiv Detail & Related papers (2023-12-11T03:35:58Z)
- Planning as In-Painting: A Diffusion-Based Embodied Task Planning Framework for Environments under Uncertainty [56.30846158280031]
Task planning for embodied AI has been one of the most challenging problems.
We propose a task-agnostic method named 'planning as in-painting'.
The proposed framework achieves promising performances in various embodied AI tasks.
arXiv Detail & Related papers (2023-12-02T10:07:17Z)
- AVIS: Autonomous Visual Information Seeking with Large Language Model Agent [123.75169211547149]
We propose AVIS, an autonomous information-seeking visual question answering framework.
Our method leverages a Large Language Model (LLM) to dynamically strategize the utilization of external tools.
AVIS achieves state-of-the-art results on knowledge-intensive visual question answering benchmarks such as Infoseek and OK-VQA.
arXiv Detail & Related papers (2023-06-13T20:50:22Z)
- A Framework for Neurosymbolic Robot Action Planning using Large Language Models [3.0501524254444767]
We present a framework aimed at bridging the gap between symbolic task planning and machine learning approaches.
The core idea is to train Large Language Models (LLMs) into a neurosymbolic task planner compatible with the Planning Domain Definition Language (PDDL).
Preliminary results in selected domains show that our method can: (i) solve 95.5% of problems in a test data set of 1,000 samples; (ii) produce plans up to 13.5% shorter than a traditional symbolic planner; (iii) reduce average overall waiting times for a plan availability by up to 61.4%.
arXiv Detail & Related papers (2023-03-01T11:54:22Z)