Skip-Plan: Procedure Planning in Instructional Videos via Condensed
Action Space Learning
- URL: http://arxiv.org/abs/2310.00608v1
- Date: Sun, 1 Oct 2023 08:02:33 GMT
- Title: Skip-Plan: Procedure Planning in Instructional Videos via Condensed
Action Space Learning
- Authors: Zhiheng Li, Wenjia Geng, Muheng Li, Lei Chen, Yansong Tang, Jiwen Lu,
Jie Zhou
- Abstract summary: Skip-Plan is a condensed action space learning method for procedure planning in instructional videos.
By skipping uncertain nodes and edges in action chains, we transform long and complex sequence functions into short but reliable ones.
Our model explores all sorts of reliable sub-relations within an action sequence in the condensed action space.
- Score: 85.84504287685884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose Skip-Plan, a condensed action space learning method
for procedure planning in instructional videos. Current procedure planning
methods all stick to state-action pair prediction at every timestep and
generate actions adjacently. Although this coincides with human intuition, such a
methodology consistently struggles with high-dimensional state supervision and
error accumulation on action sequences. In this work, we abstract the procedure
planning problem as a mathematical chain model. By skipping uncertain nodes and
edges in action chains, we transform long and complex sequence functions into
short but reliable ones in two ways. First, we skip all the intermediate state
supervision and only focus on action predictions. Second, we decompose
relatively long chains into multiple short sub-chains by skipping unreliable
intermediate actions. In this way, our model explores all sorts of reliable
sub-relations within an action sequence in the condensed action space.
Extensive experiments show Skip-Plan achieves state-of-the-art performance on
the CrossTask and COIN benchmarks for procedure planning.
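The abstract describes two ideas that a small sketch can make concrete: dropping intermediate state supervision so that only actions are predicted, and decomposing a long action chain into short, reliable sub-chains by skipping intermediate actions. The snippet below is a minimal, hypothetical illustration of the second idea; the chain representation, the max_len parameter, and the enumeration strategy are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch (not the authors' code): condense a long action chain into
# short sub-chains by skipping intermediate actions while keeping the endpoints.
from itertools import combinations

def condensed_subchains(actions, max_len=3):
    """Enumerate sub-chains that keep the first and last action and retain at
    most (max_len - 2) intermediate actions, preserving their order."""
    if len(actions) < 2:
        return [list(actions)]
    first, last = actions[0], actions[-1]
    middle = actions[1:-1]
    subchains = []
    for k in range(0, max(0, max_len - 2) + 1):
        for kept in combinations(range(len(middle)), k):
            subchains.append([first] + [middle[i] for i in kept] + [last])
    return subchains

# A 5-step chain is condensed into short sub-chains such as
# ['pour water', 'press brew'] or ['pour water', 'add filter', 'press brew'].
chain = ["pour water", "add filter", "add coffee", "close lid", "press brew"]
for sub in condensed_subchains(chain, max_len=3):
    print(sub)
```

Each short sub-chain is easier to predict reliably than the full chain, which is the intuition behind learning in a condensed action space.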
Related papers
- Hindsight Planner: A Closed-Loop Few-Shot Planner for Embodied Instruction Following [62.10809033451526]
This work focuses on building a task planner for Embodied Instruction Following (EIF) using Large Language Models (LLMs).
We frame the task as a Partially Observable Markov Decision Process (POMDP) and aim to develop a robust planner under a few-shot assumption.
Our experiments on the ALFRED dataset indicate that our planner achieves competitive performance under a few-shot assumption.
arXiv Detail & Related papers (2024-12-27T10:05:45Z) - GenPlan: Generative Sequence Models as Adaptive Planners [0.0]
Sequence models have demonstrated remarkable success in behavioral planning by leveraging previously collected demonstrations.
However, solving multi-task missions remains a significant challenge, particularly when the planner must adapt to unseen constraints and tasks.
We propose GenPlan: a discrete-flow model for adaptive planning, enabling sample-generative exploration and exploitation.
arXiv Detail & Related papers (2024-12-11T17:32:33Z) - Closed-Loop Long-Horizon Robotic Planning via Equilibrium Sequence Modeling [23.62433580021779]
We advocate a self-refining scheme that iteratively refines a draft plan until an equilibrium is reached.
A nested equilibrium sequence modeling procedure is devised for efficient closed-loop planning.
Our method is evaluated on the VirtualHome-Env benchmark, showing strong performance with better inference-time scaling; a toy sketch of the self-refinement loop appears after this list.
arXiv Detail & Related papers (2024-10-02T11:42:49Z) - BiKC: Keypose-Conditioned Consistency Policy for Bimanual Robotic Manipulation [48.08416841005715]
We introduce a novel keypose-conditioned consistency policy tailored for bimanual manipulation.
It is a hierarchical imitation learning framework that consists of a high-level keypose predictor and a low-level trajectory generator.
Simulated and real-world experimental results demonstrate that the proposed approach surpasses baseline methods in terms of success rate and operational efficiency.
arXiv Detail & Related papers (2024-06-14T14:49:12Z) - Task and Motion Planning for Execution in the Real [24.01204729304763]
This work generates task and motion plans that include actions that cannot be fully grounded at planning time.
Execution combines offline planned motions and online behaviors until the task goal is reached.
Forty real-robot trials and motivating demonstrations are performed to evaluate the proposed framework.
Results show faster execution, fewer actions, and higher success rates in problems where diverse gaps arise.
arXiv Detail & Related papers (2024-06-05T22:30:40Z) - RAP: Retrieval-Augmented Planner for Adaptive Procedure Planning in Instructional Videos [46.26690150997731]
We propose a new and practical setting, called adaptive procedure planning in instructional videos.
RAP adaptively determines the end of the action sequence using an auto-regressive model architecture.
arXiv Detail & Related papers (2024-03-27T14:22:40Z) - Planning as In-Painting: A Diffusion-Based Embodied Task Planning
Framework for Environments under Uncertainty [56.30846158280031]
Task planning for embodied AI has been one of the most challenging problems.
We propose a task-agnostic method named 'planning as in-painting'.
The proposed framework achieves promising performances in various embodied AI tasks.
arXiv Detail & Related papers (2023-12-02T10:07:17Z) - AI planning in the imagination: High-level planning on learned abstract
search spaces [68.75684174531962]
We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space that the agent learns during training.
We evaluate our method on multiple domains, including the traveling salesman problem, Sokoban, 2048, the facility location problem, and Pacman.
arXiv Detail & Related papers (2023-08-16T22:47:16Z) - STRIPS Action Discovery [67.73368413278631]
Recent approaches have shown the success of classical planning at synthesizing action models even when all intermediate states are missing.
We propose a new algorithm that uses a classical planner to synthesize STRIPS action models in an unsupervised manner when action signatures are unknown.
arXiv Detail & Related papers (2020-01-30T17:08:39Z)
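For the Equilibrium Sequence Modeling entry above, the summary's idea of refining a draft plan until an equilibrium is reached reads as a fixed-point iteration over plans. The snippet below is a toy sketch under that reading; the refine callback, the equality-based stopping rule, and the max_iters cap are assumptions, not the authors' procedure.

```python
# Toy sketch (an assumption, not the paper's method): keep refining a draft
# plan until refinement no longer changes it, i.e. an equilibrium is reached.
def refine_to_equilibrium(draft, refine, max_iters=50):
    plan = draft
    for _ in range(max_iters):
        new_plan = refine(plan)
        if new_plan == plan:  # fixed point: the plan is stable under refinement
            break
        plan = new_plan
    return plan

# Example with a trivial refiner that deletes a hypothetical "noop" step.
def drop_noop(plan):
    return [a for a in plan if a != "noop"]

print(refine_to_equilibrium(["open drawer", "noop", "grab cup"], drop_noop))
# -> ['open drawer', 'grab cup']
```

In closed-loop planning the refine step would incorporate feedback from execution; here it is a stub purely to make the loop runnable.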