Universal Visual Decomposer: Long-Horizon Manipulation Made Easy
- URL: http://arxiv.org/abs/2310.08581v1
- Date: Thu, 12 Oct 2023 17:59:41 GMT
- Title: Universal Visual Decomposer: Long-Horizon Manipulation Made Easy
- Authors: Zichen Zhang, Yunshuang Li, Osbert Bastani, Abhishek Gupta, Dinesh
  Jayaraman, Yecheng Jason Ma, Luca Weihs
- Abstract summary: Real-world robotic tasks stretch over extended horizons and encompass multiple stages.
Prior task decomposition methods require task-specific knowledge, are computationally intensive, and cannot readily be applied to new tasks.
We propose Universal Visual Decomposer (UVD), an off-the-shelf task decomposition method for visual long-horizon manipulation.
We extensively evaluate UVD on both simulation and real-world tasks, and in all cases, UVD substantially outperforms baselines across imitation and reinforcement learning settings.
- Score: 54.93745986073738
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Real-world robotic tasks stretch over extended horizons and encompass
multiple stages. Learning long-horizon manipulation tasks, however, is a
long-standing challenge, and demands decomposing the overarching task into
several manageable subtasks to facilitate policy learning and generalization to
unseen tasks. Prior task decomposition methods require task-specific knowledge,
are computationally intensive, and cannot readily be applied to new tasks. To
address these shortcomings, we propose Universal Visual Decomposer (UVD), an
off-the-shelf task decomposition method for visual long-horizon manipulation
using pre-trained visual representations designed for robotic control. At a
high level, UVD discovers subgoals by detecting phase shifts in the embedding
space of the pre-trained representation. Operating purely on visual
demonstrations without auxiliary information, UVD can effectively extract
visual subgoals embedded in the videos, while incurring zero additional
training cost on top of standard visuomotor policy training. Goal-conditioned
policies learned with UVD-discovered subgoals exhibit significantly improved
compositional generalization at test time to unseen tasks. Furthermore,
UVD-discovered subgoals can be used to construct goal-based reward shaping that
jump-starts temporally extended exploration for reinforcement learning. We
extensively evaluate UVD on both simulation and real-world tasks, and in all
cases, UVD substantially outperforms baselines across imitation and
reinforcement learning settings on in-domain and out-of-domain task sequences
alike, validating the clear advantage of automated visual task decomposition
within the simple, compact UVD framework.
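The abstract states that UVD discovers subgoals by detecting phase shifts in the embedding space of a pre-trained representation, and that those subgoals can drive goal-based reward shaping. The Python sketch below illustrates one plausible reading of both ideas; it is not the authors' implementation. It assumes per-frame embeddings have already been produced by some frozen visual encoder (not shown), uses Euclidean distance, and treats a phase shift as the point where, scanning backward from the current subgoal, the distance to that subgoal stops growing. The names `decompose`, `progress_reward`, `tol`, and `min_len` are hypothetical.

```python
import numpy as np


def decompose(embeddings, tol=0.0, min_len=1):
    """Return sorted subgoal frame indices; the final frame is always included.

    Hypothetical sketch: `embeddings` is a (T, D) array of per-frame features
    from an assumed frozen pre-trained encoder.
    """
    emb = np.asarray(embeddings, dtype=float)
    goal = len(emb) - 1
    subgoals = [goal]
    while goal > 0:
        # Distance from every frame up to the current subgoal to that subgoal.
        dist = np.linalg.norm(emb[: goal + 1] - emb[goal], axis=1)
        boundary = 0  # 0 means the distance grows monotonically back to frame 0
        for t in range(goal - 1, -1, -1):
            # Scanning backward, the distance should keep growing; a drop larger
            # than `tol` marks a phase shift (assumed criterion).
            if dist[t] < dist[t + 1] - tol and goal - (t + 1) >= min_len:
                boundary = t + 1  # first frame of the monotone segment
                break
        if boundary == 0:
            break
        subgoals.append(boundary)
        goal = boundary  # repeat on the prefix ending at the new subgoal
    return sorted(subgoals)


def progress_reward(emb_t, emb_next, subgoal_emb):
    """Assumed shaping term: positive when a step moves closer to the active subgoal."""
    g = np.asarray(subgoal_emb, dtype=float)
    d_now = np.linalg.norm(np.asarray(emb_t, dtype=float) - g)
    d_next = np.linalg.norm(np.asarray(emb_next, dtype=float) - g)
    return float(d_now - d_next)


if __name__ == "__main__":
    # Toy 1-D "embeddings": move away from the start, then come back.
    traj = np.array([[0.0], [1.0], [2.0], [3.0], [2.0], [1.0], [0.0]])
    print(decompose(traj))  # [3, 6]
    print(progress_reward(traj[2], traj[3], traj[3]))  # 1.0
```

On the toy trajectory, the backward scan from the last frame sees the distance rise until frame 3 and then fall, so frame 3 is returned as a subgoal ahead of the final frame. Real embeddings are noisy, which is why the sketch exposes a tolerance and a minimum segment length; the criterion, smoothing, and reward form actually used by UVD may differ.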

Related papers
- T-Rex: Task-Adaptive Spatial Representation Extraction for Robotic Manipulation with Vision-Language Models [35.83717913117858]
  We introduce T-Rex, a Task-Adaptive Framework for Spatial Representation Extraction.
  We show that our approach delivers significant advantages in spatial understanding, efficiency, and stability without additional training.
  arXiv (2025-06-24T10:36:15Z)
- Object-Focus Actor for Data-efficient Robot Generalization Dexterous Manipulation [14.977743061489518]
  We introduce Object-Focus Actor (OFA), a novel, data-efficient approach for generalized dexterous manipulation.
  OFA exploits the consistent end trajectories observed in dexterous manipulation tasks, allowing for efficient policy training.
  OFA achieves robust performance with only 10 demonstrations, highlighting its data efficiency.
  arXiv (2025-05-21T04:37:56Z)
- Vision Language Models are In-Context Value Learners [89.29486557646624]
  We present Generative Value Learning (GVL), a universal value function estimator that leverages the world knowledge embedded in vision-language models (VLMs) to predict task progress.
  Without any robot- or task-specific training, GVL can predict effective values in context, zero-shot and few-shot, for more than 300 distinct real-world tasks.
  arXiv (2024-11-07T09:17:50Z)
- Efficient Learning of High Level Plans from Play [57.29562823883257]
  We present Efficient Learning of High-Level Plans from Play (ELF-P), a framework for robotic learning that bridges motion planning and deep RL.
  We demonstrate that ELF-P has significantly better sample efficiency than relevant baselines over multiple realistic manipulation tasks.
  arXiv (2023-03-16T20:09:47Z)
- Generalization with Lossy Affordances: Leveraging Broad Offline Data for Learning Visuomotor Tasks [65.23947618404046]
  We introduce a framework that acquires goal-conditioned policies for unseen temporally extended tasks via offline reinforcement learning on broad data.
  When faced with a novel task goal, the framework uses an affordance model to plan a sequence of lossy representations as subgoals that decomposes the original task into easier problems.
  We show that our framework can be pre-trained on large-scale datasets of robot experiences from prior work and efficiently fine-tuned for novel tasks, entirely from visual inputs without any manual reward engineering.
  arXiv (2022-10-12T21:46:38Z)
- Deep Hierarchical Planning from Pixels [86.14687388689204]
  Director is a method for learning hierarchical behaviors directly from pixels by planning inside the latent space of a learned world model.
  Despite operating in latent space, the decisions are interpretable because the world model can decode goals into images for visualization.
  Director also learns successful behaviors across a wide range of environments, including visual control, Atari games, and DMLab levels.
  arXiv (2022-06-08T18:20:15Z)
- Generalizing to New Tasks via One-Shot Compositional Subgoals [23.15624959305799]
  The ability to generalize to previously unseen tasks with little to no supervision is a key challenge in modern machine learning research.
  We introduce CASE, which attempts to address this challenge by training an Imitation Learning agent using adaptive "near future" subgoals.
  Our experiments show that the proposed approach consistently outperforms the previous state-of-the-art compositional Imitation Learning approach by 30%.
  arXiv (2022-05-16T14:30:11Z)
- Hierarchical Few-Shot Imitation with Skill Transition Models [66.81252581083199]
  Few-shot Imitation with Skill Transition Models (FIST) is an algorithm that extracts skills from offline data and utilizes them to generalize to unseen tasks.
  We show that FIST is capable of generalizing to new tasks and substantially outperforms prior baselines in navigation experiments.
  arXiv (2021-07-19T15:56:01Z)