Decomposing the Generalization Gap in Imitation Learning for Visual
Robotic Manipulation
- URL: http://arxiv.org/abs/2307.03659v1
- Date: Fri, 7 Jul 2023 15:26:03 GMT
- Title: Decomposing the Generalization Gap in Imitation Learning for Visual
Robotic Manipulation
- Authors: Annie Xie, Lisa Lee, Ted Xiao, Chelsea Finn
- Abstract summary: We study imitation learning policies in simulation and on a real-robot language-conditioned manipulation task.
We design a new simulated benchmark of 19 tasks with 11 factors of variation to facilitate more controlled evaluations of generalization.
- Score: 60.00649221656642
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: What makes generalization hard for imitation learning in visual robotic
manipulation? This question is difficult to approach at face value, but the
environment from the perspective of a robot can often be decomposed into
enumerable factors of variation, such as the lighting conditions or the
placement of the camera. Empirically, generalization to some of these factors
has presented a greater obstacle than others, but existing work sheds little
light on precisely how much each factor contributes to the generalization gap.
Towards an answer to this question, we study imitation learning policies in
simulation and on a real-robot language-conditioned manipulation task to
quantify the difficulty of generalization to different (sets of) factors. We
also design a new simulated benchmark of 19 tasks with 11 factors of variation
to facilitate more controlled evaluations of generalization. From our study, we
determine an ordering of factors based on generalization difficulty that is
consistent across simulation and our real robot setup.
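
The protocol implied by the abstract is simple to state: hold the training distribution fixed, shift one factor of variation at a time to unseen values, and measure the resulting drop in success rate. The sketch below illustrates that per-factor evaluation loop. It is a minimal illustration, not the paper's code: the `make_env(shift=...)` interface, the `success_rate` helper, and the factor names are all hypothetical placeholders, not the benchmark's actual API or its exact list of 11 factors.

```python
# Minimal sketch of a per-factor generalization evaluation, in the spirit of
# the study above. The simulator interface (make_env) and the factor list are
# hypothetical placeholders, not the paper's actual benchmark or factor set.
from typing import Callable, Dict, List

FACTORS: List[str] = [
    "lighting", "camera_pose", "table_texture",
    "distractors", "background", "object_position",
]

def success_rate(policy: Callable, env, episodes: int = 50) -> float:
    """Average task success of a policy over rollouts in one environment."""
    successes = 0
    for _ in range(episodes):
        obs, done, info = env.reset(), False, {}
        while not done:
            obs, reward, done, info = env.step(policy(obs))
        successes += int(info.get("success", False))
    return successes / episodes

def generalization_gaps(policy: Callable, make_env: Callable) -> Dict[str, float]:
    """Per-factor drop in success rate when a single factor is shifted to
    values unseen during training, relative to in-distribution rollouts."""
    base = success_rate(policy, make_env(shift=None))
    return {f: base - success_rate(policy, make_env(shift=f)) for f in FACTORS}
```

Sorting the returned gaps from largest to smallest yields a difficulty ordering of factors analogous to the one the paper reports as consistent across simulation and the real robot.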
Related papers
- Improving Generalization Ability of Robotic Imitation Learning by Resolving Causal Confusion in Observations [3.2389875818890124]
We propose a causal structure learning framework that can be easily embedded in recent imitation learning architectures.
We demonstrate our approach using a simulation of the ALOHA [31] bimanual robot arms in MuJoCo.
arXiv Detail & Related papers (2025-07-30T04:46:48Z)
- RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation [90.81956345363355]
RoBridge is a hierarchical intelligent architecture for general robotic manipulation.
It consists of a high-level cognitive planner (HCP) based on a large-scale pre-trained vision-language model (VLM).
It unleashes the procedural skill of reinforcement learning, effectively bridging the gap between cognition and execution.
arXiv Detail & Related papers (2025-05-03T06:17:18Z)
- DexGraspVLA: A Vision-Language-Action Framework Towards General Dexterous Grasping [14.511049253735834]
A general-purpose robot must be capable of grasping diverse objects in arbitrary scenarios.
Our solution is DexGraspVLA, a hierarchical framework that utilizes a pre-trained Vision-Language model as the high-level task planner.
Our method achieves a 90+% success rate under thousands of unseen object, lighting, and background combinations.
arXiv Detail & Related papers (2025-02-28T09:57:20Z)
- Problem Space Transformations for Generalisation in Behavioural Cloning [17.91476826271504]
This work characterises widespread properties of robotic manipulation.
We empirically demonstrate that transformations arising from each of these properties allow neural policies trained with behavioural cloning to better generalise to out-of-distribution problem instances.
arXiv Detail & Related papers (2024-11-06T17:05:58Z)
- Offline Imitation Learning Through Graph Search and Retrieval [57.57306578140857]
Imitation learning is a powerful approach for robots to acquire manipulation skills.
We propose GSR, a simple yet effective algorithm that learns from suboptimal demonstrations through Graph Search and Retrieval.
GSR can achieve a 10% to 30% higher success rate and over 30% higher proficiency compared to baselines.
arXiv Detail & Related papers (2024-07-22T06:12:21Z)
- Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models [81.55156507635286]
Legged robots are physically capable of navigating diverse environments and overcoming a wide range of obstructions.
Current learning methods often struggle with generalization to the long tail of unexpected situations without heavy human supervision.
We propose a system, VLM-Predictive Control (VLM-PC), combining two key components that we find to be crucial for eliciting on-the-fly, adaptive behavior selection.
arXiv Detail & Related papers (2024-07-02T21:00:30Z)
- RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis [102.1876259853457]
We propose a tree-structured multimodal code generation framework for generalized robotic behavior synthesis, termed RoboCodeX.
RoboCodeX decomposes high-level human instructions into multiple object-centric manipulation units consisting of physical preferences such as affordance and safety constraints.
To further enhance the capability to map conceptual and perceptual understanding into control commands, a specialized multimodal reasoning dataset is collected for pre-training and an iterative self-updating methodology is introduced for supervised fine-tuning.
arXiv Detail & Related papers (2024-02-25T15:31:43Z)
- Instruction-driven history-aware policies for robotic manipulations [82.25511767738224]
We propose a unified transformer-based approach that takes into account multiple inputs.
In particular, our transformer architecture integrates (i) natural language instructions and (ii) multi-view scene observations.
We evaluate our method on the challenging RLBench benchmark and on a real-world robot.
arXiv Detail & Related papers (2022-09-11T16:28:25Z)
- Learning Category-Level Generalizable Object Manipulation Policy via Generative Adversarial Self-Imitation Learning from Demonstrations [14.001076951265558]
Generalizable object manipulation skills are critical for intelligent robots to work in real-world complex scenes.
In this work, we tackle this category-level object manipulation policy learning problem via imitation learning in a task-agnostic manner.
We propose several general but critical techniques, including generative adversarial self-imitation learning from demonstrations, progressive growing of discriminator, and instance-balancing for expert buffer.
arXiv Detail & Related papers (2022-03-04T02:52:02Z)
- CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning [138.40338621974954]
CausalWorld is a benchmark for causal structure and transfer learning in a robotic manipulation environment.
Tasks consist of constructing 3D shapes from a given set of blocks - inspired by how children learn to build complex structures.
arXiv Detail & Related papers (2020-10-08T23:01:13Z)
- SQUIRL: Robust and Efficient Learning from Video Demonstration of Long-Horizon Robotic Manipulation Tasks [8.756012472587601]
Deep reinforcement learning (RL) can be used to learn complex manipulation tasks.
However, RL requires the robot to collect a large amount of real-world experience.
SQUIRL performs a new but related long-horizon task robustly given only a single video demonstration.
arXiv Detail & Related papers (2020-03-10T20:26:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.