Visual-Policy Learning through Multi-Camera View to Single-Camera View
Knowledge Distillation for Robot Manipulation Tasks
- URL: http://arxiv.org/abs/2303.07026v2
- Date: Sat, 2 Dec 2023 06:34:41 GMT
- Title: Visual-Policy Learning through Multi-Camera View to Single-Camera View
Knowledge Distillation for Robot Manipulation Tasks
- Authors: Cihan Acar, Kuluhan Binici, Alp Tekirdağ and Yan Wu
- Abstract summary: We present a novel approach to enhance the generalization performance of vision-based Reinforcement Learning (RL) algorithms for robotic manipulation tasks.
Our proposed method involves utilizing a technique known as knowledge distillation, in which a pre-trained ``teacher'' policy trained with multiple camera viewpoints guides a ``student'' policy in learning from a single camera viewpoint.
The results demonstrate that the single-view visual student policy can successfully learn to grasp and lift a challenging object, which was not possible with a single-view policy alone.
- Score: 4.820787231200527
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Using multiple camera views simultaneously has been shown to improve
the generalization capabilities and performance of visual policies. However, the
hardware cost and design constraints in real-world scenarios can potentially
make it challenging to use multiple cameras. In this study, we present a novel
approach to enhance the generalization performance of vision-based
Reinforcement Learning (RL) algorithms for robotic manipulation tasks. Our
proposed method involves utilizing a technique known as knowledge distillation,
in which a pre-trained ``teacher'' policy trained with multiple camera
viewpoints guides a ``student'' policy in learning from a single camera
viewpoint. To enhance the student policy's robustness against camera location
perturbations, it is trained using data augmentation and extreme viewpoint
changes. As a result, the student policy learns robust visual features that
allow it to locate the object of interest accurately and consistently,
regardless of the camera viewpoint. The efficacy and efficiency of the proposed
method were evaluated both in simulation and real-world environments. The
results demonstrate that the single-view visual student policy can successfully
learn to grasp and lift a challenging object, which was not possible with a
single-view policy alone. Furthermore, the student policy demonstrates
zero-shot transfer capability, where it can successfully grasp and lift objects
in real-world scenarios for unseen visual configurations.
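- Illustrative sketch: a minimal PyTorch-style example of the distillation step described above, assuming a frozen multi-view teacher and an MSE imitation objective; the function and variable names are hypothetical and not taken from the paper.

import torch
import torch.nn.functional as F

def distill_step(teacher, student, optimizer, multi_view_obs, single_view_obs):
    # Hypothetical single update: regress the single-view student's action
    # toward the frozen multi-view teacher's action. MSE is an assumption;
    # the paper may use a different distillation objective.
    with torch.no_grad():
        teacher_action = teacher(multi_view_obs)      # teacher sees all camera views
    student_action = student(single_view_obs)         # student sees one (augmented) view
    loss = F.mse_loss(student_action, teacher_action)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# single_view_obs is assumed to come from a data-augmentation pipeline
# (e.g. random crops and large camera-pose perturbations) applied to one view,
# mirroring the augmentation and extreme viewpoint changes described above.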
Related papers
- View-Invariant Policy Learning via Zero-Shot Novel View Synthesis [26.231630397802785]
We investigate how knowledge from large-scale visual data of the world may be used to address one axis of variation for generalizable manipulation: observational viewpoint.
We study single-image novel view synthesis models, which learn 3D-aware scene-level priors by rendering images of the same scene from alternate camera viewpoints.
For practical application to diverse robotic data, these models must operate zero-shot, performing view synthesis on unseen tasks and environments.
arXiv Detail & Related papers (2024-09-05T16:39:21Z)
- Dreamitate: Real-World Visuomotor Policy Learning via Video Generation [49.03287909942888]
We propose a visuomotor policy learning framework that fine-tunes a video diffusion model on human demonstrations of a given task.
We generate an example of an execution of the task conditioned on images of a novel scene, and use this synthesized execution directly to control the robot.
arXiv Detail & Related papers (2024-06-24T17:59:45Z)
- ViViDex: Learning Vision-based Dexterous Manipulation from Human Videos [81.99559944822752]
We propose ViViDex to improve vision-based policy learning from human videos.
It first uses reinforcement learning with trajectory-guided rewards to train state-based policies for each video.
We then roll out successful episodes from the state-based policies and train a unified visual policy without using any privileged information.
arXiv Detail & Related papers (2024-04-24T07:58:28Z)
- The Power of the Senses: Generalizable Manipulation from Vision and Touch through Masked Multimodal Learning [60.91637862768949]
We propose Masked Multimodal Learning (M3L) to fuse visual and tactile information in a reinforcement learning setting.
M3L learns a policy and visual-tactile representations based on masked autoencoding.
We evaluate M3L on three simulated environments with both visual and tactile observations.
arXiv Detail & Related papers (2023-11-02T01:33:00Z)
- Learning Generalizable Manipulation Policies with Object-Centric 3D Representations [65.55352131167213]
GROOT is an imitation learning method for learning robust policies with object-centric and 3D priors.
It builds policies that generalize beyond their initial training conditions for vision-based manipulation.
GROOT excels at generalizing over background changes, camera viewpoint shifts, and the presence of new object instances.
arXiv Detail & Related papers (2023-10-22T18:51:45Z)
- Learning to Act from Actionless Videos through Dense Correspondences [87.1243107115642]
We present an approach to construct a video-based robot policy capable of reliably executing diverse tasks across different robots and environments.
Our method leverages images as a task-agnostic representation, encoding both the state and action information, and text as a general representation for specifying robot goals.
We demonstrate the efficacy of our approach in learning policies on table-top manipulation and navigation tasks.
arXiv Detail & Related papers (2023-10-12T17:59:23Z)
- Contrastive Learning for Enhancing Robust Scene Transfer in Vision-based Agile Flight [21.728935597793473]
This work proposes an adaptive multi-pair contrastive learning strategy for visual representation learning that enables zero-shot scene transfer and real-world deployment.
We demonstrate the performance of our approach on the task of agile, vision-based quadrotor flight.
arXiv Detail & Related papers (2023-09-18T15:25:59Z)
- The Treachery of Images: Bayesian Scene Keypoints for Deep Policy Learning in Robotic Manipulation [28.30126109684119]
We present BASK, a Bayesian approach to tracking scale-invariant keypoints over time.
We employ our method to learn challenging multi-object robot manipulation tasks from wrist camera observations.
arXiv Detail & Related papers (2023-05-08T14:05:38Z)
- Multi-View Masked World Models for Visual Robotic Manipulation [132.97980128530017]
We train a multi-view masked autoencoder which reconstructs pixels of randomly masked viewpoints.
We demonstrate the effectiveness of our method in a range of scenarios.
We also show that the multi-view masked autoencoder trained with multiple randomized viewpoints enables training a policy with strong viewpoint randomization.
arXiv Detail & Related papers (2023-02-05T15:37:02Z)
- Self-Supervised Learning of Multi-Object Keypoints for Robotic Manipulation [8.939008609565368]
In this paper, we demonstrate the efficacy of learning image keypoints via the Dense Correspondence pretext task for downstream policy learning.
We evaluate our approach on diverse robot manipulation tasks, compare it to other visual representation learning approaches, and demonstrate its flexibility and effectiveness for sample-efficient policy learning.
arXiv Detail & Related papers (2022-05-17T13:15:07Z)
- Seeing All the Angles: Learning Multiview Manipulation Policies for Contact-Rich Tasks from Demonstrations [7.51557557629519]
A successful multiview policy could be deployed on a mobile manipulation platform.
We demonstrate that a multiview policy can be found through imitation learning by collecting data from a variety of viewpoints.
We show that learning from multiview data has little, if any, penalty to performance for a fixed-view task compared to learning with an equivalent amount of fixed-view data.
arXiv Detail & Related papers (2021-04-28T17:43:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.