Active Perception and Representation for Robotic Manipulation
- URL: http://arxiv.org/abs/2003.06734v1
- Date: Sun, 15 Mar 2020 01:43:51 GMT
- Title: Active Perception and Representation for Robotic Manipulation
- Authors: Youssef Zaky, Gaurav Paruthi, Bryan Tripp, James Bergstra
- Abstract summary: We present a framework that leverages the benefits of active perception to accomplish manipulation tasks.
Our agent uses viewpoint changes to localize objects, to learn state representations in a self-supervised manner, and to perform goal-directed actions.
Compared to vanilla deep Q-learning algorithms, our model is at least four times more sample-efficient.
- Score: 0.8315801422499861
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The vast majority of visual animals actively control their eyes, heads,
and/or bodies to direct their gaze toward different parts of their environment.
In contrast, recent applications of reinforcement learning in robotic
manipulation employ cameras as passive sensors. These are carefully placed to
view a scene from a fixed pose. Active perception allows animals to gather the
most relevant information about the world and focus their computational
resources where needed. It also enables them to view objects from different
distances and viewpoints, providing a rich visual experience from which to
learn abstract representations of the environment. Inspired by the primate
visual-motor system, we present a framework that leverages the benefits of
active perception to accomplish manipulation tasks. Our agent uses viewpoint
changes to localize objects, to learn state representations in a
self-supervised manner, and to perform goal-directed actions. We apply our
model to a simulated grasping task with a 6-DoF action space. Compared to its
passive, fixed-camera counterpart, the active model achieves 8% better
performance in targeted grasping. Compared to vanilla deep Q-learning
algorithms, our model is at least four times more sample-efficient,
highlighting the benefits of both active perception and representation
learning.
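To make the pipeline described in the abstract concrete, here is a minimal Python sketch of the control loop it outlines: change viewpoint to localize an object, encode the new view with a self-supervised representation, then pick a 6-DoF grasp action with a Q-function. This is an illustration under assumed interfaces, not the authors' implementation; all names (DummyGraspEnv, Encoder, QNetwork, act) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

class Encoder:
    """Stand-in for the self-supervised state encoder; here just a fixed random projection."""
    def __init__(self, obs_dim=64 * 64, feat_dim=128):
        self.W = rng.normal(scale=0.01, size=(obs_dim, feat_dim))
    def __call__(self, image):
        return np.tanh(image.reshape(-1) @ self.W)

class QNetwork:
    """Stand-in linear Q-function over a discretized set of 6-DoF grasp actions."""
    def __init__(self, feat_dim=128, n_actions=32):
        self.W = rng.normal(scale=0.01, size=(feat_dim, n_actions))
    def __call__(self, feat):
        return feat @ self.W

class DummyGraspEnv:
    """Toy environment so the sketch runs end to end (hypothetical interface)."""
    def propose_viewpoint(self):
        return rng.uniform(-1, 1, size=3)      # where to point the camera
    def look(self, viewpoint):
        return rng.uniform(size=(64, 64))      # image rendered from that viewpoint
    def grasp(self, action_id):
        return float(action_id % 4 == 0)       # fake grasp-success reward

def act(env, encoder, q_net, epsilon=0.1):
    # 1) Active perception: move the camera to localize the target, then observe.
    image = env.look(env.propose_viewpoint())
    # 2) Encode the new view with the self-supervised representation.
    feat = encoder(image)
    # 3) Goal-directed action: epsilon-greedy over discretized 6-DoF grasps.
    q_values = q_net(feat)
    action_id = rng.integers(len(q_values)) if rng.random() < epsilon else int(np.argmax(q_values))
    return env.grasp(action_id)

print("grasp reward:", act(DummyGraspEnv(), Encoder(), QNetwork()))
```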
Related papers
- Mind the GAP: Glimpse-based Active Perception improves generalization and sample efficiency of visual reasoning [0.7999703756441756]
Human capabilities in understanding visual relations are far superior to those of AI systems.
We develop a system equipped with a novel Glimpse-based Active Perception (GAP) mechanism.
The results suggest that the GAP is essential for extracting visual relations that go beyond the immediate visual content.
arXiv Detail & Related papers (2024-09-30T11:48:11Z)
- What Makes Pre-Trained Visual Representations Successful for Robust Manipulation? [57.92924256181857]
We find that visual representations designed for manipulation and control tasks do not necessarily generalize under subtle changes in lighting and scene texture.
We find that emergent segmentation ability is a strong predictor of out-of-distribution generalization among ViT models.
arXiv Detail & Related papers (2023-11-03T18:09:08Z)
- Human-oriented Representation Learning for Robotic Manipulation [64.59499047836637]
Humans inherently possess generalizable visual representations that empower them to efficiently explore and interact with the environments in manipulation tasks.
We formalize this idea through the lens of human-oriented multi-task fine-tuning on top of pre-trained visual encoders.
Our Task Fusion Decoder consistently improves the representation of three state-of-the-art visual encoders for downstream manipulation policy-learning.
arXiv Detail & Related papers (2023-10-04T17:59:38Z)
- Masked Visual Pre-training for Motor Control [118.18189211080225]
Self-supervised visual pre-training from real-world images is effective for learning motor control tasks from pixels.
We freeze the visual encoder and train neural network controllers on top with reinforcement learning.
This is the first self-supervised model to exploit real-world images at scale for motor control.
arXiv Detail & Related papers (2022-03-11T18:58:10Z)
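As a rough illustration of the frozen-encoder recipe summarized in the entry above (pre-trained visual features kept fixed, with only a controller trained on top by reinforcement learning), here is a short PyTorch-style sketch. The encoder is a toy stand-in and FrozenEncoderPolicy is a hypothetical name; this is not the paper's MVP code.

```python
import torch
import torch.nn as nn

class FrozenEncoderPolicy(nn.Module):
    def __init__(self, encoder: nn.Module, feat_dim: int, action_dim: int):
        super().__init__()
        self.encoder = encoder.eval()
        for p in self.encoder.parameters():      # freeze: no gradients flow into the encoder
            p.requires_grad_(False)
        self.controller = nn.Sequential(         # trainable policy head
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():                    # encoder stays frozen at inference and training
            feat = self.encoder(image)
        return self.controller(feat)

# Toy stand-in for a pre-trained visual encoder (hypothetical).
toy_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 512))
policy = FrozenEncoderPolicy(toy_encoder, feat_dim=512, action_dim=7)
# Only the controller's parameters would be handed to the RL optimizer:
optimizer = torch.optim.Adam(policy.controller.parameters(), lr=3e-4)
action_logits = policy(torch.randn(1, 3, 64, 64))
```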
- Learning Perceptual Locomotion on Uneven Terrains using Sparse Visual Observations [75.60524561611008]
This work aims to exploit the use of sparse visual observations to achieve perceptual locomotion over a range of commonly seen bumps, ramps, and stairs in human-centred environments.
We first formulate the selection of minimal visual input that can represent the uneven surfaces of interest, and propose a learning framework that integrates such exteroceptive and proprioceptive data.
We validate the learned policy in tasks that require omnidirectional walking over flat ground and forward locomotion over terrains with obstacles, showing a high success rate.
arXiv Detail & Related papers (2021-09-28T20:25:10Z)
- Imitation Learning with Human Eye Gaze via Multi-Objective Prediction [3.5779268406205618]
We propose Gaze Regularized Imitation Learning (GRIL), a novel context-aware imitation learning architecture.
GRIL learns concurrently from both human demonstrations and eye gaze to solve tasks where visual attention provides important context.
We show that GRIL outperforms several state-of-the-art gaze-based imitation learning algorithms, simultaneously learns to predict human visual attention, and generalizes to scenarios not present in the training data.
arXiv Detail & Related papers (2021-02-25T17:13:13Z)
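The multi-objective idea summarized in the entry above can be sketched as a shared trunk with two heads, one imitating the demonstrated action and one predicting the human gaze point, with the gaze loss acting as a regularizer. The sketch below is a generic illustration under that assumption, not GRIL's actual architecture; all names are hypothetical.

```python
import torch
import torch.nn as nn

class GazeRegularizedPolicy(nn.Module):
    def __init__(self, obs_dim: int, action_dim: int):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU())
        self.action_head = nn.Linear(256, action_dim)   # imitation output
        self.gaze_head = nn.Linear(256, 2)              # predicted (x, y) gaze point

    def forward(self, obs):
        h = self.trunk(obs)
        return self.action_head(h), self.gaze_head(h)

def multi_objective_loss(model, obs, expert_action, human_gaze, gaze_weight=0.5):
    pred_action, pred_gaze = model(obs)
    bc_loss = nn.functional.mse_loss(pred_action, expert_action)
    gaze_loss = nn.functional.mse_loss(pred_gaze, human_gaze)
    return bc_loss + gaze_weight * gaze_loss   # gaze term regularizes the imitation objective

model = GazeRegularizedPolicy(obs_dim=64, action_dim=4)
loss = multi_objective_loss(model, torch.randn(8, 64),
                            torch.randn(8, 4), torch.rand(8, 2))
loss.backward()
```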
- What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions [50.435861435121915]
We use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations.
Our experiments show that our "muscly-supervised" representation outperforms MoCo, a visual-only state-of-the-art method.
arXiv Detail & Related papers (2020-10-16T17:46:53Z)
- VisualEchoes: Spatial Image Representation Learning through Echolocation [97.23789910400387]
Several animal species (e.g., bats, dolphins, and whales) and even visually impaired humans have the remarkable ability to perform echolocation.
We propose a novel interaction-based representation learning framework that learns useful visual features via echolocation.
Our work opens a new path for representation learning for embodied agents, where supervision comes from interacting with the physical world.
arXiv Detail & Related papers (2020-05-04T16:16:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
The quality of this automatically generated content (including all information) is not guaranteed, and this site is not responsible for any consequences arising from its use.