Acceleration of Actor-Critic Deep Reinforcement Learning for Visual
Grasping in Clutter by State Representation Learning Based on Disentanglement
of a Raw Input Image
- URL: http://arxiv.org/abs/2002.11903v1
- Date: Thu, 27 Feb 2020 03:58:51 GMT
- Authors: Taewon Kim, Yeseong Park, Youngbin Park and Il Hong Suh
- Abstract summary: Actor-critic deep reinforcement learning (RL) methods typically perform very poorly when grasping diverse objects.
We employ state representation learning (SRL), where we encode essential information first for subsequent use in RL.
We found that preprocessing based on the disentanglement of a raw input image is the key to effectively capturing a compact representation.
- Score: 4.970364068620608
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For a robotic grasping task in which diverse unseen target objects exist in a
cluttered environment, some deep learning-based methods have achieved
state-of-the-art results using visual input directly. In contrast, actor-critic
deep reinforcement learning (RL) methods typically perform very poorly when
grasping diverse objects, especially when learning from raw images and sparse
rewards. To make these RL techniques feasible for vision-based grasping tasks,
we employ state representation learning (SRL), where we encode essential
information first for subsequent use in RL. However, typical representation
learning procedures are unsuitable for extracting pertinent information for
learning the grasping skill, because the visual inputs for representation
learning, where a robot attempts to grasp a target object in clutter, are
extremely complex. We found that preprocessing based on the disentanglement of
a raw input image is the key to effectively capturing a compact representation.
This enables deep RL to learn robotic grasping skills from highly varied and
diverse visual inputs. We demonstrate the effectiveness of this approach with
varying levels of disentanglement in a realistic simulated environment.
Related papers
- ViSaRL: Visual Reinforcement Learning Guided by Human Saliency [6.969098096933547]
We introduce Visual Saliency-Guided Reinforcement Learning (ViSaRL).
Using ViSaRL to learn visual representations significantly improves the success rate, sample efficiency, and generalization of an RL agent.
We show that visual representations learned using ViSaRL are robust to various sources of visual perturbations including perceptual noise and scene variations.
arXiv Detail & Related papers (2024-03-16T14:52:26Z)
- Human-oriented Representation Learning for Robotic Manipulation [64.59499047836637]
Humans inherently possess generalizable visual representations that empower them to efficiently explore and interact with the environments in manipulation tasks.
We formalize this idea through the lens of human-oriented multi-task fine-tuning on top of pre-trained visual encoders.
Our Task Fusion Decoder consistently improves the representation of three state-of-the-art visual encoders for downstream manipulation policy-learning.
arXiv Detail & Related papers (2023-10-04T17:59:38Z)
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can be used for mitigating the greedy needs of Vision Transformer networks.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z)
- Self-Supervised Learning of Multi-Object Keypoints for Robotic Manipulation [8.939008609565368]
In this paper, we demonstrate the efficacy of learning image keypoints via the Dense Correspondence pretext task for downstream policy learning.
We evaluate our approach on diverse robot manipulation tasks, compare it to other visual representation learning approaches, and demonstrate its flexibility and effectiveness for sample-efficient policy learning.
arXiv Detail & Related papers (2022-05-17T13:15:07Z)
- Visuomotor Control in Multi-Object Scenes Using Object-Aware Representations [25.33452947179541]
We show the effectiveness of object-aware representation learning techniques for robotic tasks.
Our model learns control policies in a sample-efficient manner and outperforms state-of-the-art object-agnostic techniques.
arXiv Detail & Related papers (2022-05-12T19:48:11Z)
- Task-Induced Representation Learning [14.095897879222672]
We evaluate the effectiveness of representation learning approaches for decision making in visually complex environments.
We find that representation learning generally improves sample efficiency on unseen tasks even in visually complex scenes.
arXiv Detail & Related papers (2022-04-25T17:57:10Z)
- Curious Representation Learning for Embodied Intelligence [81.21764276106924]
Self-supervised representation learning has achieved remarkable success in recent years.
Yet to build truly intelligent agents, we must construct representation learning algorithms that can learn from environments.
We propose a framework, curious representation learning, which jointly learns a reinforcement learning policy and a visual representation model.
arXiv Detail & Related papers (2021-05-03T17:59:20Z)
- Few-Cost Salient Object Detection with Adversarial-Paced Learning [95.0220555274653]
This paper proposes learning an effective salient object detection model from manual annotations on only a few training images.
We call this task few-cost salient object detection and propose an adversarial-paced learning (APL)-based framework to facilitate the few-cost learning scenario.
arXiv Detail & Related papers (2021-04-05T14:15:49Z)
- Heterogeneous Contrastive Learning: Encoding Spatial Information for Compact Visual Representations [183.03278932562438]
This paper presents an effective approach that adds spatial information to the encoding stage to alleviate the learning inconsistency between the contrastive objective and strong data augmentation operations.
We show that our approach achieves higher efficiency in visual representations and thus delivers a key message to inspire future research on self-supervised visual representation learning.
arXiv Detail & Related papers (2020-11-19T16:26:25Z)
- Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning.
Current contrastive models are ineffective at localizing the foreground object.
We propose a data-driven approach for learning invariance to backgrounds.
arXiv Detail & Related papers (2020-04-14T16:29:42Z)