Visuomotor Control in Multi-Object Scenes Using Object-Aware
Representations
- URL: http://arxiv.org/abs/2205.06333v1
- Date: Thu, 12 May 2022 19:48:11 GMT
- Title: Visuomotor Control in Multi-Object Scenes Using Object-Aware
Representations
- Authors: Negin Heravi, Ayzaan Wahid, Corey Lynch, Pete Florence, Travis
Armstrong, Jonathan Tompson, Pierre Sermanet, Jeannette Bohg, Debidatta
Dwibedi
- Abstract summary: We show the effectiveness of object-aware representation learning techniques for robotic tasks.
Our model learns control policies in a sample-efficient manner and outperforms state-of-the-art object techniques.
- Score: 25.33452947179541
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Perceptual understanding of the scene and the relationship between its
different components is important for successful completion of robotic tasks.
Representation learning has been shown to be a powerful technique for this, but
most of the current methodologies learn task specific representations that do
not necessarily transfer well to other tasks. Furthermore, representations
learned by supervised methods require large labeled datasets for each task that
are expensive to collect in the real world. Using self-supervised learning to
obtain representations from unlabeled data can mitigate this problem. However,
current self-supervised representation learning methods are mostly object
agnostic, and we demonstrate that the resulting representations are
insufficient for general purpose robotics tasks as they fail to capture the
complexity of scenes with many components. In this paper, we explore the
effectiveness of using object-aware representation learning techniques for
robotic tasks. Our self-supervised representations are learned by observing the
agent freely interacting with different parts of the environment and is queried
in two different settings: (i) policy learning and (ii) object location
prediction. We show that our model learns control policies in a
sample-efficient manner and outperforms state-of-the-art object agnostic
techniques as well as methods trained on raw RGB images. Our results show a 20
percent increase in performance in low data regimes (1000 trajectories) in
policy training using implicit behavioral cloning (IBC). Furthermore, our
method outperforms the baselines for the task of object localization in
multi-object scenes.
Related papers
- Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization.
We introduce a benchmark comprising eight different synthetic and real-world datasets.
We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z) - What Makes Pre-Trained Visual Representations Successful for Robust
Manipulation? [57.92924256181857]
We find that visual representations designed for manipulation and control tasks do not necessarily generalize under subtle changes in lighting and scene texture.
We find that emergent segmentation ability is a strong predictor of out-of-distribution generalization among ViT models.
arXiv Detail & Related papers (2023-11-03T18:09:08Z) - Localizing Active Objects from Egocentric Vision with Symbolic World
Knowledge [62.981429762309226]
The ability to actively ground task instructions from an egocentric view is crucial for AI agents to accomplish tasks or assist humans virtually.
We propose to improve phrase grounding models' ability on localizing the active objects by: learning the role of objects undergoing change and extracting them accurately from the instructions.
We evaluate our framework on Ego4D and Epic-Kitchens datasets.
arXiv Detail & Related papers (2023-10-23T16:14:05Z) - Learning Sim-to-Real Dense Object Descriptors for Robotic Manipulation [4.7246285569677315]
We present Sim-to-Real Dense Object Nets (SRDONs), a dense object descriptor that not only understands the object via appropriate representation but also maps simulated and real data to a unified feature space with pixel consistency.
We demonstrate in experiments that pre-trained SRDONs significantly improve performances on unseen objects and unseen visual environments for various robotic tasks with zero real-world training.
arXiv Detail & Related papers (2023-04-18T02:28:55Z) - Self-Supervised Learning of Multi-Object Keypoints for Robotic
Manipulation [8.939008609565368]
In this paper, we demonstrate the efficacy of learning image keypoints via the Dense Correspondence pretext task for downstream policy learning.
We evaluate our approach on diverse robot manipulation tasks, compare it to other visual representation learning approaches, and demonstrate its flexibility and effectiveness for sample-efficient policy learning.
arXiv Detail & Related papers (2022-05-17T13:15:07Z) - Lifelong Ensemble Learning based on Multiple Representations for
Few-Shot Object Recognition [6.282068591820947]
We present a lifelong ensemble learning approach based on multiple representations to address the few-shot object recognition problem.
To facilitate lifelong learning, each approach is equipped with a memory unit for storing and retrieving object information instantly.
We have performed extensive sets of experiments to assess the performance of the proposed approach in offline, and open-ended scenarios.
arXiv Detail & Related papers (2022-05-04T10:29:10Z) - Task-Induced Representation Learning [14.095897879222672]
We evaluate the effectiveness of representation learning approaches for decision making in visually complex environments.
We find that representation learning generally improves sample efficiency on unseen tasks even in visually complex scenes.
arXiv Detail & Related papers (2022-04-25T17:57:10Z) - Learning Sensorimotor Primitives of Sequential Manipulation Tasks from
Visual Demonstrations [13.864448233719598]
This paper describes a new neural network-based framework for learning simultaneously low-level policies and high-level policies.
A key feature of the proposed approach is that the policies are learned directly from raw videos of task demonstrations.
Empirical results on object manipulation tasks with a robotic arm show that the proposed network can efficiently learn from real visual demonstrations to perform the tasks.
arXiv Detail & Related papers (2022-03-08T01:36:48Z) - Factors of Influence for Transfer Learning across Diverse Appearance
Domains and Task Types [50.1843146606122]
A simple form of transfer learning is common in current state-of-the-art computer vision models.
Previous systematic studies of transfer learning have been limited and the circumstances in which it is expected to work are not fully understood.
In this paper we carry out an extensive experimental exploration of transfer learning across vastly different image domains.
arXiv Detail & Related papers (2021-03-24T16:24:20Z) - Reinforcement Learning with Prototypical Representations [114.35801511501639]
Proto-RL is a self-supervised framework that ties representation learning with exploration through prototypical representations.
These prototypes simultaneously serve as a summarization of the exploratory experience of an agent as well as a basis for representing observations.
This enables state-of-the-art downstream policy learning on a set of difficult continuous control tasks.
arXiv Detail & Related papers (2021-02-22T18:56:34Z) - Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.