The Treachery of Images: Bayesian Scene Keypoints for Deep Policy
Learning in Robotic Manipulation
- URL: http://arxiv.org/abs/2305.04718v3
- Date: Wed, 20 Sep 2023 13:24:51 GMT
- Title: The Treachery of Images: Bayesian Scene Keypoints for Deep Policy
Learning in Robotic Manipulation
- Authors: Jan Ole von Hartz, Eugenio Chisari, Tim Welschehold, Wolfram Burgard,
Joschka Boedecker, Abhinav Valada
- Abstract summary: We present BASK, a Bayesian approach to tracking scale-invariant keypoints over time.
We employ our method to learn challenging multi-object robot manipulation tasks from wrist camera observations.
- Score: 28.30126109684119
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In policy learning for robotic manipulation, sample efficiency is of
paramount importance. Thus, learning and extracting more compact
representations from camera observations is a promising avenue. However,
current methods often assume full observability of the scene and struggle with
scale invariance. In many tasks and settings, this assumption does not hold as
objects in the scene are often occluded or lie outside the field of view of the
camera, rendering the camera observation ambiguous with regard to their
location. To tackle this problem, we present BASK, a Bayesian approach to
tracking scale-invariant keypoints over time. Our approach successfully
resolves inherent ambiguities in images, enabling keypoint tracking on
symmetrical objects and occluded and out-of-view objects. We employ our method
to learn challenging multi-object robot manipulation tasks from wrist camera
observations and demonstrate superior utility for policy learning compared to
other representation learning techniques. Furthermore, we show outstanding
robustness towards disturbances such as clutter, occlusions, and noisy depth
measurements, as well as generalization to unseen objects both in simulation
and real-world robotic experiments.
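The core idea of resolving ambiguous observations by accumulating evidence over time can be illustrated with a simple recursive Bayes filter over a keypoint's location. The Python snippet below is only a 1-D toy sketch: the grid resolution, noise values, and helper names (`bayes_update`, `gaussian_likelihood`, `bimodal_likelihood`) are illustrative assumptions, not the implementation described in the paper.

```python
# Toy sketch: fuse ambiguous per-frame keypoint observations into one belief.
import numpy as np

grid = np.linspace(-0.5, 0.5, 201)             # candidate positions (metres, 1-D for brevity)
belief = np.full_like(grid, 1.0 / grid.size)   # uniform prior: location initially unknown

def bayes_update(prior, likelihood):
    """Multiply the prior belief by the observation likelihood and renormalize."""
    posterior = prior * likelihood
    return posterior / posterior.sum() if posterior.sum() > 0 else prior

def gaussian_likelihood(mean, sigma=0.05):
    """Unambiguous observation of the keypoint near `mean`."""
    return np.exp(-0.5 * ((grid - mean) / sigma) ** 2)

def bimodal_likelihood(offset, sigma=0.05):
    """Symmetric object seen from one view: two mirror hypotheses are equally likely."""
    return gaussian_likelihood(offset, sigma) + gaussian_likelihood(-offset, sigma)

# Frame 1: a single view cannot tell the two mirror hypotheses apart.
belief = bayes_update(belief, bimodal_likelihood(0.2))
# Frame 2: the wrist camera has moved, so the observation breaks the symmetry
# and the fused belief collapses onto the true mode.
belief = bayes_update(belief, gaussian_likelihood(0.2))
# If the keypoint is occluded or out of view, skip the update: the belief
# from previous frames is retained instead of being re-estimated from scratch.
print("MAP keypoint position:", grid[np.argmax(belief)])
```

In this simplified setting, skipping the update when no observation is available is what keeps the keypoint estimate stable under occlusion and out-of-view motion.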
Related papers
- Retrieval Robust to Object Motion Blur [54.34823913494456]
We propose a method for object retrieval in images that are affected by motion blur.
We present the first large-scale datasets for blurred object retrieval.
Our method outperforms state-of-the-art retrieval methods on the new blur-retrieval datasets.
arXiv Detail & Related papers (2024-04-27T23:22:39Z)
- Unsupervised learning based object detection using Contrastive Learning [6.912349403119665]
We introduce a method for training single-stage object detectors through unsupervised/self-supervised learning.
Our approach can substantially reduce the time and cost associated with manual annotation.
We combine intra-image contrastive learning with its inter-image counterpart, enabling the detector to acquire location information.
arXiv Detail & Related papers (2024-02-21T01:44:15Z)
- Learning Extrinsic Dexterity with Parameterized Manipulation Primitives [8.7221770019454]
We learn a sequence of actions that utilize the environment to change the object's pose.
Our approach can control the object's state by exploiting interactions between the object, the gripper, and the environment.
We evaluate our approach on picking box-shaped objects with various weights, shapes, and friction properties from a constrained table-top workspace.
arXiv Detail & Related papers (2023-10-26T21:28:23Z)
- Visual-Policy Learning through Multi-Camera View to Single-Camera View Knowledge Distillation for Robot Manipulation Tasks [4.820787231200527]
We present a novel approach to enhance the generalization performance of vision-based Reinforcement Learning (RL) algorithms for robotic manipulation tasks.
Our method uses knowledge distillation, in which a pre-trained "teacher" policy trained with multiple camera viewpoints guides a "student" policy that learns from a single camera viewpoint.
The results demonstrate that the single-view visual student policy can successfully learn to grasp and lift a challenging object, which was not possible with a single-view policy alone.
arXiv Detail & Related papers (2023-03-13T11:42:38Z)
- H-SAUR: Hypothesize, Simulate, Act, Update, and Repeat for Understanding Object Articulations from Interactions [62.510951695174604]
"Hypothesize, Simulate, Act, Update, and Repeat" (H-SAUR) is a probabilistic generative framework that generates hypotheses about how objects articulate given input observations.
We show that the proposed model significantly outperforms the current state-of-the-art articulated object manipulation framework.
We further improve the test-time efficiency of H-SAUR by integrating a learned prior from learning-based vision models.
arXiv Detail & Related papers (2022-10-22T18:39:33Z)
- Self-Supervised Learning of Multi-Object Keypoints for Robotic Manipulation [8.939008609565368]
In this paper, we demonstrate the efficacy of learning image keypoints via the Dense Correspondence pretext task for downstream policy learning.
We evaluate our approach on diverse robot manipulation tasks, compare it to other visual representation learning approaches, and demonstrate its flexibility and effectiveness for sample-efficient policy learning.
arXiv Detail & Related papers (2022-05-17T13:15:07Z)
- Visuomotor Control in Multi-Object Scenes Using Object-Aware Representations [25.33452947179541]
We show the effectiveness of object-aware representation learning techniques for robotic tasks.
Our model learns control policies in a sample-efficient manner and outperforms state-of-the-art object-agnostic techniques.
arXiv Detail & Related papers (2022-05-12T19:48:11Z)
- Object Manipulation via Visual Target Localization [64.05939029132394]
Training agents to manipulate objects poses many challenges.
We propose an approach that explores the environment in search of target objects, computes their 3D coordinates once they are located, and then continues to estimate their 3D locations even when the objects are not visible.
Our evaluations show a massive 3x improvement in success rate over a model that has access to the same sensory suite.
arXiv Detail & Related papers (2022-03-15T17:59:01Z)
- Attentive and Contrastive Learning for Joint Depth and Motion Field Estimation [76.58256020932312]
Estimating the motion of the camera together with the 3D structure of the scene from a monocular vision system is a complex task.
We present a self-supervised learning framework for 3D object motion field estimation from monocular videos.
arXiv Detail & Related papers (2021-10-13T16:45:01Z)
- Object-aware Contrastive Learning for Debiased Scene Representation [74.30741492814327]
We develop a novel object-aware contrastive learning framework that localizes objects in a self-supervised manner.
We also introduce two data augmentations based on ContraCAM, object-aware random crop and background mixup, which reduce contextual and background biases during contrastive self-supervised learning.
arXiv Detail & Related papers (2021-07-30T19:24:07Z)
- A Simple and Effective Use of Object-Centric Images for Long-Tailed Object Detection [56.82077636126353]
We take advantage of object-centric images to improve object detection in scene-centric images.
We present a simple yet surprisingly effective framework to do so.
Our approach can improve the object detection (and instance segmentation) accuracy of rare objects by 50% (and 33%, respectively) in relative terms.
arXiv Detail & Related papers (2021-02-17T17:27:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.