Orientation Attentive Robotic Grasp Synthesis with Augmented Grasp Map
Representation
- URL: http://arxiv.org/abs/2006.05123v2
- Date: Tue, 2 Feb 2021 10:26:55 GMT
- Title: Orientation Attentive Robotic Grasp Synthesis with Augmented Grasp Map
Representation
- Authors: Georgia Chalvatzaki, Nikolaos Gkanatsios, Petros Maragos, Jan Peters
- Abstract summary: Inherent morphological characteristics of objects may offer a wide range of plausible grasping orientations, which obfuscates the visual learning of robotic grasping.
Existing grasp generation approaches are forced to construct discontinuous grasp maps by aggregating annotations for drastically different orientations per grasping point.
We propose a novel augmented grasp map representation, suitable for pixel-wise synthesis, that locally disentangles grasping orientations by partitioning the angle space into multiple bins.
- Score: 62.79160608266713
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inherent morphological characteristics in objects may offer a wide range of
plausible grasping orientations that obfuscates the visual learning of robotic
grasping. Existing grasp generation approaches are cursed to construct
discontinuous grasp maps by aggregating annotations for drastically different
orientations per grasping point. Moreover, current methods generate grasp
candidates across a single direction in the robot's viewpoint, ignoring its
feasibility constraints. In this paper, we propose a novel augmented grasp map
representation, suitable for pixel-wise synthesis, that locally disentangles
grasping orientations by partitioning the angle space into multiple bins.
Furthermore, we introduce the ORientation AtteNtive Grasp synthEsis (ORANGE)
framework, that jointly addresses classification into orientation bins and
angle-value regression. The bin-wise orientation maps further serve as an
attention mechanism for areas with higher graspability, i.e. probability of
being an actual grasp point. We report new state-of-the-art 94.71% performance
on Jacquard, with a simple U-Net using only depth images, outperforming even
multi-modal approaches. Subsequent qualitative results with a real bi-manual
robot validate ORANGE's effectiveness in generating grasps for multiple
orientations, hence allowing planning grasps that are feasible.
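The abstract describes ORANGE's two ingredients in prose only: an augmented grasp map that partitions the angle space into orientation bins so that incompatible orientations at the same pixel are no longer averaged together, and bin-wise orientation maps that double as an attention signal for graspability. Below is a minimal, hedged sketch of the binning idea; it is not the authors' implementation. The three-bin split, the (cos 2θ, sin 2θ) angle encoding (borrowed from prior pixel-wise grasp synthesis work), and all function and variable names are assumptions made for illustration.

```python
import numpy as np

def build_augmented_grasp_map(grasps, shape, num_bins=3):
    """Rasterize (row, col, angle, width) annotations, with angle in
    [-pi/2, pi/2), into bin-wise quality/angle/width maps (hypothetical layout)."""
    h, w = shape
    quality = np.zeros((num_bins, h, w), dtype=np.float32)  # per-bin graspability
    cos2 = np.zeros((num_bins, h, w), dtype=np.float32)     # per-bin cos(2*angle)
    sin2 = np.zeros((num_bins, h, w), dtype=np.float32)     # per-bin sin(2*angle)
    width = np.zeros((num_bins, h, w), dtype=np.float32)    # per-bin gripper width

    edges = np.linspace(-np.pi / 2, np.pi / 2, num_bins + 1)
    for r, c, angle, gw in grasps:
        # Assign the annotation to its orientation bin instead of averaging it
        # with differently oriented grasps at the same pixel.
        b = int(np.clip(np.searchsorted(edges, angle, side="right") - 1,
                        0, num_bins - 1))
        quality[b, r, c] = 1.0
        cos2[b, r, c] = np.cos(2.0 * angle)
        sin2[b, r, c] = np.sin(2.0 * angle)
        width[b, r, c] = gw
    return quality, cos2, sin2, width

def decode_grasps(quality, cos2, sin2, width, threshold=0.5):
    """Pick the highest-graspability pixel per bin, so multiple orientations
    at the same location can all be returned as candidates."""
    candidates = []
    for b in range(quality.shape[0]):
        r, c = np.unravel_index(np.argmax(quality[b]), quality[b].shape)
        if quality[b, r, c] < threshold:
            continue
        angle = 0.5 * np.arctan2(sin2[b, r, c], cos2[b, r, c])
        candidates.append((r, c, float(angle), float(width[b, r, c])))
    return candidates

# Toy usage: two grasps at the same pixel with very different orientations.
maps = build_augmented_grasp_map([(32, 32, 0.1, 20.0), (32, 32, -1.2, 25.0)],
                                 shape=(64, 64))
print(decode_grasps(*maps))
```

Decoding per bin rather than over a single aggregated map is what lets several orientations survive at the same grasp point, which is the property the abstract credits for planning grasps that remain feasible for the robot.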
Related papers
- Scalable Self-Supervised Representation Learning from Spatiotemporal
Motion Trajectories for Multimodal Computer Vision [0.0]
We propose a self-supervised method for learning representations of geographic locations from unlabeled GPS trajectories.
We show that reachability embeddings are semantically meaningful representations and yield a 4-23% gain in performance as measured by the area under the precision-recall curve (AUPRC).
arXiv Detail & Related papers (2022-10-07T02:41:02Z)
- Deep Orientation-Aware Functional Maps: Tackling Symmetry Issues in Shape Matching [32.03608983026839]
We propose a new deep learning approach to learn orientation-aware features in a fully unsupervised setting.
Our architecture is built on top of DiffusionNet, making it robust to discretization changes.
arXiv Detail & Related papers (2022-04-28T12:36:09Z)
- Reachability Embeddings: Scalable Self-Supervised Representation Learning from Markovian Trajectories for Geospatial Computer Vision [0.0]
We propose a self-supervised method for learning representations of geographic locations from unlabeled GPS trajectories.
A scalable and distributed algorithm is presented to compute image-like representations, called reachability summaries.
We show that reachability embeddings are semantically meaningful representations and result in a 4-23% gain in performance.
arXiv Detail & Related papers (2021-10-24T20:10:22Z)
- TSG: Target-Selective Gradient Backprop for Probing CNN Visual Saliency [72.9106103283475]
We study the visual saliency, a.k.a. visual explanation, to interpret convolutional neural networks.
Inspired by those observations, we propose a novel visual saliency framework, termed Target-Selective Gradient (TSG) backprop.
The proposed TSG consists of two components, namely, TSG-Conv and TSG-FC, which rectify the gradients for convolutional layers and fully-connected layers, respectively.
arXiv Detail & Related papers (2021-10-11T12:00:20Z)
- Fork or Fail: Cycle-Consistent Training with Many-to-One Mappings [67.11712279612583]
Cycle-consistent training is widely used for learning a forward and inverse mapping between two domains of interest.
We develop a conditional variational autoencoder (CVAE) approach that can be viewed as converting surjective mappings to implicit bijections.
Our pipeline can capture such many-to-one mappings during cycle training while promoting graph-to-text diversity.
arXiv Detail & Related papers (2020-12-14T10:59:59Z)
- Unsupervised Discovery of Disentangled Manifolds in GANs [74.24771216154105]
An interpretable generation process is beneficial to various image editing applications.
We propose a framework to discover interpretable directions in the latent space given arbitrary pre-trained generative adversarial networks.
arXiv Detail & Related papers (2020-11-24T02:18:08Z)
- Gravitational Models Explain Shifts on Human Visual Attention [80.76475913429357]
Visual attention refers to the human brain's ability to select relevant sensory information for preferential processing.
Various methods to estimate saliency have been proposed in the last three decades.
We propose a gravitational model (GRAV) to describe the attentional shifts.
arXiv Detail & Related papers (2020-09-15T10:12:41Z)
- Gaussian Process Gradient Maps for Loop-Closure Detection in Unstructured Planetary Environments [17.276441789710574]
The ability to recognize previously mapped locations is an essential feature for autonomous systems.
Unstructured planetary-like environments pose a major challenge to these systems due to the similarity of the terrain.
This paper presents a method to solve the loop closure problem using only spatial information.
arXiv Detail & Related papers (2020-09-01T04:41:40Z)
- Object-and-Action Aware Model for Visual Language Navigation [70.33142095637515]
Vision-and-Language Navigation (VLN) is unique in that it requires turning relatively general natural-language instructions into robot agent actions.
We propose an Object-and-Action Aware Model (OAAM) that processes these two different forms of natural language based instruction separately.
This enables each process to match object-centered/action-centered instruction to their own counterpart visual perception/action orientation flexibly.
arXiv Detail & Related papers (2020-07-29T06:32:18Z)
- Improving Movement Predictions of Traffic Actors in Bird's-Eye View Models using GANs and Differentiable Trajectory Rasterization [12.652210024012374]
One of the most critical pieces of the self-driving puzzle is the task of predicting future movement of surrounding traffic actors.
Methods based on top-down scene rasterization on one side and Generative Adversarial Networks (GANs) on the other have been shown to be particularly successful.
In this paper we build upon these two directions and propose a raster-based conditional GAN architecture.
We evaluate the proposed method on a large-scale, real-world data set, showing that it outperforms state-of-the-art GAN-based baselines.
arXiv Detail & Related papers (2020-04-14T00:41:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.