Learning Object Placements For Relational Instructions by Hallucinating
Scene Representations
- URL: http://arxiv.org/abs/2001.08481v2
- Date: Fri, 21 Feb 2020 18:14:11 GMT
- Title: Learning Object Placements For Relational Instructions by Hallucinating
Scene Representations
- Authors: Oier Mees, Alp Emek, Johan Vertens, Wolfram Burgard
- Abstract summary: We present a convolutional neural network for estimating pixelwise object placement probabilities for a set of spatial relations from a single input image.
Our method does not require ground truth data for the pixelwise relational probabilities or 3D models of the objects.
Results obtained using real-world data and human-robot experiments demonstrate the effectiveness of our method.
- Score: 26.897316325189205
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robots coexisting with humans in their environment and performing services
for them need the ability to interact with them. One particular requirement for
such robots is that they are able to understand spatial relations and can place
objects in accordance with the spatial relations expressed by their user. In
this work, we present a convolutional neural network for estimating pixelwise
object placement probabilities for a set of spatial relations from a single
input image. During training, our network receives the learning signal by
classifying hallucinated high-level scene representations as an auxiliary task.
Unlike previous approaches, our method does not require ground truth data for
the pixelwise relational probabilities or 3D models of the objects, which
significantly expands the applicability in practical applications. Our results
obtained using real-world data and human-robot experiments demonstrate the
effectiveness of our method in reasoning about the best way to place objects to
reproduce a spatial relation. Videos of our experiments can be found at
https://youtu.be/zaZkHTWFMKM
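The abstract outlines the core idea: a fully convolutional network maps a single RGB image to a per-pixel placement probability map for each spatial relation, and the learning signal comes from an auxiliary classification task on hallucinated high-level scene representations rather than from pixelwise labels. The sketch below only illustrates that pipeline shape; it is a minimal PyTorch example with assumed layer sizes, an assumed relation set, and a placeholder auxiliary classifier, not the authors' implementation.

# Minimal PyTorch sketch of the pipeline shape described in the abstract: a
# fully convolutional network maps one RGB image to pixelwise placement
# probabilities, one channel per spatial relation, plus an auxiliary
# classification head that supplies a training signal. Layer sizes, the
# relation set, and the auxiliary head are illustrative assumptions, not the
# authors' architecture; in the paper the auxiliary task classifies
# hallucinated high-level scene representations, not the raw prediction maps.
import torch
import torch.nn as nn

RELATIONS = ["left of", "right of", "in front of", "behind", "on top of", "inside"]

class PlacementNet(nn.Module):
    def __init__(self, num_relations: int = len(RELATIONS)):
        super().__init__()
        # Encoder: downsample the input image into a feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Decoder: upsample back to input resolution, one logit map per relation.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, num_relations, 4, stride=2, padding=1),
        )
        # Placeholder auxiliary classifier: predicts which relation the maps
        # encode, so no pixelwise ground truth is needed for training.
        self.aux_classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(num_relations, num_relations),
        )

    def forward(self, image: torch.Tensor):
        logits = self.decoder(self.encoder(image))   # (B, R, H, W)
        placement_probs = torch.sigmoid(logits)      # pixelwise placement probabilities
        relation_logits = self.aux_classifier(placement_probs)
        return placement_probs, relation_logits

if __name__ == "__main__":
    net = PlacementNet()
    probs, rel_logits = net(torch.rand(1, 3, 128, 128))
    print(probs.shape, rel_logits.shape)  # (1, 6, 128, 128) and (1, 6)

In such a setup, a cross-entropy loss on relation_logits would back-propagate through the placement maps, which is roughly the indirect supervision mechanism the abstract alludes to.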
Related papers
- RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics [26.42651735582044]
We introduce RoboSpatial, a large-scale spatial understanding dataset consisting of real indoor and tabletop scenes captured as 3D scans and egocentric images annotated with rich spatial information relevant to robotics.
Our experiments show that models trained with RoboSpatial outperform baselines on downstream tasks such as spatial affordance prediction, spatial relationship prediction, and robotics manipulation.
arXiv Detail & Related papers (2024-11-25T16:21:34Z)
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms the representative models regarding objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
- Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation [57.60490773016364]
We combine vision and touch sensing on a multi-fingered hand to estimate an object's pose and shape during in-hand manipulation.
Our method, NeuralFeels, encodes object geometry by learning a neural field online and jointly tracks it by optimizing a pose graph problem.
Our results demonstrate that touch, at the very least, refines and, at the very best, disambiguates visual estimates during in-hand manipulation.
arXiv Detail & Related papers (2023-12-20T22:36:37Z)
- Visual Affordance Prediction for Guiding Robot Exploration [56.17795036091848]
We develop an approach for learning visual affordances for guiding robot exploration.
We use a Transformer-based model to learn a conditional distribution in the latent embedding space of a VQ-VAE.
We show how the trained affordance model can be used for guiding exploration by acting as a goal-sampling distribution, during visual goal-conditioned policy learning in robotic manipulation.
arXiv Detail & Related papers (2023-05-28T17:53:09Z)
- Learning Sim-to-Real Dense Object Descriptors for Robotic Manipulation [4.7246285569677315]
We present Sim-to-Real Dense Object Nets (SRDONs), a dense object descriptor that not only understands the object via appropriate representation but also maps simulated and real data to a unified feature space with pixel consistency.
We demonstrate in experiments that pre-trained SRDONs significantly improve performances on unseen objects and unseen visual environments for various robotic tasks with zero real-world training.
arXiv Detail & Related papers (2023-04-18T02:28:55Z)
- Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective.
arXiv Detail & Related papers (2022-11-16T16:26:48Z)
- Things not Written in Text: Exploring Spatial Commonsense from Visual Signals [77.46233234061758]
We investigate whether models with visual signals learn more spatial commonsense than text-based models.
We propose a benchmark that focuses on the relative scales of objects, and the positional relationship between people and objects under different actions.
We find that image synthesis models are more capable of learning accurate and consistent spatial knowledge than other models.
arXiv Detail & Related papers (2022-03-15T17:02:30Z)
- Synergies Between Affordance and Geometry: 6-DoF Grasp Detection via Implicit Representations [20.155920256334706]
We show that 3D reconstruction and grasp learning are two intimately connected tasks.
We propose to utilize the synergies between grasp affordance and 3D reconstruction through multi-task learning of a shared representation.
Our method outperforms baselines by over 10% in terms of grasp success rate.
arXiv Detail & Related papers (2021-04-04T05:46:37Z)
- Learning Affordance Landscapes for Interaction Exploration in 3D Environments [101.90004767771897]
Embodied agents must be able to master how their environment works.
We introduce a reinforcement learning approach for exploration for interaction.
We demonstrate our idea with AI2-iTHOR.
arXiv Detail & Related papers (2020-08-21T00:29:36Z)
- Trajectory annotation using sequences of spatial perception [0.0]
In the near future, more and more machines will perform tasks in the vicinity of human spaces.
This work builds a foundation for addressing this trajectory annotation task.
We propose an unsupervised learning approach based on a neural autoencoding that learns semantically meaningful continuous encodings of prototypical trajectory data.
arXiv Detail & Related papers (2020-04-11T12:22:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.