Learning Object Placements For Relational Instructions by Hallucinating Scene Representations
- URL: http://arxiv.org/abs/2001.08481v2
- Date: Fri, 21 Feb 2020 18:14:11 GMT
- Title: Learning Object Placements For Relational Instructions by Hallucinating Scene Representations
- Authors: Oier Mees, Alp Emek, Johan Vertens, Wolfram Burgard
- Abstract summary: We present a convolutional neural network for estimating pixelwise object placement probabilities for a set of spatial relations from a single input image.
Our method does not require ground truth data for the pixelwise relational probabilities or 3D models of the objects.
Results obtained using real-world data and human-robot experiments demonstrate the effectiveness of our method.
- Score: 26.897316325189205
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robots coexisting with humans in their environment and performing services
for them need the ability to interact with them. One particular requirement for
such robots is that they are able to understand spatial relations and can place
objects in accordance with the spatial relations expressed by their user. In
this work, we present a convolutional neural network for estimating pixelwise
object placement probabilities for a set of spatial relations from a single
input image. During training, our network receives the learning signal by
classifying hallucinated high-level scene representations as an auxiliary task.
Unlike previous approaches, our method does not require ground truth data for
the pixelwise relational probabilities or 3D models of the objects, which
significantly expands its applicability in practice. Our results
obtained using real-world data and human-robot experiments demonstrate the
effectiveness of our method in reasoning about the best way to place objects to
reproduce a spatial relation. Videos of our experiments can be found at
https://youtu.be/zaZkHTWFMKM
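The abstract describes the approach only at a high level. The following is a minimal, hypothetical sketch (PyTorch, not the authors' code) of a fully convolutional network that maps a single RGB image to one pixelwise placement-probability map per spatial relation; the layer sizes, the relation set, and the example usage are assumptions made purely for illustration.
```python
import torch
import torch.nn as nn

# Assumed relation set for illustration; the abstract does not list the exact relations.
RELATIONS = ["left of", "right of", "in front of", "behind", "on top of", "inside"]

class PlacementNet(nn.Module):
    """Predicts a placement-probability map over pixels for each spatial relation."""

    def __init__(self, num_relations=len(RELATIONS)):
        super().__init__()
        # Small encoder-decoder backbone; the paper's actual architecture is not reproduced here.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, num_relations, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, image):
        # image: (B, 3, H, W) -> logits: (B, R, H, W)
        logits = self.decoder(self.encoder(image))
        b, r, h, w = logits.shape
        # Normalise each relation's map over the spatial dimensions so it sums to 1.
        probs = torch.softmax(logits.view(b, r, h * w), dim=-1).view(b, r, h, w)
        return probs

# Example usage: query where to place an object "on top of" something in a 128x128 image.
net = PlacementNet()
image = torch.rand(1, 3, 128, 128)
placement = net(image)[0, RELATIONS.index("on top of")]   # (128, 128) probability map
best_row, best_col = divmod(int(placement.argmax()), placement.shape[1])
```
Per the abstract, the learning signal comes from an auxiliary task in which the predicted placement is used to hallucinate a high-level scene representation that is then classified for the intended relation; that training loop is omitted from this sketch.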
Related papers
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms representative models in terms of objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
- Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation [57.60490773016364]
We combine vision and touch sensing on a multi-fingered hand to estimate an object's pose and shape during in-hand manipulation.
Our method, NeuralFeels, encodes object geometry by learning a neural field online and jointly tracks it by optimizing a pose graph problem.
Our results demonstrate that touch, at the very least, refines and, at the very best, disambiguates visual estimates during in-hand manipulation.
arXiv Detail & Related papers (2023-12-20T22:36:37Z)
- Teaching Unknown Objects by Leveraging Human Gaze and Augmented Reality in Human-Robot Interaction [3.1473798197405953]
This dissertation aims to teach a robot unknown objects in the context of Human-Robot Interaction (HRI).
The combination of eye tracking and Augmented Reality created a powerful synergy that empowered the human teacher to communicate with the robot.
The robot's object detection capabilities exhibited comparable performance to state-of-the-art object detectors trained on extensive datasets.
arXiv Detail & Related papers (2023-12-12T11:34:43Z)
- Visual Affordance Prediction for Guiding Robot Exploration [56.17795036091848]
We develop an approach for learning visual affordances for guiding robot exploration.
We use a Transformer-based model to learn a conditional distribution in the latent embedding space of a VQ-VAE.
We show how the trained affordance model can be used for guiding exploration by acting as a goal-sampling distribution, during visual goal-conditioned policy learning in robotic manipulation.
arXiv Detail & Related papers (2023-05-28T17:53:09Z)
- Learning Sim-to-Real Dense Object Descriptors for Robotic Manipulation [4.7246285569677315]
We present Sim-to-Real Dense Object Nets (SRDONs), a dense object descriptor that not only understands the object via appropriate representation but also maps simulated and real data to a unified feature space with pixel consistency.
We demonstrate in experiments that pre-trained SRDONs significantly improve performances on unseen objects and unseen visual environments for various robotic tasks with zero real-world training.
arXiv Detail & Related papers (2023-04-18T02:28:55Z)
- Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective (see the sketch after this list).
arXiv Detail & Related papers (2022-11-16T16:26:48Z)
- Things not Written in Text: Exploring Spatial Commonsense from Visual Signals [77.46233234061758]
We investigate whether models with visual signals learn more spatial commonsense than text-based models.
We propose a benchmark that focuses on the relative scales of objects, and the positional relationship between people and objects under different actions.
We find that image synthesis models are more capable of learning accurate and consistent spatial knowledge than other models.
arXiv Detail & Related papers (2022-03-15T17:02:30Z)
- Synergies Between Affordance and Geometry: 6-DoF Grasp Detection via Implicit Representations [20.155920256334706]
We show that 3D reconstruction and grasp learning are two intimately connected tasks.
We propose to utilize the synergies between grasp affordance and 3D reconstruction through multi-task learning of a shared representation.
Our method outperforms baselines by over 10% in terms of grasp success rate.
arXiv Detail & Related papers (2021-04-04T05:46:37Z)
- Learning Affordance Landscapes for Interaction Exploration in 3D Environments [101.90004767771897]
Embodied agents must be able to master how their environment works.
We introduce a reinforcement learning approach for exploration for interaction.
We demonstrate our idea with AI2-iTHOR.
arXiv Detail & Related papers (2020-08-21T00:29:36Z)
- Trajectory annotation using sequences of spatial perception [0.0]
In the near future, more and more machines will perform tasks in the vicinity of human spaces.
This work builds a foundation to address this task.
We propose an unsupervised learning approach based on a neural autoencoding that learns semantically meaningful continuous encodings of prototypical trajectory data.
arXiv Detail & Related papers (2020-04-11T12:22:27Z)
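For the reward-learning entry above ("Learning Reward Functions for Robotic Manipulation by Observing Humans"), the summary describes rewards as distances to a goal in an embedding space trained with a time-contrastive objective. The sketch below illustrates only that general idea, not the paper's implementation; the encoder `phi` is a hypothetical placeholder assumed to be trained separately.
```python
import torch

def embedding_distance_reward(phi, observation, goal_image):
    """Reward = negative L2 distance between embeddings of the current observation
    and a goal image, following the general idea of distance-to-goal rewards in a
    learned embedding space (encoder `phi` is assumed to be trained beforehand,
    e.g. with a time-contrastive objective)."""
    with torch.no_grad():
        z_obs = phi(observation)    # (B, D) embedding of the current camera image
        z_goal = phi(goal_image)    # (B, D) embedding of the goal image
    return -torch.norm(z_obs - z_goal, dim=-1)  # (B,): higher means closer to the goal
```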
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.