Spatial Reasoning from Natural Language Instructions for Robot Manipulation
- URL: http://arxiv.org/abs/2012.13693v2
- Date: Fri, 26 Mar 2021 15:24:57 GMT
- Title: Spatial Reasoning from Natural Language Instructions for Robot Manipulation
- Authors: Sagar Gubbi Venkatesh, Anirban Biswas, Raviteja Upadrashta, Vikram Srinivasan, Partha Talukdar, and Bharadwaj Amrutur
- Abstract summary: We propose a pipelined architecture of two stages to perform spatial reasoning on the text input.
All the objects in the scene are first localized, and then the natural-language instruction and the localized co-ordinates are mapped to the start (pick) and end (place) co-ordinates.
The proposed method is used to pick-and-place playing cards using a robot arm.
- Score: 0.5033155053523041
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Robots that can manipulate objects in unstructured environments and
collaborate with humans can benefit immensely by understanding natural
language. We propose a pipelined architecture of two stages to perform spatial
reasoning on the text input. All the objects in the scene are first localized,
and then the instruction for the robot in natural language and the localized
co-ordinates are mapped to the start and end co-ordinates corresponding to the
locations where the robot must pick up and place the object, respectively. We
show that representing the localized objects by quantizing their positions to a
binary grid is preferable to representing them as a list of 2D co-ordinates. We
also show that attention improves generalization and can overcome biases in the
dataset. The proposed method is used to pick-and-place playing cards using a
robot arm.
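As a minimal sketch of the binary-grid input representation described above (the grid resolution, workspace bounds, and detector output format here are assumptions for illustration, not values taken from the paper), the object positions produced by the localization stage could be quantized as follows:

```python
import numpy as np

# Minimal sketch of the binary-grid input representation described in the
# abstract. The grid resolution, workspace bounds, and detector output format
# below are assumptions for illustration, not values taken from the paper.

GRID_SIZE = 16                    # assumed resolution of the binary grid
WORKSPACE = (0.0, 1.0, 0.0, 1.0)  # assumed (x_min, x_max, y_min, y_max) of the table

def quantize_to_binary_grid(object_positions):
    """Map 2D object positions from the localization stage to a binary grid.

    object_positions: iterable of (x, y) pairs in workspace co-ordinates.
    Returns a GRID_SIZE x GRID_SIZE array with 1.0 in each occupied cell.
    """
    x_min, x_max, y_min, y_max = WORKSPACE
    grid = np.zeros((GRID_SIZE, GRID_SIZE), dtype=np.float32)
    for x, y in object_positions:
        col = min(int((x - x_min) / (x_max - x_min) * GRID_SIZE), GRID_SIZE - 1)
        row = min(int((y - y_min) / (y_max - y_min) * GRID_SIZE), GRID_SIZE - 1)
        grid[row, col] = 1.0
    return grid

# The second stage would take the instruction text together with this grid and
# predict the pick (start) and place (end) co-ordinates; that model is not
# reproduced here.
detections = [(0.21, 0.30), (0.55, 0.62), (0.80, 0.15)]  # example detector output
print(quantize_to_binary_grid(detections).sum())         # -> 3.0, one cell per object
```

Feeding a fixed-size occupancy grid of this kind, rather than a variable-length list of 2D co-ordinates, is the input encoding the abstract reports as preferable for the spatial-reasoning stage.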
Related papers
- Navigation with Large Language Models: Semantic Guesswork as a Heuristic for Planning [73.0990339667978] (arXiv, 2023-10-16)
  Navigation in unfamiliar environments presents a major challenge for robots. We use language models to bias exploration of novel real-world environments. We evaluate LFG in challenging real-world environments and simulated benchmarks.
- RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control [140.48218261864153] (arXiv, 2023-07-28)
  We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control. Our approach leads to performant robotic policies and enables RT-2 to obtain a range of emergent capabilities from Internet-scale training.
- Open-World Object Manipulation using Pre-trained Vision-Language Models [72.87306011500084] (arXiv, 2023-03-02)
  For robots to follow instructions from people, they must be able to connect the rich semantic information in human vocabulary. We develop a simple approach, which leverages a pre-trained vision-language model to extract object-identifying information. In a variety of experiments on a real mobile manipulator, we find that MOO generalizes zero-shot to a wide range of novel object categories and environments.
- Enhancing Interpretability and Interactivity in Robot Manipulation: A Neurosymbolic Approach [0.0] (arXiv, 2022-10-03)
  We present a neurosymbolic architecture for coupling language-guided visual reasoning with robot manipulation. A non-expert human user can prompt the robot using unconstrained natural language, providing a referring expression (REF), a question (VQA), or a grasp action instruction. We generate a 3D vision-and-language synthetic dataset of tabletop scenes in a simulation environment to train our approach and perform extensive evaluations in both synthetic and real-world scenes.
- Extracting Zero-shot Common Sense from Large Language Models for Robot 3D Scene Understanding [25.270772036342688] (arXiv, 2022-06-09)
  We introduce a novel method for leveraging common sense embedded within large language models for labelling rooms. The proposed algorithm operates on 3D scene graphs produced by modern spatial perception systems.
- Correcting Robot Plans with Natural Language Feedback [88.92824527743105] (arXiv, 2022-04-11)
  We explore natural language as an expressive and flexible tool for robot correction. We show that these transformations enable users to correct goals, update robot motions, and recover from planning errors. Our method makes it possible to compose multiple constraints and generalizes to unseen scenes, objects, and sentences in both simulated and real-world environments.
- Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation [80.29069988090912] (arXiv, 2021-09-02)
  We study the problem of learning a range of vision-based manipulation tasks from a large offline dataset of robot interaction. We propose to leverage offline robot datasets with crowd-sourced natural language labels. We find that our approach outperforms both goal-image specifications and language-conditioned imitation techniques by more than 25%.
- Composing Pick-and-Place Tasks By Grounding Language [41.075844857146805] (arXiv, 2021-02-16)
  We present a robot system that follows unconstrained language instructions to pick and place arbitrary objects. Our approach infers objects and their relationships from input images and language expressions. Results obtained using a real-world PR2 robot demonstrate the effectiveness of our method.
- Translating Natural Language Instructions to Computer Programs for Robot Manipulation [0.6629765271909505] (arXiv, 2020-12-26)
  We propose translating the natural language instruction to a Python function which queries the scene by accessing the output of the object detector. We show that the proposed method performs better than training a neural network to directly predict the robot actions.
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.