Spatial Reasoning from Natural Language Instructions for Robot Manipulation
- URL: http://arxiv.org/abs/2012.13693v2
- Date: Fri, 26 Mar 2021 15:24:57 GMT
- Title: Spatial Reasoning from Natural Language Instructions for Robot Manipulation
- Authors: Sagar Gubbi Venkatesh, Anirban Biswas, Raviteja Upadrashta, Vikram Srinivasan, Partha Talukdar, and Bharadwaj Amrutur
- Abstract summary: We propose a pipelined architecture of two stages to perform spatial reasoning on the text input.
All the objects in the scene are first localized, and then the natural-language instruction and the localized co-ordinates are mapped to the start (pick) and end (place) co-ordinates.
The proposed method is used to pick-and-place playing cards using a robot arm.
- Score: 0.5033155053523041
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Robots that can manipulate objects in unstructured environments and
collaborate with humans can benefit immensely by understanding natural
language. We propose a pipelined architecture of two stages to perform spatial
reasoning on the text input. All the objects in the scene are first localized,
and then the instruction for the robot in natural language and the localized
co-ordinates are mapped to the start and end co-ordinates corresponding to the
locations where the robot must pick up and place the object, respectively. We
show that representing the localized objects by quantizing their positions to a
binary grid is preferable to representing them as a list of 2D co-ordinates. We
also show that attention improves generalization and can overcome biases in the
dataset. The proposed method is used to pick-and-place playing cards using a
robot arm.
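As a minimal sketch of the binary-grid input representation described above (the grid resolution, workspace bounds, and detector output format here are assumptions for illustration, not values taken from the paper), the object positions produced by the localization stage could be quantized as follows:

```python
import numpy as np

# Minimal sketch of the binary-grid input representation described in the
# abstract. The grid resolution, workspace bounds, and detector output format
# below are assumptions for illustration, not values taken from the paper.

GRID_SIZE = 16                    # assumed resolution of the binary grid
WORKSPACE = (0.0, 1.0, 0.0, 1.0)  # assumed (x_min, x_max, y_min, y_max) of the table

def quantize_to_binary_grid(object_positions):
    """Map 2D object positions from the localization stage to a binary grid.

    object_positions: iterable of (x, y) pairs in workspace co-ordinates.
    Returns a GRID_SIZE x GRID_SIZE array with 1.0 in each occupied cell.
    """
    x_min, x_max, y_min, y_max = WORKSPACE
    grid = np.zeros((GRID_SIZE, GRID_SIZE), dtype=np.float32)
    for x, y in object_positions:
        col = min(int((x - x_min) / (x_max - x_min) * GRID_SIZE), GRID_SIZE - 1)
        row = min(int((y - y_min) / (y_max - y_min) * GRID_SIZE), GRID_SIZE - 1)
        grid[row, col] = 1.0
    return grid

# The second stage would take the instruction text together with this grid and
# predict the pick (start) and place (end) co-ordinates; that model is not
# reproduced here.
detections = [(0.21, 0.30), (0.55, 0.62), (0.80, 0.15)]  # example detector output
print(quantize_to_binary_grid(detections).sum())         # -> 3.0, one cell per object
```

Feeding a fixed-size occupancy grid of this kind, rather than a variable-length list of 2D co-ordinates, is the input encoding the abstract reports as preferable for the spatial-reasoning stage.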
Related papers
- Navigation with Large Language Models: Semantic Guesswork as a Heuristic for Planning [73.0990339667978] (arXiv, 2023-10-16)
  Navigation in unfamiliar environments presents a major challenge for robots. We use language models to bias exploration of novel real-world environments. We evaluate LFG in challenging real-world environments and simulated benchmarks.
- RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control [140.48218261864153] (arXiv, 2023-07-28)
  We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control. Our approach leads to performant robotic policies and enables RT-2 to obtain a range of emergent capabilities from Internet-scale training.
- Open-World Object Manipulation using Pre-trained Vision-Language Models [72.87306011500084] (arXiv, 2023-03-02)
  For robots to follow instructions from people, they must be able to connect the rich semantic information in human vocabulary. We develop a simple approach, which leverages a pre-trained vision-language model to extract object-identifying information. In a variety of experiments on a real mobile manipulator, we find that MOO generalizes zero-shot to a wide range of novel object categories and environments.
- Enhancing Interpretability and Interactivity in Robot Manipulation: A Neurosymbolic Approach [0.0] (arXiv, 2022-10-03)
  We present a neurosymbolic architecture for coupling language-guided visual reasoning with robot manipulation. A non-expert human user can prompt the robot using unconstrained natural language, providing a referring expression (REF), a question (VQA), or a grasp action instruction. We generate a 3D vision-and-language synthetic dataset of tabletop scenes in a simulation environment to train our approach and perform extensive evaluations in both synthetic and real-world scenes.
- Extracting Zero-shot Common Sense from Large Language Models for Robot 3D Scene Understanding [25.270772036342688] (arXiv, 2022-06-09)
  We introduce a novel method for leveraging common sense embedded within large language models for labelling rooms. The proposed algorithm operates on 3D scene graphs produced by modern spatial perception systems.
- Correcting Robot Plans with Natural Language Feedback [88.92824527743105] (arXiv, 2022-04-11)
  We explore natural language as an expressive and flexible tool for robot correction. We show that these transformations enable users to correct goals, update robot motions, and recover from planning errors. Our method makes it possible to compose multiple constraints and generalizes to unseen scenes, objects, and sentences in both simulated and real-world environments.
- Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation [80.29069988090912] (arXiv, 2021-09-02)
  We study the problem of learning a range of vision-based manipulation tasks from a large offline dataset of robot interaction. We propose to leverage offline robot datasets with crowd-sourced natural language labels. We find that our approach outperforms both goal-image specifications and language-conditioned imitation techniques by more than 25%.
- Composing Pick-and-Place Tasks By Grounding Language [41.075844857146805] (arXiv, 2021-02-16)
  We present a robot system that follows unconstrained language instructions to pick and place arbitrary objects. Our approach infers objects and their relationships from input images and language expressions. Results obtained using a real-world PR2 robot demonstrate the effectiveness of our method.
- Translating Natural Language Instructions to Computer Programs for Robot Manipulation [0.6629765271909505] (arXiv, 2020-12-26)
  We propose translating the natural language instruction to a Python function which queries the scene by accessing the output of the object detector. We show that the proposed method performs better than training a neural network to directly predict the robot actions.
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.