Translating Natural Language Instructions to Computer Programs for Robot
Manipulation
- URL: http://arxiv.org/abs/2012.13695v2
- Date: Sat, 20 Mar 2021 07:33:27 GMT
- Title: Translating Natural Language Instructions to Computer Programs for Robot
Manipulation
- Authors: Sagar Gubbi Venkatesh and Raviteja Upadrashta and Bharadwaj Amrutur
- Abstract summary: We propose translating the natural language instruction to a Python function which queries the scene by accessing the output of the object detector.
We show that the proposed method performs better than training a neural network to directly predict the robot actions.
- Score: 0.6629765271909505
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: It is highly desirable for robots that work alongside humans to be
able to understand instructions in natural language. Existing
language-conditioned imitation learning models directly predict actuator
commands from the image observation and the instruction text. Rather than
directly predicting actuator commands, we propose translating the natural
language instruction to a Python function that queries the scene by accessing
the output of the object detector and controls the robot to perform the
specified task. This enables the use of non-differentiable modules, such as a
constraint solver, when computing commands to the robot. Moreover, the labels
in this setup are computer programs, which are significantly more informative
than teleoperated demonstrations because they capture the intent of the
expert. We show that the proposed method performs better than training a
neural network to directly predict the robot actions.
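To make the proposed setup concrete, here is a minimal, hypothetical sketch of the kind of Python function such a system might generate for an instruction like "put the red cube to the left of the blue bowl". The `scene.detect`, `robot.pick`, and `robot.place` interfaces are illustrative assumptions, not the paper's actual API.

```python
# Illustrative sketch only: a generated program that queries the object
# detector's output and commands the robot; all interfaces are assumed.

def execute_instruction(scene, robot):
    # Query the object detector output for the relevant objects.
    cube = scene.detect(label="cube", color="red")
    bowl = scene.detect(label="bowl", color="blue")

    # A non-differentiable module such as a constraint solver could be
    # invoked here to compute a feasible placement, as the abstract
    # notes; a fixed offset stands in for it in this sketch.
    target_x, target_y = bowl.x - 0.10, bowl.y  # 10 cm left of the bowl

    robot.pick(cube.x, cube.y)
    robot.place(target_x, target_y)
```

Because the label is a program rather than a trajectory, it records which objects matter and how they relate, capturing the expert's intent more directly than a teleoperated demonstration.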
Related papers
- Incremental Learning of Humanoid Robot Behavior from Natural Interaction and Large Language Models [23.945922720555146]
We propose a system to achieve incremental learning of complex behavior from natural interaction.
We integrate the system in the robot cognitive architecture of the humanoid robot ARMAR-6.
arXiv Detail & Related papers (2023-09-08T13:29:05Z)
- Surfer: Progressive Reasoning with World Models for Robotic Manipulation [51.26109827779267]
We introduce a novel and simple robot manipulation framework, called Surfer.
Surfer is built on a world model: it treats robot manipulation as a state transfer of the visual scene and decouples it into two parts, action and scene.
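As a rough illustration of this action/scene decoupling, the following minimal sketch pairs a policy head (action) with a world model that predicts the next scene state (scene). The module names, shapes, and use of PyTorch are assumptions for illustration, not Surfer's actual design.

```python
# Hypothetical sketch of the action/scene decoupling; not Surfer's code.
import torch
import torch.nn as nn

class ActionSceneModel(nn.Module):
    def __init__(self, state_dim=128, action_dim=7):
        super().__init__()
        # "Action" part: propose an action from the current scene state.
        self.policy = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim))
        # "Scene" part: world model predicting the state transfer, i.e.
        # the next scene state given the current state and the action.
        self.world_model = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, state_dim))

    def forward(self, scene_state):
        action = self.policy(scene_state)
        next_state = self.world_model(
            torch.cat([scene_state, action], dim=-1))
        return action, next_state
```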
arXiv Detail & Related papers (2023-06-20T07:06:04Z)
- Learning Video-Conditioned Policies for Unseen Manipulation Tasks [83.2240629060453]
Video-conditioned policy learning maps human demonstrations of previously unseen tasks to robot manipulation skills.
We train our policy to generate appropriate actions given the current scene observation and a video of the target task.
We validate our approach on a set of challenging multi-task robot manipulation environments and outperform the state of the art.
arXiv Detail & Related papers (2023-05-10T16:25:42Z)
- Open-World Object Manipulation using Pre-trained Vision-Language Models [72.87306011500084]
For robots to follow instructions from people, they must be able to connect the rich semantic information in human vocabulary to their sensory observations and actions.
We develop a simple approach, called MOO, which leverages a pre-trained vision-language model to extract object-identifying information.
In a variety of experiments on a real mobile manipulator, we find that MOO generalizes zero-shot to a wide range of novel object categories and environments.
arXiv Detail & Related papers (2023-03-02T01:55:10Z)
- "No, to the Right" -- Online Language Corrections for Robotic Manipulation via Shared Autonomy [70.45420918526926]
We present LILAC, a framework for incorporating and adapting to natural language corrections online during execution.
Instead of discrete turn-taking between a human and robot, LILAC splits agency between the human and robot.
We show that our corrections-aware approach obtains higher task completion rates and is subjectively preferred by users.
arXiv Detail & Related papers (2023-01-06T15:03:27Z)
- Instruction-driven history-aware policies for robotic manipulations [82.25511767738224]
We propose a unified transformer-based approach that takes multiple inputs into account.
In particular, our transformer architecture integrates (i) natural language instructions and (ii) multi-view scene observations.
We evaluate our method on the challenging RLBench benchmark and on a real-world robot.
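As a rough illustration of fusing (i) instruction tokens and (ii) multi-view observation tokens in a single transformer, as the entry above describes, here is a minimal, hypothetical sketch; the module names, feature dimensions, and use of PyTorch are assumptions, not the paper's actual architecture.

```python
# Hypothetical sketch of a language- and multi-view-conditioned policy;
# not the paper's architecture.
import torch
import torch.nn as nn

class LanguageConditionedPolicy(nn.Module):
    def __init__(self, d_model=256, action_dim=7):
        super().__init__()
        self.img_proj = nn.Linear(512, d_model)  # per-view visual features
        self.txt_proj = nn.Linear(768, d_model)  # instruction token features
        layer = nn.TransformerEncoderLayer(
            d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.action_head = nn.Linear(d_model, action_dim)

    def forward(self, view_feats, text_feats):
        # view_feats: (B, n_views, 512); text_feats: (B, n_tokens, 768)
        tokens = torch.cat(
            [self.img_proj(view_feats), self.txt_proj(text_feats)], dim=1)
        fused = self.encoder(tokens)
        # Pool the fused tokens and predict the next robot action.
        return self.action_head(fused.mean(dim=1))
```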
arXiv Detail & Related papers (2022-09-11T16:28:25Z)
- Reshaping Robot Trajectories Using Natural Language Commands: A Study of Multi-Modal Data Alignment Using Transformers [33.7939079214046]
We provide a flexible language-based interface for human-robot collaboration.
We take advantage of recent advances in large language models to encode the user command.
We train the model using imitation learning over a dataset containing robot trajectories modified by language commands.
arXiv Detail & Related papers (2022-03-25T01:36:56Z)
- Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation [80.29069988090912]
We study the problem of learning a range of vision-based manipulation tasks from a large offline dataset of robot interaction.
We propose to leverage offline robot datasets with crowd-sourced natural language labels.
We find that our approach outperforms both goal-image specifications and language-conditioned imitation techniques by more than 25%.
arXiv Detail & Related papers (2021-09-02T17:42:13Z)
- Language Conditioned Imitation Learning over Unstructured Data [9.69886122332044]
We present a method for incorporating free-form natural language conditioning into imitation learning.
Our approach learns perception from pixels, natural language understanding, and multitask continuous control end-to-end as a single neural network.
We show this dramatically improves language conditioned performance, while reducing the cost of language annotation to less than 1% of total data.
arXiv Detail & Related papers (2020-05-15T17:08:50Z)
- Caption Generation of Robot Behaviors based on Unsupervised Learning of Action Segments [10.356412004005767]
Bridging robot action sequences and their natural language captions is an important task for increasing the explainability of human-assisting robots.
In this paper, we propose a system for generating natural language captions that describe the behaviors of human-assisting robots.
arXiv Detail & Related papers (2020-03-23T03:44:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.