Summarizing a virtual robot's past actions in natural language
- URL: http://arxiv.org/abs/2203.06671v1
- Date: Sun, 13 Mar 2022 15:00:46 GMT
- Title: Summarizing a virtual robot's past actions in natural language
- Authors: Chad DeChant and Daniel Bauer
- Abstract summary: We show how a popular dataset that matches robot actions with natural language descriptions designed for an instruction following task can be repurposed to serve as a training ground for robot action summarization work.
We propose and test several methods of learning to generate such summaries, starting from either egocentric video frames of the robot taking actions or intermediate text representations of the actions used by an automatic planner.
- Score: 0.3553493344868413
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose and demonstrate the task of giving natural language summaries of
the actions of a robotic agent in a virtual environment. We explain why such a
task is important and what makes it difficult, and we discuss how it might be
addressed. To encourage others to work on this, we show how a popular existing
dataset that matches robot actions with natural language descriptions designed
for an instruction following task can be repurposed to serve as a training
ground for robot action summarization work. We propose and test several methods
of learning to generate such summaries, starting from either egocentric video
frames of the robot taking actions or intermediate text representations of the
actions used by an automatic planner. We provide quantitative and qualitative
evaluations of our results, which can serve as a baseline for future work.
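As a rough illustration of the setup described above, the sketch below shows a sequence-to-sequence summarizer that encodes either pre-extracted egocentric frame features or tokenized planner text and decodes a natural-language summary. This is a minimal sketch under assumed names, dimensions, and layer choices, not the paper's actual architecture.

```python
# Minimal sketch (assumptions: feature sizes, layer choices, greedy teacher-forced loss).
import torch
import torch.nn as nn

class ActionSummarizer(nn.Module):
    def __init__(self, vocab_size, d_model=256, frame_feat_dim=512):
        super().__init__()
        # Encoder input can be pre-extracted frame features (e.g., from a frozen CNN)...
        self.frame_proj = nn.Linear(frame_feat_dim, d_model)
        # ...or tokenized planner text (intermediate action representations).
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, summary_tokens, frames=None, planner_tokens=None):
        # Use whichever input modality is provided for this episode.
        if frames is not None:                       # (B, T, frame_feat_dim)
            memory = self.encoder(self.frame_proj(frames))
        else:                                        # (B, L) planner token ids
            memory = self.encoder(self.token_emb(planner_tokens))
        tgt = self.token_emb(summary_tokens)         # (B, S, d_model)
        # A causal mask would be added in a real training setup.
        return self.out(self.decoder(tgt, memory))   # (B, S, vocab_size)

# Toy usage: 2 episodes, 8 frames of 512-d features, 10 summary tokens each.
model = ActionSummarizer(vocab_size=1000)
frames = torch.randn(2, 8, 512)
summary_in = torch.randint(0, 1000, (2, 10))
logits = model(summary_in, frames=frames)            # (2, 10, 1000)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 1000),
                             torch.randint(0, 1000, (2, 10)).reshape(-1))
```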
Related papers
- Context-Aware Command Understanding for Tabletop Scenarios [1.7082212774297747]
This paper presents a novel hybrid algorithm designed to interpret natural human commands in tabletop scenarios.
By integrating multiple sources of information, including speech, gestures, and scene context, the system extracts actionable instructions for a robot.
We discuss the strengths and limitations of the system, with particular focus on how it handles multimodal command interpretation.
arXiv Detail & Related papers (2024-10-08T20:46:39Z)
- Self-Explainable Affordance Learning with Embodied Caption [63.88435741872204]
We introduce Self-Explainable Affordance learning (SEA) with embodied caption.
SEA enables robots to articulate their intentions and bridge the gap between explainable vision-language caption and visual affordance learning.
We propose a novel model to effectively combine affordance grounding with self-explanation in a simple but efficient manner.
arXiv Detail & Related papers (2024-04-08T15:22:38Z)
- Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
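A hedged sketch of what such an interactive loop can look like: the LLM proposes the next action from the action/observation history, the robot executes it, and the resulting partial observation is fed back. `query_llm` and `execute` are hypothetical stand-ins, not the authors' actual interfaces.

```python
from typing import List

def query_llm(history: List[str], goal: str) -> str:
    """Stand-in for an LLM call; a real system would prompt a model with the history."""
    return "open(drawer)" if "drawer: closed" in history[-1] else "pick(mug)"

def execute(action: str) -> str:
    """Stand-in for the robot/simulator; returns a partial observation."""
    return "drawer: open, mug: visible" if action.startswith("open") else "holding: mug"

def interactive_plan(goal: str, initial_obs: str, max_steps: int = 5) -> List[str]:
    history, actions = [initial_obs], []
    for _ in range(max_steps):
        action = query_llm(history, goal)   # plan one step under partial observability
        obs = execute(action)               # environment reveals new information
        history += [action, obs]
        actions.append(action)
        if "holding: mug" in obs:           # toy termination check
            break
    return actions

print(interactive_plan("put the mug on the table", "drawer: closed"))
```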
arXiv Detail & Related papers (2023-12-11T22:54:44Z)
- RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control [140.48218261864153]
We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control.
Our approach leads to performant robotic policies and enables RT-2 to obtain a range of emergent capabilities from Internet-scale training.
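RT-2 makes a vision-language model usable for control by expressing robot actions as text tokens the model can emit like ordinary words. The toy codec below illustrates that idea; the 256-bin range and string format are assumptions for illustration, not the exact RT-2 scheme.

```python
import numpy as np

N_BINS = 256
LOW, HIGH = -1.0, 1.0          # assumed normalized range per action dimension

def action_to_tokens(action: np.ndarray) -> str:
    """Discretize each action dimension into an integer token."""
    bins = np.clip(((action - LOW) / (HIGH - LOW) * (N_BINS - 1)).round(), 0, N_BINS - 1)
    return " ".join(str(int(b)) for b in bins)

def tokens_to_action(tokens: str) -> np.ndarray:
    """Recover an approximate continuous action from the token string."""
    bins = np.array([int(t) for t in tokens.split()], dtype=np.float64)
    return bins / (N_BINS - 1) * (HIGH - LOW) + LOW

a = np.array([0.1, -0.5, 0.9])   # e.g., end-effector deltas
s = action_to_tokens(a)          # token string, one integer per dimension
print(s, tokens_to_action(s))    # round-trips to within one bin of precision
```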
arXiv Detail & Related papers (2023-07-28T21:18:02Z)
- Learning to Summarize and Answer Questions about a Virtual Robot's Past Actions [3.088519122619879]
We demonstrate the task of learning to summarize and answer questions about a robot agent's past actions using natural language alone.
A single system with a large language model at its core is trained to both summarize and answer questions about action sequences given ego-centric video frames of a virtual robot and a question prompt.
arXiv Detail & Related papers (2023-06-16T15:47:24Z)
- Learning Video-Conditioned Policies for Unseen Manipulation Tasks [83.2240629060453]
Video-conditioned Policy learning maps human demonstrations of previously unseen tasks to robot manipulation skills.
We learn our policy to generate appropriate actions given current scene observations and a video of the target task.
We validate our approach on a set of challenging multi-task robot manipulation environments and show that it outperforms the state of the art.
arXiv Detail & Related papers (2023-05-10T16:25:42Z)
- ProgPrompt: Generating Situated Robot Task Plans using Large Language Models [68.57918965060787]
Large language models (LLMs) can be used to score potential next actions during task planning.
We present a programmatic LLM prompt structure that enables plan generation that functions across situated environments.
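An illustration of what a ProgPrompt-style programmatic prompt can look like: available robot actions appear as imports, the scene's objects as a list, and example plans as Python functions, so the LLM can be asked to complete a new plan function. The exact format and the helper names below are assumptions, not the authors' released code.

```python
def build_prompt(actions, objects, example_plan, new_task):
    header = "\n".join(f"from actions import {a}" for a in actions)
    scene = f"objects = {objects}"
    return f"{header}\n\n{scene}\n\n{example_plan}\n\ndef {new_task}():\n"

example = (
    "def throw_away_banana():\n"
    "    find('banana')\n"
    "    grab('banana')\n"
    "    find('trashcan')\n"
    "    put('banana', 'trashcan')\n"
)

prompt = build_prompt(
    actions=["find", "grab", "put", "open", "close"],
    objects=["banana", "trashcan", "fridge", "apple"],
    example_plan=example,
    new_task="put_apple_in_fridge",
)
print(prompt)  # an LLM would be asked to complete the body of the new function
```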
arXiv Detail & Related papers (2022-09-22T20:29:49Z)
- Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation [80.29069988090912]
We study the problem of learning a range of vision-based manipulation tasks from a large offline dataset of robot interaction.
We propose to leverage offline robot datasets with crowd-sourced natural language labels.
We find that our approach outperforms both goal-image specifications and language conditioned imitation techniques by more than 25%.
arXiv Detail & Related papers (2021-09-02T17:42:13Z)
- Caption Generation of Robot Behaviors based on Unsupervised Learning of Action Segments [10.356412004005767]
Bridging robot action sequences and their natural language captions is an important task for increasing the explainability of human-assisting robots.
In this paper, we propose a system for generating natural language captions that describe behaviors of human assisting robots.
arXiv Detail & Related papers (2020-03-23T03:44:56Z)
- Interactive Robot Training for Non-Markov Tasks [6.252236971703546]
We propose a Bayesian interactive robot training framework that allows the robot to learn from demonstrations provided by a teacher.
We also present an active learning approach to identify the task execution with the most uncertain degree of acceptability.
We demonstrate the efficacy of our approach in a real-world setting through a user-study based on teaching a robot to set a dinner table.
arXiv Detail & Related papers (2020-03-04T18:19:05Z)
- Autonomous Planning Based on Spatial Concepts to Tidy Up Home Environments with Service Robots [5.739787445246959]
We propose a novel planning method that can efficiently estimate the order and positions of the objects to be tidied up by learning the parameters of a probabilistic generative model.
The model allows a robot to learn the distributions of the co-occurrence probability of the objects and places to tidy up using the multimodal sensor information collected in a tidied environment.
We evaluate the effectiveness of the proposed method by an experimental simulation that reproduces the conditions of the Tidy Up Here task of the World Robot Summit 2018 international robotics competition.
arXiv Detail & Related papers (2020-02-10T11:49:58Z)
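The tidy-up planner above learns where objects belong from object-place co-occurrence in a tidied environment. Below is a toy, count-based stand-in for that idea (the paper itself learns a multimodal probabilistic generative model); the observation data and candidate places are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical observations of a tidied room: (object, place) pairs.
observations = [
    ("cup", "shelf"), ("cup", "shelf"), ("cup", "table"),
    ("book", "shelf"), ("toy", "box"), ("toy", "box"),
]

counts = defaultdict(Counter)
for obj, place in observations:
    counts[obj][place] += 1

def place_probabilities(obj, alpha=1.0, places=("shelf", "table", "box")):
    """P(place | object) with additive smoothing over candidate places."""
    total = sum(counts[obj][p] + alpha for p in places)
    return {p: (counts[obj][p] + alpha) / total for p in places}

# Rank where a misplaced cup should go; the most probable place is the tidy target.
probs = place_probabilities("cup")
print(max(probs, key=probs.get), probs)   # 'shelf' gets the highest probability
```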