Learning to Summarize and Answer Questions about a Virtual Robot's Past Actions
- URL: http://arxiv.org/abs/2306.09922v1
- Date: Fri, 16 Jun 2023 15:47:24 GMT
- Title: Learning to Summarize and Answer Questions about a Virtual Robot's Past Actions
- Authors: Chad DeChant, Iretiayo Akinola, Daniel Bauer
- Abstract summary: We demonstrate the task of learning to summarize and answer questions about a robot agent's past actions using natural language alone.
A single system with a large language model at its core is trained to both summarize and answer questions about action sequences given ego-centric video frames of a virtual robot and a question prompt.
- Score: 3.088519122619879
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When robots perform long action sequences, users will want to easily and
reliably find out what they have done. We therefore demonstrate the task of
learning to summarize and answer questions about a robot agent's past actions
using natural language alone. A single system with a large language model at
its core is trained to both summarize and answer questions about action
sequences given ego-centric video frames of a virtual robot and a question
prompt. To enable training of question answering, we develop a method to
automatically generate English-language questions and answers about objects,
actions, and the temporal order in which actions occurred during episodes of
robot action in the virtual environment. Training one model to both summarize
and answer questions enables zero-shot transfer of representations of objects
learned through question answering to improved action summarization,
including for objects that were not seen during summarization training.
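The system described above is a single model that takes ego-centric video frames plus a question prompt and emits either a summary or an answer. The listing does not reproduce the paper's architecture, so the following PyTorch sketch only illustrates one plausible wiring, in which frame embeddings are prepended to the prompt's token embeddings ahead of a language-model backbone; every module name and size here is an assumption, not the paper's design.
```python
import torch
import torch.nn as nn

class FramesPlusPromptLM(nn.Module):
    """Illustrative sketch: condition a language model on video frames
    and a question prompt. Not the paper's actual architecture."""

    def __init__(self, vocab_size=32000, d_model=512):
        super().__init__()
        # Stand-in frame encoder: flatten each frame, project to d_model.
        # A real system would use a pretrained CNN/ViT here.
        self.frame_proj = nn.LazyLinear(d_model)
        self.token_embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, frames, prompt_ids):
        # frames: (B, T, C, H, W); prompt_ids: (B, L) token ids.
        b, t = frames.shape[:2]
        frame_tokens = self.frame_proj(frames.reshape(b, t, -1))  # (B, T, d)
        text_tokens = self.token_embed(prompt_ids)                # (B, L, d)
        # Prepend visual tokens to the prompt, then predict tokens.
        # (A real LM would add causal masking and use pretrained weights.)
        x = torch.cat([frame_tokens, text_tokens], dim=1)
        return self.lm_head(self.backbone(x))  # (B, T+L, vocab_size)

# Example: 2 episodes of 8 frames (64x64 RGB) with a 12-token prompt.
model = FramesPlusPromptLM()
logits = model(torch.randn(2, 8, 3, 64, 64), torch.randint(0, 32000, (2, 12)))
```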
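To enable question-answering training, the abstract says questions and answers about objects, actions, and temporal order are generated automatically. The paper's procedure isn't given here; a minimal template-based sketch over a symbolic action log might look like the following, where the Action fields and question templates are invented for illustration.
```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Action:
    """One step of an episode, e.g. verb='pick up', obj='mug'."""
    verb: str
    obj: str

def generate_qa_pairs(episode: List[Action]) -> List[Tuple[str, str]]:
    """Template questions about objects, actions, and temporal order."""
    qa: List[Tuple[str, str]] = []
    for i, act in enumerate(episode):
        # Object question: which object did a given action involve?
        qa.append((f"What did the robot {act.verb}?", act.obj))
        # Action question: what was done with a given object?
        # (A real generator would disambiguate objects that recur.)
        qa.append((f"What did the robot do with the {act.obj}?", act.verb))
        # Temporal-order question about consecutive actions.
        if i > 0:
            prev = episode[i - 1]
            qa.append((
                f"Which happened first: '{prev.verb} {prev.obj}' or "
                f"'{act.verb} {act.obj}'?",
                f"{prev.verb} {prev.obj}",
            ))
    return qa

if __name__ == "__main__":
    episode = [Action("pick up", "mug"), Action("place", "mug"),
               Action("open", "drawer")]
    for q, a in generate_qa_pairs(episode):
        print(f"Q: {q}  A: {a}")
```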
Related papers
- LLaRA: Supercharging Robot Learning Data for Vision-Language Policy [56.505551117094534]
Large Language Models (LLMs) equipped with extensive world knowledge and strong reasoning skills can tackle diverse tasks across domains.
We propose LLaRA: Large Language and Robotics Assistant, a framework which formulates robot action policy as conversations.
arXiv Detail & Related papers (2024-06-28T17:59:12Z)
- Self-Explainable Affordance Learning with Embodied Caption
We introduce Self-Explainable Affordance learning (SEA) with embodied caption.
SEA enables robots to articulate their intentions and bridge the gap between explainable vision-language caption and visual affordance learning.
We propose a novel model to effectively combine affordance grounding with self-explanation in a simple but efficient manner.
arXiv Detail & Related papers (2024-04-08T15:22:38Z)
- MOKA: Open-Vocabulary Robotic Manipulation through Mark-Based Visual Prompting [106.53784213239479]
We present MOKA (Marking Open-vocabulary Keypoint Affordances), an approach that employs vision language models to solve robotic manipulation tasks.
At the heart of our approach is a compact point-based representation of affordance and motion that bridges the VLM's predictions on RGB images and the robot's motions in the physical world.
We evaluate and analyze MOKA's performance on a variety of manipulation tasks specified by free-form language descriptions.
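The summary above mentions a compact point-based representation that bridges the VLM's image-space predictions and the robot's motion. As a rough illustration only (the field names below are assumptions, not MOKA's published schema), such a representation might be a small container of 2-D keypoints that is later lifted to 3-D:
```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Point2D = Tuple[float, float]            # pixel coordinates in the RGB image
Point3D = Tuple[float, float, float]

@dataclass
class PointAffordance:
    """Hypothetical point-based affordance/motion plan (illustrative)."""
    grasp_point: Point2D                 # where the gripper grasps
    target_point: Point2D                # where the interaction happens
    waypoints: List[Point2D] = field(default_factory=list)  # motion in image space

    def to_motion(self, deproject: Callable[[Point2D], Point3D]) -> List[Point3D]:
        """Lift 2-D image points to 3-D with a caller-supplied deprojection
        (e.g. depth-camera back-projection)."""
        return [deproject(p) for p in (self.grasp_point, *self.waypoints,
                                       self.target_point)]
```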
arXiv Detail & Related papers (2024-03-05T18:08:45Z)
- PREDILECT: Preferences Delineated with Zero-Shot Language-based Reasoning in Reinforcement Learning [2.7387720378113554]
Preference-based reinforcement learning (RL) has emerged as a new field in robot learning.
We use the zero-shot capabilities of a large language model (LLM) to reason from the text provided by humans.
In both a simulated scenario and a user study, we demonstrate the effectiveness of our approach by analyzing the collected feedback and its implications.
arXiv Detail & Related papers (2024-02-23T16:30:05Z)
- Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z)
- Incremental Learning of Humanoid Robot Behavior from Natural Interaction and Large Language Models [23.945922720555146]
We propose a system to achieve incremental learning of complex behavior from natural interaction.
We integrate the system into the cognitive architecture of the humanoid robot ARMAR-6.
arXiv Detail & Related papers (2023-09-08T13:29:05Z)
- Synthesis and Execution of Communicative Robotic Movements with Generative Adversarial Networks [59.098560311521034]
We focus on transferring to two different robotic platforms the same kinematic modulation that humans adopt when manipulating delicate objects.
We choose to modulate the velocity profile adopted by the robots' end-effector, inspired by what humans do when transporting objects with different characteristics.
We exploit a novel Generative Adversarial Network architecture, trained with human kinematics examples, to generalize over them and generate new and meaningful velocity profiles.
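As a rough sketch of the ingredients such a generative approach needs (not the paper's actual architecture; the profile length and layer sizes are arbitrary assumptions), a minimal GAN over fixed-length 1-D velocity profiles could be set up as follows:
```python
import torch
import torch.nn as nn

T = 100  # samples per velocity profile (arbitrary choice)

class Generator(nn.Module):
    """Maps a latent vector to a non-negative 1-D speed profile."""
    def __init__(self, z_dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 128), nn.ReLU(),
            nn.Linear(128, T), nn.Softplus(),  # speeds stay non-negative
        )
    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)

class Discriminator(nn.Module):
    """Scores a profile as real (human demonstration) or generated."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(T, 128), nn.LeakyReLU(0.2),
            nn.Linear(128, 1),  # real/fake logit
        )
    def forward(self, v: torch.Tensor) -> torch.Tensor:
        return self.net(v)

# Sanity check: generate a batch of 4 profiles and score them.
g, d = Generator(), Discriminator()
fake = g(torch.randn(4, 16))
print(fake.shape, d(fake).shape)  # torch.Size([4, 100]) torch.Size([4, 1])
```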
arXiv Detail & Related papers (2022-03-29T15:03:05Z)
- Summarizing a virtual robot's past actions in natural language [0.3553493344868413]
We show how a popular dataset, which matches robot actions with natural language descriptions and was designed for an instruction-following task, can be repurposed to serve as a training ground for robot action summarization.
We propose and test several methods of learning to generate such summaries, starting from either egocentric video frames of the robot taking actions or intermediate text representations of the actions used by an automatic planner.
arXiv Detail & Related papers (2022-03-13T15:00:46Z)
- Lifelong Robotic Reinforcement Learning by Retaining Experiences [61.79346922421323]
Many multi-task reinforcement learning efforts assume the robot can collect data from all tasks at all times.
In this work, we study a sequential multi-task RL problem motivated by the practical constraints of physical robotic systems.
We derive an approach that effectively leverages the data and policies learned for previous tasks to cumulatively grow the robot's skill-set.
arXiv Detail & Related papers (2021-09-19T18:00:51Z)
- Composing Pick-and-Place Tasks By Grounding Language [41.075844857146805]
We present a robot system that follows unconstrained language instructions to pick and place arbitrary objects.
Our approach infers objects and their relationships from input images and language expressions.
Results obtained using a real-world PR2 robot demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2021-02-16T11:29:09Z)
- Caption Generation of Robot Behaviors based on Unsupervised Learning of Action Segments [10.356412004005767]
Bridging robot action sequences and their natural language captions is an important task for increasing the explainability of human-assisting robots.
In this paper, we propose a system for generating natural language captions that describe the behaviors of human-assisting robots.
arXiv Detail & Related papers (2020-03-23T03:44:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.