SEAL: Semantic Frame Execution And Localization for Perceiving Afforded
Robot Actions
- URL: http://arxiv.org/abs/2303.14067v1
- Date: Fri, 24 Mar 2023 15:25:41 GMT
- Title: SEAL: Semantic Frame Execution And Localization for Perceiving Afforded
Robot Actions
- Authors: Cameron Kisailus, Daksh Narang, Matthew Shannon, Odest Chadwicke
Jenkins
- Abstract summary: We extend the semantic frame representation for robot manipulation actions and introduce the problem of Semantic Frame Execution And Localization for Perceiving Afforded Robot Actions (SEAL) as a graphical model.
For the SEAL problem, we describe our nonparametric Semantic Frame Mapping (SeFM) algorithm for maintaining belief over a finite set of semantic frames as the locations of actions afforded to the robot.
- Score: 5.522839151632667
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in robotic mobile manipulation have spurred the expansion of
the operating environment for robots from constrained workspaces to
large-scale, human environments. In order to effectively complete tasks in
these spaces, robots must be able to perceive, reason, and execute over a
diversity of affordances, well beyond simple pick-and-place. We posit that the
notion of semantic frames provides a compelling representation for robot
actions that is amenable to action-focused perception, task-level reasoning,
action-level execution, and integration with language. Semantic frames, a
product of the linguistics community, define the required elements, the pre-
and post-conditions, and the sequence of robot actions needed to successfully
execute an action evoked by a verb phrase. In this work, we extend
the semantic frame representation for robot manipulation actions and introduce
the problem of Semantic Frame Execution And Localization for Perceiving
Afforded Robot Actions (SEAL) as a graphical model. For the SEAL problem, we
describe our nonparametric Semantic Frame Mapping (SeFM) algorithm for
maintaining belief over a finite set of semantic frames as the locations of
actions afforded to the robot. We show that language models such as GPT-3 are
insufficient to address the generalized task execution covered by the SEAL
formulation, and that SeFM provides robots with the efficient search strategies
and long-term memory needed when operating in building-scale environments.
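
To make the extended semantic frame idea concrete, below is a minimal sketch (not the authors' implementation) of how a manipulation frame evoked by a verb phrase could be encoded, with frame elements, pre- and post-conditions, and a sequential action plan. All class names, fields, and the example conditions are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical world-state type: a mapping from symbolic facts (e.g. "mug.grasped")
# to booleans. This is an illustrative simplification, not the paper's model.
WorldState = Dict[str, bool]


@dataclass
class SemanticFrame:
    """A robot manipulation action frame evoked by a verb phrase."""
    verb: str                                        # evoking verb phrase, e.g. "pick up"
    elements: Dict[str, str]                         # frame elements -> grounded objects
    preconditions: List[Callable[[WorldState], bool]]
    postconditions: List[Callable[[WorldState], bool]]
    action_sequence: List[str]                       # sequential robot actions

    def afforded(self, state: WorldState) -> bool:
        """The frame is afforded when every precondition holds in the current state."""
        return all(cond(state) for cond in self.preconditions)

    def succeeded(self, state: WorldState) -> bool:
        """Execution succeeded when every postcondition holds afterwards."""
        return all(cond(state) for cond in self.postconditions)


# Example: an illustrative "pick up the mug" frame.
pick_up = SemanticFrame(
    verb="pick up",
    elements={"Agent": "robot", "Theme": "mug"},
    preconditions=[lambda s: s.get("mug.visible", False),
                   lambda s: not s.get("mug.grasped", False)],
    postconditions=[lambda s: s.get("mug.grasped", False)],
    action_sequence=["approach(mug)", "grasp(mug)", "lift(mug)"],
)
```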
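The abstract describes SeFM only at a high level, so the following is a hedged sketch of one plausible way to maintain a nonparametric belief over where each frame is afforded: a weighted set of candidate locations per frame, reweighted by detections and resampled. The observation model, resampling scheme, and all names here are assumptions, not the published algorithm.

```python
import random
from typing import Callable, Dict, List, Tuple

Location = Tuple[float, float]  # (x, y) in a building-scale map; illustrative


class FrameLocationBelief:
    """Sample-based belief over where a single semantic frame is afforded.

    This mirrors a generic particle-filter-style update; the actual SeFM
    observation and update models are not specified in the abstract.
    """

    def __init__(self, candidates: List[Location]):
        # Start with a uniform belief over the candidate locations.
        self.particles = [(loc, 1.0 / len(candidates)) for loc in candidates]

    def update(self, likelihood: Callable[[Location], float]) -> None:
        """Reweight particles by an observation likelihood, then normalize."""
        weighted = [(loc, w * likelihood(loc)) for loc, w in self.particles]
        total = sum(w for _, w in weighted) or 1e-9
        self.particles = [(loc, w / total) for loc, w in weighted]

    def resample(self, n: int) -> None:
        """Draw n particles proportional to weight (low-variance schemes also work)."""
        locs = [loc for loc, _ in self.particles]
        weights = [w for _, w in self.particles]
        drawn = random.choices(locs, weights=weights, k=n)
        self.particles = [(loc, 1.0 / n) for loc in drawn]

    def best_guess(self) -> Location:
        return max(self.particles, key=lambda p: p[1])[0]


# Maintain one belief per frame in the finite frame set.
beliefs: Dict[str, FrameLocationBelief] = {
    "pick up": FrameLocationBelief([(1.0, 2.0), (4.5, 0.5), (7.2, 3.3)]),
}
# After a detector reports high likelihood near (4.5, 0.5):
beliefs["pick up"].update(lambda loc: 1.0 if abs(loc[0] - 4.5) < 0.5 else 0.1)
print(beliefs["pick up"].best_guess())
```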
Related papers
- Robots Can Multitask Too: Integrating a Memory Architecture and LLMs for Enhanced Cross-Task Robot Action Generation [13.181465089984567]
Large Language Models (LLMs) have been recently used in robot applications for grounding common-sense reasoning with the robot's perception and physical abilities.
In this paper, we address incorporating memory processes with LLMs for generating cross-task robot actions, while the robot effectively switches between tasks.
Our results show a significant improvement in performance over a baseline across five robotic tasks, demonstrating the potential of integrating memory with LLMs for combining the robot's action and perception for adaptive task execution.
arXiv Detail & Related papers (2024-07-18T13:38:21Z)
- Enhancing the LLM-Based Robot Manipulation Through Human-Robot Collaboration [4.2460673279562755]
Large Language Models (LLMs) are gaining popularity in the field of robotics.
This paper proposes a novel approach to enhance the performance of LLM-based autonomous manipulation through Human-Robot Collaboration (HRC).
The approach involves using a prompted GPT-4 language model to decompose high-level language commands into sequences of motions that can be executed by the robot.
arXiv Detail & Related papers (2024-06-20T08:23:49Z)
- RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation [77.41969287400977]
This paper presents RobotScript, a platform for a deployable robot manipulation pipeline powered by code generation.
We also present a benchmark for code generation for robot manipulation tasks specified in free-form natural language.
We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms.
arXiv Detail & Related papers (2024-02-22T15:12:00Z)
- Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z)
- WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model [92.90127398282209]
This paper investigates the potential of integrating the most recent Large Language Models (LLMs) with existing visual grounding and robotic grasping systems.
We introduce WALL-E (Embodied Robotic WAiter load lifting with Large Language model) as an example of this integration.
We deploy this LLM-empowered system on the physical robot to provide a more user-friendly interface for the instruction-guided grasping task.
arXiv Detail & Related papers (2023-08-30T11:35:21Z)
- ProgPrompt: Generating Situated Robot Task Plans using Large Language Models [68.57918965060787]
Large language models (LLMs) can be used to score potential next actions during task planning.
We present a programmatic LLM prompt structure that enables plan generation to function across situated environments.
arXiv Detail & Related papers (2022-09-22T20:29:49Z)
- Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation? [54.442692221567796]
Task specification is critical for engagement of non-expert end-users and adoption of personalized robots.
A widely studied approach to task specification is through goals, using either compact state vectors or goal images from the same robot scene.
In this work, we explore alternate and more general forms of goal specification that are expected to be easier for humans to specify and use.
arXiv Detail & Related papers (2022-04-23T19:39:49Z)
- Caption Generation of Robot Behaviors based on Unsupervised Learning of Action Segments [10.356412004005767]
Bridging robot action sequences and their natural language captions is an important task to increase explainability of human assisting robots.
In this paper, we propose a system for generating natural language captions that describe behaviors of human assisting robots.
arXiv Detail & Related papers (2020-03-23T03:44:56Z)
- SAPIEN: A SimulAted Part-based Interactive ENvironment [77.4739790629284]
SAPIEN is a realistic and physics-rich simulated environment that hosts a large-scale set of articulated objects.
We evaluate state-of-the-art vision algorithms for part detection and motion attribute recognition as well as demonstrate robotic interaction tasks.
arXiv Detail & Related papers (2020-03-19T00:11:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.