Understanding Natural Language in Context
- URL: http://arxiv.org/abs/2205.12691v1
- Date: Wed, 25 May 2022 11:52:16 GMT
- Title: Understanding Natural Language in Context
- Authors: Avichai Levy, Erez Karpas
- Abstract summary: We focus on cognitive robots, which have some knowledge-based models of the world and operate by reasoning and planning with this model.
Our goal in this research is to translate natural language utterances into this robot's formalism.
We do so by combining off-the-shelf SOTA language models, planning tools, and the robot's knowledge base for better communication.
- Score: 13.112390442564442
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent years have seen an increasing number of applications that have a natural language interface, either in the form of chatbots or via personal assistants such as Alexa (Amazon), Google Assistant, Siri (Apple), and Cortana (Microsoft). To use these applications, a basic dialog between the robot and the human is required.
While this kind of dialog exists today mainly within "static" robots that do not move around the household space, the challenge of reasoning about the information conveyed by the environment increases significantly when dealing with robots that can move and manipulate objects in our home environment.
In this paper, we focus on cognitive robots, which have some knowledge-based models of the world and operate by reasoning and planning with this model. Thus, when the robot and the human communicate, there is already some formalism they can use - the robot's knowledge representation formalism. Our goal in this research is to translate natural language utterances into this robot's formalism, allowing much more complicated household tasks to be completed. We do so by combining off-the-shelf SOTA language models, planning tools, and the robot's knowledge base for better communication. In addition, we analyze different directive types and illustrate the contribution of the world's context to the translation process.
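The paper provides no code in this abstract, but the pipeline it describes (language model, then knowledge-base check, then planner) can be sketched roughly as below. The PDDL-style goal format, the vocabularies, and the `llm_translate` stub are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of the abstract's pipeline; all names are assumptions.

KNOWN_OBJECTS = {"cup", "table", "sink"}        # robot's knowledge base
KNOWN_PREDICATES = {"on", "in", "holding"}

def llm_translate(utterance: str) -> str:
    """Stand-in for an off-the-shelf language model prompted to emit a
    PDDL-style goal for the utterance."""
    return "(on cup table)"  # e.g. for "put the cup on the table"

def validate(goal: str) -> bool:
    """Reject goals that mention symbols outside the knowledge base."""
    tokens = goal.strip("()").split()
    return tokens[0] in KNOWN_PREDICATES and all(
        t in KNOWN_OBJECTS for t in tokens[1:]
    )

def handle(utterance: str) -> None:
    goal = llm_translate(utterance)
    if not validate(goal):
        raise ValueError(f"could not ground utterance: {utterance!r}")
    # a grounded goal can now be handed to any off-the-shelf PDDL planner
    print(f"planning for goal {goal}")

handle("put the cup on the table")
```

Validating the model's output against the knowledge base before planning is what lets the world's context reject translations that are syntactically plausible but refer to objects the robot does not know about.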
Related papers
- Project Report: Requirements for a Social Robot as an Information Provider in the Public Sector [0.0]
We have devised an application scenario for integrating a humanoid social robot into an official environment.
We developed a corresponding robot application and carried out initial tests and evaluations in a project together with the Kiel City Council.
One of the most important insights gained in the project was that users strongly preferred a humanoid robot with natural language processing capabilities.
We propose connecting the ACT-R cognitive architecture to the robot, where an ACT-R model interacts with the robot application to cognitively process and enhance the human-robot dialogue.
arXiv Detail & Related papers (2024-12-06T13:07:06Z)
- $π_0$: A Vision-Language-Action Flow Model for General Robot Control [77.32743739202543]
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge.
We evaluate our model in terms of its ability to perform tasks zero-shot after pre-training, to follow language instructions from people, and to acquire new skills via fine-tuning.
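As a very rough schematic of what flow matching means at inference time (not the π0 architecture itself, which conditions on a pre-trained VLM backbone): a learned velocity field is integrated from Gaussian noise to an action chunk. The `velocity` stub below is a dummy straight-line field standing in for the trained network.

```python
import numpy as np

def velocity(a_t, t, obs_embedding):
    """Dummy stand-in for a learned flow network v(a_t, t | obs); the
    real model conditions on VLM features. This toy field just points
    from the current sample toward a fixed target."""
    target = obs_embedding           # pretend target action chunk
    return target - a_t              # straight-line (rectified-flow) field

def sample_action_chunk(obs_embedding, steps=10):
    """Generic flow-matching sampling loop: Euler-integrate the velocity
    field from Gaussian noise at t=0 toward an action chunk at t=1."""
    a = np.random.randn(*obs_embedding.shape)
    dt = 1.0 / steps
    t = 0.0
    for _ in range(steps):
        a = a + dt * velocity(a, t, obs_embedding)
        t += dt
    return a

print(sample_action_chunk(np.zeros(8)))
```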
arXiv Detail & Related papers (2024-10-31T17:22:30Z)
- Exploring Large Language Models to Facilitate Variable Autonomy for Human-Robot Teaming [4.779196219827508]
We introduce a novel framework for a GPT-powered multi-robot testbed environment, based on a Unity Virtual Reality (VR) setting.
This system allows users to interact with robot agents through natural language, each agent powered by its own GPT core.
A user study with 12 participants explores the effectiveness of GPT-4 and, more importantly, the strategies users adopt when given the opportunity to converse in natural language within a multi-robot environment.
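A minimal sketch of the "individual GPT core per robot" pattern, assuming nothing about the paper's actual Unity/VR implementation: each agent keeps its own dialogue history, which a real system would forward to its LLM endpoint.

```python
class RobotAgent:
    """One conversational core per robot; `history` is the per-robot
    dialogue context a real system would send to its LLM endpoint."""

    def __init__(self, name: str):
        self.name = name
        self.history: list[tuple[str, str]] = []

    def chat(self, message: str) -> str:
        self.history.append(("user", message))
        # placeholder reply; a real core would query GPT with self.history
        reply = f"{self.name}: acknowledged '{message}'"
        self.history.append(("assistant", reply))
        return reply

# hypothetical robot names; the point is that contexts never mix
robots = {name: RobotAgent(name) for name in ("scout", "lifter")}
print(robots["scout"].chat("move to the doorway"))
```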
arXiv Detail & Related papers (2023-12-12T12:26:48Z)
- Dobby: A Conversational Service Robot Driven by GPT-4 [22.701223191699412]
This work introduces a robotics platform which embeds a conversational AI agent in an embodied system for service tasks.
The agent is derived from a large language model, which has learned from a vast corpus of general knowledge.
In addition to generating dialogue, this agent can interface with the physical world by invoking commands on the robot.
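One common way such an agent "invokes commands on the robot" is to parse the model's output for command calls and dispatch them to robot primitives. The sketch below uses invented command names and a toy parser, not Dobby's actual interface.

```python
# Hedged sketch: route LLM output lines to robot commands or to dialogue.

COMMANDS = {}

def command(fn):
    """Register a robot primitive under its function name."""
    COMMANDS[fn.__name__] = fn
    return fn

@command
def goto(location: str):
    print(f"[robot] driving to {location}")

@command
def say(text: str):
    print(f"[robot] speaking: {text}")

def dispatch(llm_output: str):
    """Expect lines like 'goto(kitchen)'; anything else is dialogue."""
    for line in llm_output.splitlines():
        name, _, arg = line.partition("(")
        if name in COMMANDS:
            COMMANDS[name](arg.rstrip(")"))
        else:
            say(line)

dispatch("On my way.\ngoto(kitchen)")
```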
arXiv Detail & Related papers (2023-10-10T04:34:00Z)
- HandMeThat: Human-Robot Communication in Physical and Social Environments [73.91355172754717]
HandMeThat is a benchmark for a holistic evaluation of instruction understanding and following in physical and social environments.
HandMeThat contains 10,000 episodes of human-robot interactions.
We show that both offline and online reinforcement learning algorithms perform poorly on HandMeThat.
arXiv Detail & Related papers (2023-10-05T16:14:46Z)
- WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model [92.90127398282209]
This paper investigates the potential of integrating the most recent Large Language Models (LLMs) with existing visual grounding and robotic grasping systems.
We introduce WALL-E (Embodied Robotic WAiter load lifting with Large Language model) as an example of this integration.
We deploy this LLM-empowered system on the physical robot to provide a more user-friendly interface for the instruction-guided grasping task.
arXiv Detail & Related papers (2023-08-30T11:35:21Z)
- Open-World Object Manipulation using Pre-trained Vision-Language Models [72.87306011500084]
For robots to follow instructions from people, they must be able to connect the rich semantic information in human vocabulary to their sensory observations and actions.
We develop a simple approach that leverages a pre-trained vision-language model to extract object-identifying information.
In a variety of experiments on a real mobile manipulator, we find that MOO generalizes zero-shot to a wide range of novel object categories and environments.
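A hedged sketch of the general recipe (the abstract does not show MOO's exact interfaces): an open-vocabulary detector localizes the object phrase, and the control policy consumes only that object-identifying information, so novel categories work zero-shot as long as the detector can localize them. All function names are hypothetical stand-ins.

```python
def vlm_locate(image, object_phrase: str):
    """Stand-in for an open-vocabulary detector (e.g. an OWL-ViT-style
    model); returns a normalized (x, y) center, or None if not found."""
    return (0.42, 0.57)  # dummy detection

def moo_style_policy(image, instruction: str):
    # naive phrase extraction; real systems parse the instruction properly
    phrase = instruction.removeprefix("pick up the ").strip()
    center = vlm_locate(image, phrase)
    if center is None:
        return "abort: object not found"
    # the policy is conditioned on the object location, not raw language
    return f"grasp at {center}"

print(moo_style_policy(image=None, instruction="pick up the plush whale"))
```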
arXiv Detail & Related papers (2023-03-02T01:55:10Z)
- Do As I Can, Not As I Say: Grounding Language in Robotic Affordances [119.29555551279155]
Large language models can encode a wealth of semantic knowledge about the world.
Such knowledge could be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language.
We show how low-level skills can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally-extended instructions.
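The combination rule this paper popularized fits in a few lines: pick the skill that maximizes the language model's usefulness score times a learned affordance, i.e. the probability the skill can succeed from the current state. The scores below are dummy values for illustration only.

```python
# SayCan-style skill selection: LLM usefulness x affordance, take argmax.

SKILLS = ["find a sponge", "pick up the sponge", "go to the table"]

def llm_score(instruction: str, skill: str) -> float:
    """Stand-in for the LLM's probability that `skill` is a useful next
    step toward `instruction` (dummy numbers)."""
    return {"find a sponge": 0.6,
            "pick up the sponge": 0.3,
            "go to the table": 0.1}[skill]

def affordance(skill: str, state: dict) -> float:
    """Stand-in for a learned value function: probability the skill can
    succeed from the current state."""
    if skill == "pick up the sponge" and not state["sponge_visible"]:
        return 0.0
    return 1.0

def next_skill(instruction: str, state: dict) -> str:
    return max(SKILLS, key=lambda s: llm_score(instruction, s) * affordance(s, state))

# the sponge is not visible yet, so grounding picks "find a sponge"
print(next_skill("clean the spill", {"sponge_visible": False}))
```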
arXiv Detail & Related papers (2022-04-04T17:57:11Z)
- Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation [80.29069988090912]
We study the problem of learning a range of vision-based manipulation tasks from a large offline dataset of robot interaction.
We propose to leverage offline robot datasets with crowd-sourced natural language labels.
We find that our approach outperforms both goal-image specifications and language conditioned imitation techniques by more than 25%.
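A minimal sketch of language-conditioned behavioral cloning on such a dataset, with made-up features and a linear policy standing in for the paper's model: the policy maps an (observation, language embedding) pair to an action and is fit to the logged expert actions.

```python
import numpy as np

rng = np.random.default_rng(0)
# each entry: (obs_features, language_embedding, expert_action);
# the language embedding comes from a crowd-sourced label of the clip
dataset = [
    (rng.normal(size=16), rng.normal(size=8), rng.normal(size=4))
    for _ in range(32)
]

W = rng.normal(scale=0.1, size=(4, 16 + 8))  # toy linear policy

def policy(obs, lang):
    """Action prediction conditioned on observation and language."""
    return W @ np.concatenate([obs, lang])

def bc_loss():
    """Mean squared imitation error over the offline dataset."""
    return np.mean([np.sum((policy(o, l) - a) ** 2) for o, l, a in dataset])

# a real pipeline would minimize this loss with SGD over a deep policy
print(f"imitation MSE: {bc_loss():.3f}")
```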
arXiv Detail & Related papers (2021-09-02T17:42:13Z)
- Spoken Language Interaction with Robots: Research Issues and Recommendations, Report from the NSF Future Directions Workshop [0.819605661841562]
Meeting human needs requires addressing new challenges in speech technology and user experience design.
More powerful adaptation methods are needed, without extensive re-engineering or the collection of massive training data.
Since robots operate in real time, their speech and language processing components must do so as well.
arXiv Detail & Related papers (2020-11-11T03:45:34Z)