Using Natural Language for Human-Robot Collaboration in the Real World
- URL: http://arxiv.org/abs/2508.11759v2
- Date: Fri, 19 Sep 2025 17:12:21 GMT
- Title: Using Natural Language for Human-Robot Collaboration in the Real World
- Authors: Peter Lindes, Kaoutar Skiker
- Abstract summary: We have a vision of a day when autonomous robots can collaborate with humans as assistants in performing complex tasks in the physical world. This vision includes robots that can communicate with their human collaborators using language that is natural to the humans.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We have a vision of a day when autonomous robots can collaborate with humans as assistants in performing complex tasks in the physical world. This vision includes robots that can communicate with their human collaborators using language that is natural to the humans. Traditional Interactive Task Learning (ITL) systems have some of this ability, but the language they can understand is very limited. The advent of large language models (LLMs) provides an opportunity to greatly improve the language understanding of robots, yet integrating the language abilities of LLMs with robots that operate in the real physical world is a challenging problem. In this chapter we first briefly review a few commercial robot products that work closely with humans, and discuss how they could be much better collaborators with robust language abilities. We then explore how an AI system built around a cognitive agent that controls a physical robot, interacts with both a human and an LLM, and accumulates situational knowledge through its experiences could be one approach to reaching that vision. We focus on three specific challenges in having the robot understand natural language, and present a simple proof-of-concept experiment using ChatGPT for each. Finally, we discuss what it will take to turn these simple experiments into an operational system in which LLM-assisted language understanding is part of an integrated robotic assistant that uses language to collaborate with humans.
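The architecture sketched in the abstract has the cognitive agent hand utterances it cannot parse itself to an LLM for interpretation. As a rough illustration of that kind of proof-of-concept call (the paper's experiments used ChatGPT, but the action vocabulary, prompt, and model name below are hypothetical), a minimal grounding query might look like:

```python
# Sketch: a cognitive agent asks an LLM to map an utterance onto its
# known action vocabulary. Vocabulary, prompt, and model are illustrative.
from openai import OpenAI

KNOWN_ACTIONS = ["pick-up(obj)", "put-down(obj, loc)", "go-to(loc)"]

def ground_instruction(utterance: str) -> str:
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    prompt = (
        "You translate instructions for a robot.\n"
        f"Known actions: {', '.join(KNOWN_ACTIONS)}\n"
        f"Instruction: {utterance!r}\n"
        "Reply with one action per line, using only known actions."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(ground_instruction("put the red block on the shelf"))
```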
Related papers
- Challenges in Grounding Language in the Real World
A long-term goal of Artificial Intelligence is to build a language understanding system that allows a human to collaborate with a physical robot using language that is natural to the human. We propose a solution that integrates the abilities of a cognitive agent capable of interactive task learning in a physical robot with the linguistic abilities of a large language model.
arXiv Detail & Related papers (2025-06-20T17:17:53Z)
- A Paragraph is All It Takes: Rich Robot Behaviors from Interacting, Trusted LLMs
Large Language Models (LLMs) are compact representations of all public knowledge of our physical environment and animal and human behaviors. We show that rich robot behaviors and good performance could be achieved despite the robot's data fusion cycle running at only 1 Hz. The use of natural language for inter-LLM communication allowed the robot's reasoning and decision making to be directly observed by humans. We suggest that by using natural language as the data bus among interacting AIs, and immutable public ledgers to store behavior constraints, it is possible to build robots that combine unexpectedly rich performance, upgradability, and …
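A minimal sketch of the natural-language data bus this summary describes, with a stub standing in for the chat-model calls and the loop throttled to roughly 1 Hz; the roles and messages are invented for illustration:

```python
# Sketch: two LLM roles exchanging plain-language messages over a shared
# "bus", with the fusion loop throttled to roughly 1 Hz as in the paper.
# llm() is a stub standing in for any chat-model call.
import time

def llm(role: str, inbox: list[str]) -> str:
    # Placeholder: a real system would prompt a chat model here.
    return f"[{role}] acting on: {inbox[-1] if inbox else 'nothing'}"

bus: list[str] = ["user: tidy the table"]  # human-readable, auditable log

for tick in range(3):                 # three ~1 Hz fusion cycles
    start = time.monotonic()
    bus.append(llm("planner", bus))   # proposes the next behavior
    bus.append(llm("executor", bus))  # turns it into motor commands
    time.sleep(max(0.0, 1.0 - (time.monotonic() - start)))

print("\n".join(bus))  # the whole exchange is observable by humans
```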
arXiv Detail & Related papers (2024-12-24T18:41:15Z)
- $π_0$: A Vision-Language-Action Flow Model for General Robot Control
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge.
We evaluate the model on its ability to perform tasks zero-shot after pre-training, to follow language instructions from people, and to acquire new skills via fine-tuning.
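Flow matching, the generative technique named in the title, trains a network to predict the velocity that transports noise samples to data samples. A self-contained toy version for low-dimensional "actions" (a linear model stands in for the VLM-conditioned network; all numbers are illustrative, not π0's):

```python
# Toy flow-matching step: learn a velocity field that transports noise
# to action vectors. A linear model stands in for the VLM-based network.
import numpy as np

rng = np.random.default_rng(0)
actions = rng.normal(2.0, 0.5, size=(256, 4))   # fake 4-D robot actions
W = np.zeros((5, 4))                            # weights on [x_t, t]

for step in range(2000):
    a = actions[rng.integers(0, 256, size=32)]  # minibatch of actions
    noise = rng.normal(size=a.shape)
    t = rng.uniform(size=(32, 1))
    x_t = (1 - t) * noise + t * a               # point on the noise->data path
    target_v = a - noise                        # true velocity along it
    feats = np.hstack([x_t, t])
    pred_v = feats @ W
    grad = feats.T @ (pred_v - target_v) / 32   # MSE gradient
    W -= 0.05 * grad

# Integrate the learned field from noise to a sample (Euler, 10 steps)
x = rng.normal(size=(1, 4))
for k in range(10):
    t = np.full((1, 1), k / 10)
    x = x + 0.1 * (np.hstack([x, t]) @ W)
print("generated action:", x.round(2), "(data mean was ~2.0)")
```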
arXiv Detail & Related papers (2024-10-31T17:22:30Z)
- Exploring Large Language Models to Facilitate Variable Autonomy for Human-Robot Teaming
We introduce a novel framework for a GPT-powered multi-robot testbed environment, based on a Unity Virtual Reality (VR) setting.
This system allows users to interact with robot agents through natural language, each powered by individual GPT cores.
A user study with 12 participants explores the effectiveness of GPT-4 and, more importantly, user strategies when given the opportunity to converse in natural language within a multi-robot environment.
arXiv Detail & Related papers (2023-12-12T12:26:48Z)
- A Sign Language Recognition System with Pepper, Lightweight-Transformer, and LLM
This research explores using lightweight deep neural network architectures to enable the humanoid robot Pepper to understand American Sign Language (ASL).
We introduce a lightweight and efficient model for ASL understanding optimized for embedded systems, ensuring rapid sign recognition while conserving computational resources.
We tailor interactions to allow the Pepper Robot to generate natural Co-Speech Gesture responses, laying the foundation for more organic and intuitive humanoid-robot dialogues.
arXiv Detail & Related papers (2023-09-28T23:54:41Z)
- WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model
This paper investigates the potential of integrating recent Large Language Models (LLMs) with an existing visual grounding and robotic grasping system.
We introduce WALL-E (Embodied Robotic WAiter load lifting with Large Language model) as an example of this integration.
We deploy this LLM-empowered system on the physical robot to provide a more user-friendly interface for the instruction-guided grasping task.
arXiv Detail & Related papers (2023-08-30T11:35:21Z)
- Natural Language Instructions for Intuitive Human Interaction with Robotic Assistants in Field Construction Work
This paper proposes a framework to allow human workers to interact with construction robots based on natural language instructions.
The proposed method consists of three stages: Natural Language Understanding (NLU), Information Mapping (IM), and Robot Control (RC).
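A skeletal version of such a three-stage pipeline, with placeholder stage bodies and a made-up site object table (none of this is the paper's implementation):

```python
# Sketch of the NLU -> IM -> RC pipeline; each stage body is a placeholder.
from dataclasses import dataclass

@dataclass
class Intent:
    action: str
    target: str

def nlu(utterance: str) -> Intent:
    # Natural Language Understanding: parse the instruction.
    action, _, target = utterance.partition(" the ")
    return Intent(action=action.strip(), target=target.strip())

def information_mapping(intent: Intent) -> dict:
    # Information Mapping: bind the intent to site/world knowledge.
    site_objects = {"beam": (4.0, 1.5, 0.0)}   # hypothetical coordinates
    return {"skill": intent.action, "pose": site_objects[intent.target]}

def robot_control(command: dict) -> None:
    # Robot Control: dispatch to the controller (stubbed as a print).
    print(f"executing {command['skill']} at {command['pose']}")

robot_control(information_mapping(nlu("lift the beam")))
```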
arXiv Detail & Related papers (2023-07-09T15:02:34Z)
- LLM as A Robotic Brain: Unifying Egocentric Memory and Control
Embodied AI focuses on the study and development of intelligent systems that possess a physical or virtual embodiment (i.e., robots).
Memory and control are the two essential parts of an embodied system and usually require separate frameworks to model each of them.
We propose a novel framework called LLM-Brain: using a large-scale language model as a robotic brain to unify egocentric memory and control.
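A minimal sketch of the idea of one model serving as both memory and controller: each cycle appends the observation and chosen action to an egocentric log, and the (stubbed) LLM decides the next action from that log. Observations and the decision rule are invented:

```python
# Sketch: one agent loop where an LLM (stubbed) both updates an
# egocentric memory and chooses the next control action from it.
memory: list[str] = []  # running first-person log of what the robot saw/did

def llm_brain(memory: list[str], observation: str) -> str:
    # Placeholder for a chat-model call that reasons over the full memory.
    return "open door" if "in reach" in observation else "explore"

for observation in ["hallway, closed door ahead", "door handle in reach"]:
    action = llm_brain(memory, observation)
    memory.append(f"saw: {observation}; did: {action}")

print("\n".join(memory))  # the memory doubles as an account of behavior
```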
arXiv Detail & Related papers (2023-04-19T00:08:48Z)
- Open-World Object Manipulation using Pre-trained Vision-Language Models
For robots to follow instructions from people, they must be able to connect the rich semantic information in human vocabulary to their sensory observations and actions.
We develop a simple approach, MOO, which leverages a pre-trained vision-language model to extract object-identifying information from the language command and image.
In a variety of experiments on a real mobile manipulator, we find that MOO generalizes zero-shot to a wide range of novel object categories and environments.
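The approach as summarized can be sketched in two stages: a (stubbed) open-vocabulary VLM localizes the object named in the command, and the policy is conditioned on the instruction plus that localization. Function names and values here are hypothetical:

```python
# Sketch: use a (stubbed) VLM to localize the named object, then condition
# the policy on the instruction plus the detected object location.
def vlm_locate(image, object_name: str) -> tuple[int, int]:
    # Placeholder for an open-vocabulary detector; returns pixel coords.
    return (120, 80)

def policy(image, instruction: str, object_xy: tuple[int, int]) -> str:
    # Placeholder for the learned manipulation policy.
    return f"grasp at {object_xy}"

image = None  # stands in for a camera frame
instruction = "pick up the pink stuffed whale"
xy = vlm_locate(image, instruction.removeprefix("pick up the "))
print(policy(image, instruction, xy))
```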
arXiv Detail & Related papers (2023-03-02T01:55:10Z)
- Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
Large language models can encode a wealth of semantic knowledge about the world.
Such knowledge could be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language.
We show how low-level skills can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally-extended instructions.
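One concrete reading of this combination, with made-up numbers: score each low-level skill by the product of the LLM's estimate that the skill is a useful next step and the skill's affordance (predicted success) in the current state, then execute the argmax:

```python
# Sketch: pick the skill maximizing LLM relevance x affordance, in the
# spirit of the approach described above. All numbers are illustrative.
llm_score = {          # p(skill is a useful next step | instruction)
    "find sponge": 0.60,
    "pick up sponge": 0.30,
    "go to table": 0.10,
}
affordance = {         # p(skill succeeds | current state)
    "find sponge": 0.9,
    "pick up sponge": 0.2,   # sponge not yet visible
    "go to table": 0.8,
}

combined = {s: llm_score[s] * affordance[s] for s in llm_score}
best = max(combined, key=combined.get)
print(best, combined[best])   # -> "find sponge" is executed first
```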
arXiv Detail & Related papers (2022-04-04T17:57:11Z)
- Joint Mind Modeling for Explanation Generation in Complex Human-Robot Collaborative Tasks
We propose a novel explainable AI (XAI) framework for achieving human-like communication in human-robot collaborations.
The robot builds a hierarchical mind model of the human user and generates explanations of its own mind as a form of communication.
Results show that the explanations generated by our approach significantly improve collaboration performance and user perception of the robot.
arXiv Detail & Related papers (2020-07-24T23:35:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.