Challenges in Grounding Language in the Real World
- URL: http://arxiv.org/abs/2506.17375v1
- Date: Fri, 20 Jun 2025 17:17:53 GMT
- Title: Challenges in Grounding Language in the Real World
- Authors: Peter Lindes, Kaoutar Skiker
- Abstract summary: A long-term goal of Artificial Intelligence is to build a language understanding system that allows a human to collaborate with a physical robot using language that is natural to the human. We propose a solution that integrates the abilities of a cognitive agent capable of interactive task learning in a physical robot with the linguistic abilities of a large language model.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A long-term goal of Artificial Intelligence is to build a language understanding system that allows a human to collaborate with a physical robot using language that is natural to the human. In this paper we highlight some of the challenges in doing this, and propose a solution that integrates the abilities of a cognitive agent capable of interactive task learning in a physical robot with the linguistic abilities of a large language model. We also point the way to an initial implementation of this approach.
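As a rough illustration of that integration, the sketch below shows a cognitive agent delegating linguistic interpretation to an LLM while keeping task grounding and execution on the robot side. The task vocabulary, the `query_llm` stub, and the JSON hand-off are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch: a cognitive agent asks an LLM to map a natural-language
# request onto the agent's own grounded task vocabulary. All names here are
# illustrative assumptions, not the paper's system.
import json

KNOWN_TASKS = ["pick-up", "put-down", "move-to", "open", "close"]

def query_llm(prompt: str) -> str:
    """Stand-in for a call to a large language model API."""
    # A real system would call an LLM; here we return a canned reply.
    return json.dumps({"task": "pick-up", "object": "red block", "location": "table"})

def interpret(utterance: str) -> dict:
    """Use the LLM for linguistic analysis, but validate against grounded skills."""
    prompt = (
        "Map the instruction to one of the robot's known tasks "
        f"{KNOWN_TASKS} and its arguments, as JSON.\nInstruction: {utterance}"
    )
    parsed = json.loads(query_llm(prompt))
    if parsed.get("task") not in KNOWN_TASKS:
        raise ValueError("LLM proposed a task the embodied agent cannot ground")
    return parsed

if __name__ == "__main__":
    print(interpret("Please pick up the red block on the table"))
```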
Related papers
- Towards Human-level Intelligence via Human-like Whole-Body Manipulation [10.199110135230674]
We present Astribot Suite, a robot learning suite for whole-body manipulation aimed at general daily tasks across diverse environments. Our results show that Astribot's cohesive integration of embodiment, teleoperation interface, and learning pipeline marks a significant step towards real-world, general-purpose whole-body robotic manipulation.
arXiv Detail & Related papers (2025-07-23T02:23:41Z)
- $π_0$: A Vision-Language-Action Flow Model for General Robot Control [77.32743739202543]
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge.
We evaluate our model in terms of its ability to perform tasks in zero shot after pre-training, follow language instructions from people, and its ability to acquire new skills via fine-tuning.
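As a toy illustration of the flow-matching idea (not the $π_0$ architecture; the network, dimensions, and Euler integration schedule below are assumptions), an action head can predict a velocity field conditioned on VLM features and integrate it from noise toward an action chunk:

```python
# Toy flow-matching action head: given VLM features and a noisy action chunk,
# predict a velocity field and integrate it from noise toward actions.
# Dimensions, architecture, and step count are illustrative assumptions.
import torch
import torch.nn as nn

class FlowActionHead(nn.Module):
    def __init__(self, vlm_dim=512, action_dim=7, horizon=8, hidden=256):
        super().__init__()
        self.horizon, self.action_dim = horizon, action_dim
        in_dim = vlm_dim + horizon * action_dim + 1  # features + noisy actions + time
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, horizon * action_dim),
        )

    def forward(self, vlm_feat, noisy_actions, t):
        x = torch.cat([vlm_feat, noisy_actions.flatten(1), t], dim=-1)
        return self.net(x)  # predicted velocity field

    @torch.no_grad()
    def sample(self, vlm_feat, steps=10):
        b = vlm_feat.shape[0]
        a = torch.randn(b, self.horizon, self.action_dim)  # start from noise
        for i in range(steps):  # simple Euler integration of the flow
            t = torch.full((b, 1), i / steps)
            v = self.forward(vlm_feat, a, t).view_as(a)
            a = a + v / steps
        return a

if __name__ == "__main__":
    head = FlowActionHead()
    actions = head.sample(torch.randn(2, 512))
    print(actions.shape)  # torch.Size([2, 8, 7])
```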
arXiv Detail & Related papers (2024-10-31T17:22:30Z)
- tagE: Enabling an Embodied Agent to Understand Human Instructions [3.943519623674811]
We introduce a novel system known as task and argument grounding for Embodied agents (tagE).
At its core, our system employs an inventive neural network model designed to extract a series of tasks from complex task instructions expressed in natural language.
Our proposed model adopts an encoder-decoder framework enriched with nested decoding to effectively extract tasks and their corresponding arguments from these intricate instructions.
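A hedged sketch of the kind of nested task/argument structure such a model extracts; the output schema and the trivial rule-based stand-in below are illustrative assumptions, not the tagE encoder-decoder itself:

```python
# Illustrative target structure for task-and-argument extraction from a
# complex instruction. The extractor is a rule-based stand-in for the paper's
# neural encoder-decoder; only the output format matters here.
from dataclasses import dataclass, field

@dataclass
class GroundedTask:
    name: str
    arguments: dict = field(default_factory=dict)

def extract_tasks(instruction: str) -> list[GroundedTask]:
    tasks = []
    for clause in instruction.split(" and "):
        if clause.startswith("bring"):
            obj = clause.removeprefix("bring ").split(" from ")[0]
            src = clause.split(" from ")[-1]
            tasks.append(GroundedTask("bring", {"object": obj, "source": src}))
        elif clause.startswith("place it"):
            dest = clause.split(" on ")[-1]
            tasks.append(GroundedTask("place", {"object": "<previous>", "destination": dest}))
    return tasks

print(extract_tasks("bring the red cup from the kitchen and place it on the table"))
```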
arXiv Detail & Related papers (2023-10-24T08:17:48Z)
- Dobby: A Conversational Service Robot Driven by GPT-4 [22.701223191699412]
This work introduces a robotics platform which embeds a conversational AI agent in an embodied system for service tasks.
The agent is derived from a large language model, which has learned from a vast corpus of general knowledge.
In addition to generating dialogue, this agent can interface with the physical world by invoking commands on the robot.
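A minimal sketch of that dialogue-plus-command pattern, with a whitelisted command table; the command names and the LLM stub are assumptions for illustration, not Dobby's actual GPT-4 interface:

```python
# Sketch of an LLM-driven agent that either replies in dialogue or invokes
# a whitelisted robot command. Command names and the LLM stub are assumed.
ROBOT_COMMANDS = {
    "go_to": lambda location: f"[robot] navigating to {location}",
    "speak": lambda text: f"[robot] saying: {text}",
}

def llm_decide(user_utterance: str) -> dict:
    """Stand-in for an LLM call that returns either dialogue or a command."""
    if "kitchen" in user_utterance:
        return {"type": "command", "name": "go_to", "args": {"location": "kitchen"}}
    return {"type": "dialogue", "text": "How can I help you today?"}

def step(user_utterance: str) -> str:
    decision = llm_decide(user_utterance)
    if decision["type"] == "command" and decision["name"] in ROBOT_COMMANDS:
        return ROBOT_COMMANDS[decision["name"]](**decision["args"])
    return decision.get("text", "")

print(step("Please take this to the kitchen"))
print(step("Hello there"))
```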
arXiv Detail & Related papers (2023-10-10T04:34:00Z)
- WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model [92.90127398282209]
This paper investigates the potential of integrating the most recent Large Language Models (LLMs) and existing visual grounding and robotic grasping system.
We introduce WALL-E (Embodied Robotic WAiter load lifting with Large Language model) as an example of this integration.
We deploy this LLM-empowered system on the physical robot to provide a more user-friendly interface for the instruction-guided grasping task.
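A schematic of the hand-off between the stages named above; every function below is a placeholder assumption meant only to show how instruction interpretation, visual grounding, and grasping connect:

```python
# Placeholder pipeline: an LLM resolves the user's request to an object label,
# a visual grounding module returns a bounding box for it, and a grasping
# system acts on that box. All three stages are stubs for illustration.
def llm_pick_target(instruction: str) -> str:
    return "coffee mug"          # e.g., the LLM resolves "something to drink from"

def visual_grounding(label: str) -> tuple[int, int, int, int]:
    return (120, 80, 200, 160)   # (x1, y1, x2, y2) box for the detected object

def grasp(box: tuple[int, int, int, int]) -> str:
    return f"grasping object in box {box}"

def handle(instruction: str) -> str:
    target = llm_pick_target(instruction)
    box = visual_grounding(target)
    return grasp(box)

print(handle("Please hand me something to drink from"))
```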
arXiv Detail & Related papers (2023-08-30T11:35:21Z)
- Human-guided Collaborative Problem Solving: A Natural Language based Framework [74.27063862727849]
Our framework consists of three components, one of which is a natural language engine that parses language utterances into a formal representation and back.
We illustrate the ability of this framework to address the key challenges of collaborative problem solving by demonstrating it on a collaborative building task in a Minecraft-based blocksworld domain.
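A toy illustration of the parse/generate round trip such a language engine performs; the formal representation and the string rules below are assumptions chosen for a blocks-world-style domain, not the paper's engine:

```python
# Toy round trip between a natural-language utterance and a formal goal
# representation, in the spirit of a parse/generate language engine.
# The logical form and the string-matching "grammar" are illustrative assumptions.
def parse(utterance: str) -> dict:
    # "put the red block on the blue block" -> {"act": "put", "obj": ..., "dest": ...}
    words = utterance.lower().replace("the ", "").split(" on ")
    return {"act": "put", "obj": words[0].removeprefix("put "), "dest": words[1]}

def generate(form: dict) -> str:
    return f"put the {form['obj']} on the {form['dest']}"

form = parse("put the red block on the blue block")
print(form)             # {'act': 'put', 'obj': 'red block', 'dest': 'blue block'}
print(generate(form))   # round trip back to language
```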
arXiv Detail & Related papers (2022-07-19T21:52:37Z)
- Human Heuristics for AI-Generated Language Are Flawed [8.465228064780744]
We study how people judge whether verbal self-presentations, one of the most personal and consequential forms of language, were generated by AI.
We experimentally demonstrate that these wordings make human judgment of AI-generated language predictable and manipulable.
We discuss solutions, such as AI accents, to reduce the deceptive potential of language generated by AI.
arXiv Detail & Related papers (2022-06-15T03:18:56Z)
- Interactive Grounded Language Understanding in a Collaborative Environment: IGLU 2021 [58.196738777207315]
We propose IGLU: Interactive Grounded Language Understanding in a Collaborative Environment.
The primary goal of the competition is to approach the problem of how to build interactive agents that learn to solve a task while provided with grounded natural language instructions in a collaborative environment.
arXiv Detail & Related papers (2022-05-05T01:20:09Z)
- Do As I Can, Not As I Say: Grounding Language in Robotic Affordances [119.29555551279155]
Large language models can encode a wealth of semantic knowledge about the world.
Such knowledge could be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language.
We show how low-level skills can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally-extended instructions.
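The core idea can be sketched as scoring each low-level skill by the product of the LLM's estimate that the skill is a useful next step and the robot's learned affordance for executing it now; the skill names and numbers below are invented for illustration:

```python
# Sketch of affordance-grounded skill selection: the chosen skill maximizes
# (LLM usefulness score) * (learned affordance that the skill can succeed now).
# The scores below are made up for illustration.
def select_skill(instruction: str, skills, llm_score, affordance):
    scored = {s: llm_score(instruction, s) * affordance(s) for s in skills}
    return max(scored, key=scored.get), scored

skills = ["pick up sponge", "go to counter", "open drawer"]
llm_score = lambda instr, s: {"pick up sponge": 0.7, "go to counter": 0.2, "open drawer": 0.1}[s]
affordance = lambda s: {"pick up sponge": 0.3, "go to counter": 0.9, "open drawer": 0.8}[s]

best, scores = select_skill("I spilled my drink, can you help?", skills, llm_score, affordance)
print(best, scores)
```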
arXiv Detail & Related papers (2022-04-04T17:57:11Z)
- CALVIN: A Benchmark for Language-conditioned Policy Learning for Long-horizon Robot Manipulation Tasks [30.936692970187416]
General-purpose robots must learn to relate human language to their perceptions and actions.
We present CALVIN, an open-source simulated benchmark to learn long-horizon language-conditioned tasks.
arXiv Detail & Related papers (2021-12-06T18:37:33Z)
- Natural Language for Human-Robot Collaboration: Problems Beyond Language Grounding [10.227242085922613]
We identify several aspects of language processing that are not commonly studied in this context.
These include location, planning, and generation.
We suggest evaluations for each task, offer baselines for simple methods, and close by discussing challenges and opportunities in studying language for collaboration.
arXiv Detail & Related papers (2021-10-09T03:24:38Z)
- Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation [80.29069988090912]
We study the problem of learning a range of vision-based manipulation tasks from a large offline dataset of robot interaction.
We propose to leverage offline robot datasets with crowd-sourced natural language labels.
We find that our approach outperforms both goal-image specifications and language-conditioned imitation techniques by more than 25%.
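A minimal sketch of language-conditioned behavioral cloning on offline data with language labels; the encoders, dimensions, and random stand-in data below are assumptions, and the paper's method goes well beyond this:

```python
# Toy language-conditioned behavioral cloning: a policy maps (image features,
# language embedding) to an action and is regressed toward the dataset action.
# Encoders, dimensions, and the random "dataset" are illustrative assumptions.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(512 + 128, 256), nn.ReLU(), nn.Linear(256, 7))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

for step in range(100):                      # pass over a random stand-in "offline dataset"
    img_feat = torch.randn(64, 512)          # visual encoder output
    lang_emb = torch.randn(64, 128)          # embedding of the crowd-sourced label
    action = torch.randn(64, 7)              # action recorded in the offline dataset
    pred = policy(torch.cat([img_feat, lang_emb], dim=-1))
    loss = nn.functional.mse_loss(pred, action)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print(float(loss))
```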
arXiv Detail & Related papers (2021-09-02T17:42:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.