Related papers: Two Heads Are Better Than One: Collaborative LLM Embodied Agents for Human-Robot Interaction

URL: http://arxiv.org/abs/2411.16723v1
Date: Sat, 23 Nov 2024 02:47:12 GMT
Title: Two Heads Are Better Than One: Collaborative LLM Embodied Agents for Human-Robot Interaction
Authors: Mitchell Rosser, Marc G. Carmichael
Abstract summary: Large language models (LLMs) should be able to leverage their large breadth of understanding to interpret natural language commands. However, these models suffer from hallucinations, which may cause safety issues or deviations from the task. In this research, multiple collaborative AI systems were tested against a single independent AI agent to determine whether the success in other domains would translate into improved human-robot interaction performance.
Score: 1.6574413179773757
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the recent development of natural language generation models - termed large language models (LLMs) - a potential use case has opened up to improve the way that humans interact with robot assistants. These LLMs should be able to leverage their large breadth of understanding to interpret natural language commands into effective, task-appropriate and safe robot task executions. However, in reality, these models suffer from hallucinations, which may cause safety issues or deviations from the task. In other domains, these issues have been mitigated through the use of collaborative AI systems where multiple LLM agents can work together to collectively plan, code and self-check outputs. In this research, multiple collaborative AI systems were tested against a single independent AI agent to determine whether the success in other domains would translate into improved human-robot interaction performance. The results show that there is no defined trend between the number of agents and the success of the model. However, it is clear that some collaborative AI agent architectures can exhibit a greatly improved capacity to produce error-free code and to solve abstract problems.

Related papers

Modeling AI-Human Collaboration as a Multi-Agent Adaptation [0.0]
We develop an agent-based simulation to formalize AI-human collaboration as a function of a task. We show that in modular tasks, AI often substitutes for humans - delivering higher payoffs unless human expertise is very high. We also show that even "hallucinatory" AI - lacking memory or structure - can improve outcomes when augmenting low-capability humans by helping escape local optima.
arXiv Detail & Related papers (2025-04-29T16:19:53Z)
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks [52.46737975742287]
We build a self-contained environment with data that mimics a small software company environment. We find that with the most competitive agent, 24% of the tasks can be completed autonomously. This paints a nuanced picture on task automation with LM agents.
arXiv Detail & Related papers (2024-12-18T18:55:40Z)
ChatCollab: Exploring Collaboration Between Humans and AI Agents in Software Teams [1.3967206132709542]
ChatCollab's novel architecture allows agents - human or AI - to join collaborations in any role. Using software engineering as a case study, we find that our AI agents successfully identify their roles and responsibilities. In relation to three prior multi-agent AI systems for software development, we find ChatCollab AI agents produce comparable or better software in an interactive game development task.
arXiv Detail & Related papers (2024-12-02T21:56:46Z)
$π_0$: A Vision-Language-Action Flow Model for General Robot Control [77.32743739202543]
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge. We evaluate our model in terms of its ability to perform tasks in zero shot after pre-training, follow language instructions from people, and its ability to acquire new skills via fine-tuning.
arXiv Detail & Related papers (2024-10-31T17:22:30Z)
WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks [85.95607119635102]
Large language models (LLMs) can mimic human-like intelligence. WorkArena++ is designed to evaluate the planning, problem-solving, logical/arithmetic reasoning, retrieval, and contextual understanding abilities of web agents.
arXiv Detail & Related papers (2024-07-07T07:15:49Z)
Exploring Autonomous Agents through the Lens of Large Language Models: A Review [0.0]
Large Language Models (LLMs) are transforming artificial intelligence, enabling autonomous agents to perform diverse tasks across various domains. They face challenges such as multimodality, human value alignment, hallucinations, and evaluation. Evaluation platforms like AgentBench, WebArena, and ToolLLM provide robust methods for assessing these agents in complex scenarios.
arXiv Detail & Related papers (2024-04-05T22:59:02Z)
Large Language Models for Orchestrating Bimanual Robots [19.60907949776435]
We present LAnguage-model-based Bimanual ORchestration (LABOR) to analyze task configurations and devise coordination control policies. We evaluate our method through simulated experiments involving two classes of long-horizon tasks using the NICOL humanoid robot.
arXiv Detail & Related papers (2024-04-02T15:08:35Z)
Large Language Model-based Human-Agent Collaboration for Complex Task Solving [94.3914058341565]
We introduce the problem of Large Language Models (LLMs)-based human-agent collaboration for complex task-solving. We propose a Reinforcement Learning-based Human-Agent Collaboration method, ReHAC. This approach includes a policy model designed to determine the most opportune stages for human intervention within the task-solving process.
arXiv Detail & Related papers (2024-02-20T11:03:36Z)
The Rise and Potential of Large Language Model Based Agents: A Survey [91.71061158000953]
Large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI). We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents. We explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation.
arXiv Detail & Related papers (2023-09-14T17:12:03Z)
Building Cooperative Embodied Agents Modularly with Large Language Models [104.57849816689559]
We address challenging multi-agent cooperation problems with decentralized control, raw sensory observations, costly communication, and multi-objective tasks instantiated in various embodied environments. We harness the commonsense knowledge, reasoning ability, language comprehension, and text generation prowess of LLMs and seamlessly incorporate them into a cognitive-inspired modular framework. Our experiments on C-WAH and TDW-MAT demonstrate that CoELA driven by GPT-4 can surpass strong planning-based methods and exhibit emergent effective communication.
arXiv Detail & Related papers (2023-07-05T17:59:27Z)
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face [85.25054021362232]
Large language models (LLMs) have exhibited exceptional abilities in language understanding, generation, interaction, and reasoning. LLMs could act as a controller to manage existing AI models to solve complicated AI tasks. We present HuggingGPT, an LLM-powered agent that connects various AI models in machine learning communities.
arXiv Detail & Related papers (2023-03-30T17:48:28Z)
Chat with the Environment: Interactive Multimodal Perception Using Large Language Models [19.623070762485494]
Large Language Models (LLMs) have shown remarkable reasoning ability in few-shot robotic planning. Our study demonstrates that LLMs can provide high-level planning and reasoning skills and control interactive robot behavior in a multimodal environment.
arXiv Detail & Related papers (2023-03-14T23:01:27Z)
Creating Multimodal Interactive Agents with Imitation and Self-Supervised Learning [20.02604302565522]
A common vision from science fiction is that robots will one day inhabit our physical spaces, sense the world as we do, assist our physical labours, and communicate with us through natural language. Here we study how to design artificial agents that can interact naturally with humans using the simplification of a virtual environment. We show that imitation learning of human-human interactions in a simulated world, in conjunction with self-supervised learning, is sufficient to produce a multimodal interactive agent, which we call MIA, that successfully interacts with non-adversarial humans 75% of the time.
arXiv Detail & Related papers (2021-12-07T15:17:27Z)
