Related papers: MindForge: Empowering Embodied Agents with Theory of Mind for Lifelong Collaborative Learning

MindForge: Empowering Embodied Agents with Theory of Mind for Lifelong Collaborative Learning

URL: http://arxiv.org/abs/2411.12977v2
Date: Mon, 25 Nov 2024 13:17:01 GMT
Title: MindForge: Empowering Embodied Agents with Theory of Mind for Lifelong Collaborative Learning
Authors: Mircea Lică, Ojas Shirekar, Baptiste Colle, Chirag Raman,
Abstract summary: collabvoyager is a novel framework that enhances Voyager with lifelong collaborative learning through explicit perspective-taking. collabvoyager introduces three key innovations: (1) theory of mind representations linking percepts, beliefs, desires, and actions; (2) natural language communication between agents; and (3) semantic memory of task and environment knowledge. In mixed-expertise Minecraft experiments, collabvoyager agents outperform Voyager counterparts, significantly improving task completion rate by $66.6% (+39.4%)$ for collecting one block of dirt and $70.8% (+20.8%)$ for
Score: 3.187381965457262
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Contemporary embodied agents, such as Voyager in Minecraft, have demonstrated promising capabilities in open-ended individual learning. However, when powered with open large language models (LLMs), these agents often struggle with rudimentary tasks, even when fine-tuned on domain-specific knowledge. Inspired by human cultural learning, we present \collabvoyager, a novel framework that enhances Voyager with lifelong collaborative learning through explicit perspective-taking. \collabvoyager introduces three key innovations: (1) theory of mind representations linking percepts, beliefs, desires, and actions; (2) natural language communication between agents; and (3) semantic memory of task and environment knowledge and episodic memory of collaboration episodes. These advancements enable agents to reason about their and others' mental states, empirically addressing two prevalent failure modes: false beliefs and faulty task executions. In mixed-expertise Minecraft experiments, \collabvoyager agents outperform Voyager counterparts, significantly improving task completion rate by $66.6\% (+39.4\%)$ for collecting one block of dirt and $70.8\% (+20.8\%)$ for collecting one wood block. They exhibit emergent behaviors like knowledge transfer from expert to novice agents and collaborative code correction. \collabvoyager agents also demonstrate the ability to adapt to out-of-distribution tasks by using their previous experiences and beliefs obtained through collaboration. In this open-ended social learning paradigm, \collabvoyager paves the way for the democratic development of embodied AI, where agents learn in deployment from both peer and environmental feedback.

Related papers

Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts [54.21319853862452]
We present Optimus-3, a general-purpose agent for Minecraft.<n>We propose a knowledge-enhanced data generation pipeline to provide scalable and high-quality training data for agent development.<n>We develop a Multimodal Reasoning-Augmented Reinforcement Learning approach to enhance the agent's reasoning ability for visual diversity.
arXiv Detail & Related papers (2025-06-12T05:29:40Z)
Don't lie to your friends: Learning what you know from collaborative self-play [90.35507959579331]
We propose a radically new approach to teaching AI agents what they know. We construct multi-agent collaborations in which the group is rewarded for collectively arriving at correct answers. The desired meta-knowledge emerges from the incentives built into the structure of the interaction.
arXiv Detail & Related papers (2025-03-18T17:53:20Z)
Unified Mind Model: Reimagining Autonomous Agents in the LLM Era [1.3812010983144802]
Large language models (LLMs) have recently demonstrated remarkable capabilities across domains, tasks, and languages. We propose a novel theoretical cognitive architecture, the Unified Mind Model (UMM), which offers guidance to facilitate the rapid creation of autonomous agents.
arXiv Detail & Related papers (2025-03-05T12:49:44Z)
ADAM: An Embodied Causal Agent in Open-World Environments [3.2474668680608314]
We introduce ADAM, an emboDied causal agent in Minecraft. ADAM can autonomously navigate the open world, perceive multimodal contexts, learn causal world knowledge, and tackle complex tasks through lifelong learning.
arXiv Detail & Related papers (2024-10-29T16:32:01Z)
Odyssey: Empowering Minecraft Agents with Open-World Skills [26.537984734738764]
We introduce Odyssey, a new framework that empowers Large Language Model (LLM)-based agents with open-world skills to explore the vast Minecraft world. Odyssey comprises three key parts: (1) An interactive agent with an open-world skill library that consists of 40 primitive skills and 183 compositional skills; (2) A fine-tuned LLaMA-3 model trained on a large question-answering dataset with 390k+ instruction entries derived from the Minecraft Wiki; and (3) A new agent capability benchmark.
arXiv Detail & Related papers (2024-07-22T02:06:59Z)
Generative agents in the streets: Exploring the use of Large Language Models (LLMs) in collecting urban perceptions [0.0]
This study explores the current advancements in Generative agents powered by large language models (LLMs) The experiment employs Generative agents to interact with the urban environments using street view images to plan their journey toward specific goals. Since LLMs do not possess embodiment, nor have access to the visual realm, and lack a sense of motion or direction, we designed movement and visual modules that help agents gain an overall understanding of surroundings.
arXiv Detail & Related papers (2023-12-20T15:45:54Z)
See and Think: Embodied Agent in Virtual Environment [12.801720916220823]
Large language models (LLMs) have achieved impressive pro-gress on several open-world tasks. This paper proposes STEVE, a comprehensive and visionary embodied agent in the Minecraft virtual environment.
arXiv Detail & Related papers (2023-11-26T06:38:16Z)
MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration [98.18244218156492]
Large Language Models (LLMs) have significantly advanced natural language processing.<n>As their applications expand into multi-agent environments, there arises a need for a comprehensive evaluation framework.<n>This work introduces a novel competition-based benchmark framework to assess LLMs within multi-agent settings.
arXiv Detail & Related papers (2023-11-14T21:46:27Z)
Emergence of Collective Open-Ended Exploration from Decentralized Meta-Reinforcement Learning [2.296343533657165]
Recent works have proven that intricate cooperative behaviors can emerge in agents trained using meta reinforcement learning on open ended task distributions using self-play. We argue that self-play and other centralized training techniques do not accurately reflect how general collective exploration strategies emerge in the natural world.
arXiv Detail & Related papers (2023-11-01T16:56:44Z)
Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration [116.09561564489799]
Solo Performance Prompting transforms a single LLM into a cognitive synergist by engaging in multi-turn self-collaboration with multiple personas. A cognitive synergist is an intelligent agent that collaboratively combines multiple minds' strengths and knowledge to enhance problem-solving in complex tasks. Our in-depth analysis shows that assigning multiple fine-grained personas in LLMs improves problem-solving abilities compared to using a single or fixed number of personas.
arXiv Detail & Related papers (2023-07-11T14:45:19Z)
Building Cooperative Embodied Agents Modularly with Large Language Models [104.57849816689559]
We address challenging multi-agent cooperation problems with decentralized control, raw sensory observations, costly communication, and multi-objective tasks instantiated in various embodied environments. We harness the commonsense knowledge, reasoning ability, language comprehension, and text generation prowess of LLMs and seamlessly incorporate them into a cognitive-inspired modular framework. Our experiments on C-WAH and TDW-MAT demonstrate that CoELA driven by GPT-4 can surpass strong planning-based methods and exhibit emergent effective communication.
arXiv Detail & Related papers (2023-07-05T17:59:27Z)
Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory [97.87093169454431]
Ghost in the Minecraft (GITM) is a novel framework that integrates Large Language Models (LLMs) with text-based knowledge and memory. We develop a set of structured actions and leverage LLMs to generate action plans for the agents to execute. The resulting LLM-based agent markedly surpasses previous methods, achieving a remarkable improvement of +47.5% in success rate.
arXiv Detail & Related papers (2023-05-25T17:59:49Z)
Voyager: An Open-Ended Embodied Agent with Large Language Models [103.76509266014165]
Voyager is the first embodied lifelong learning agent in Minecraft. It continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention. Voyager is able to utilize the learned skill library in a new Minecraft world to solve novel tasks from scratch.
arXiv Detail & Related papers (2023-05-25T17:46:38Z)
Generative Agents: Interactive Simulacra of Human Behavior [86.1026716646289]
We introduce generative agents--computational software agents that simulate believable human behavior. We describe an architecture that extends a large language model to store a complete record of the agent's experiences. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims.
arXiv Detail & Related papers (2023-04-07T01:55:19Z)
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge [70.47759528596711]
We introduce MineDojo, a new framework built on the popular Minecraft game. We propose a novel agent learning algorithm that leverages large pre-trained video-language models as a learned reward function. Our agent is able to solve a variety of open-ended tasks specified in free-form language without any manually designed dense shaping reward.
arXiv Detail & Related papers (2022-06-17T15:53:05Z)
SKILL-IL: Disentangling Skill and Knowledge in Multitask Imitation Learning [21.222568055417717]
Humans are able to transfer skills and knowledge. If we can cycle to work and drive to the store, we can also cycle to the store and drive to work. We take inspiration from this and hypothesize the latent memory of a policy network can be disentangled into two partitions. These contain either the knowledge of the environmental context for the task or the generalizable skill needed to solve the task.
arXiv Detail & Related papers (2022-05-06T10:38:01Z)
Collaborative Training of Heterogeneous Reinforcement Learning Agents in Environments with Sparse Rewards: What and When to Share? [7.489793155793319]
This work focuses on combining information obtained through intrinsic motivation with the aim of having a more efficient exploration and faster learning. Our results reveal different ways in which a collaborative framework with little additional computational cost can outperform an independent learning process without knowledge sharing.
arXiv Detail & Related papers (2022-02-24T16:15:51Z)
Help Me Explore: Minimal Social Interventions for Graph-Based Autotelic Agents [7.644107117422287]
This paper argues that both perspectives could be coupled within the learning of autotelic agents to foster their skill acquisition. We make two contributions: 1) a novel social interaction protocol called Help Me Explore (HME), where autotelic agents can benefit from both individual and socially guided exploration. We show that when learning within HME, GANGSTR overcomes its individual learning limits by mastering the most complex configurations.
arXiv Detail & Related papers (2022-02-10T16:34:28Z)
Hidden Agenda: a Social Deduction Game with Diverse Learned Equilibria [57.74495091445414]
Social deduction games offer an avenue to study how individuals might learn to synthesize potentially unreliable information about others. In this work, we present Hidden Agenda, a two-team social deduction game that provides a 2D environment for studying learning agents in scenarios of unknown team alignment. Reinforcement learning agents trained in Hidden Agenda show that agents can learn a variety of behaviors, including partnering and voting without need for communication in natural language.
arXiv Detail & Related papers (2022-01-05T20:54:10Z)
Interpretation of Emergent Communication in Heterogeneous Collaborative Embodied Agents [83.52684405389445]
We introduce the collaborative multi-object navigation task CoMON. In this task, an oracle agent has detailed environment information in the form of a map. It communicates with a navigator agent that perceives the environment visually and is tasked to find a sequence of goals. We show that the emergent communication can be grounded to the agent observations and the spatial structure of the 3D environment.
arXiv Detail & Related papers (2021-10-12T06:56:11Z)
HALMA: Humanlike Abstraction Learning Meets Affordance in Rapid Problem Solving [104.79156980475686]
Humans learn compositional and causal abstraction, ie, knowledge, in response to the structure of naturalistic tasks. We argue there shall be three levels of generalization in how an agent represents its knowledge: perceptual, conceptual, and algorithmic. This benchmark is centered around a novel task domain, HALMA, for visual concept development and rapid problem-solving.
arXiv Detail & Related papers (2021-02-22T20:37:01Z)
Learning Affordance Landscapes for Interaction Exploration in 3D Environments [101.90004767771897]
Embodied agents must be able to master how their environment works. We introduce a reinforcement learning approach for exploration for interaction. We demonstrate our idea with AI2-iTHOR.
arXiv Detail & Related papers (2020-08-21T00:29:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.