WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents
- URL: http://arxiv.org/abs/2504.15785v1
- Date: Tue, 22 Apr 2025 10:58:27 GMT
- Title: WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents
- Authors: Siyu Zhou, Tianyi Zhou, Yijun Yang, Guodong Long, Deheng Ye, Jing Jiang, Chengqi Zhang
- Abstract summary: We propose a training-free "world alignment" that learns an environment's symbolic knowledge complementary to large language models (LLMs). We also propose an RL-free, model-based agent "WALL-E 2.0" through the model-predictive control framework. WALL-E 2.0 significantly outperforms existing methods on open-world challenges in Mars (Minecraft-like) and ALFWorld (embodied indoor environments).
- Score: 55.64361927346957
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Can we build accurate world models out of large language models (LLMs)? How can world models benefit LLM agents? The gap between the prior knowledge of LLMs and the specified environment's dynamics usually bottlenecks LLMs' performance as world models. To bridge the gap, we propose a training-free "world alignment" that learns an environment's symbolic knowledge complementary to LLMs. The symbolic knowledge covers action rules, knowledge graphs, and scene graphs, which are extracted by LLMs from exploration trajectories and encoded into executable code to regulate LLM agents' policies. We further propose an RL-free, model-based agent "WALL-E 2.0" through the model-predictive control (MPC) framework. Unlike classical MPC, which requires costly optimization on the fly, we adopt an LLM agent as an efficient look-ahead optimizer of future steps' actions by interacting with the neurosymbolic world model. While the LLM agent's strong heuristics make it an efficient planner in MPC, the quality of its planned actions is also secured by the accurate predictions of the aligned world model. Together they considerably improve learning efficiency in a new environment. On open-world challenges in Mars (Minecraft-like) and ALFWorld (embodied indoor environments), WALL-E 2.0 significantly outperforms existing methods, e.g., surpassing baselines in Mars by 16.1%-51.6% in success rate and by at least 61.7% in score. In ALFWorld, it achieves a new record of 98% success rate after only 4 iterations.
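To make the planning loop concrete, below is a minimal sketch of MPC with an LLM planner regulated by learned symbolic action rules. Everything here is an illustrative assumption, not the paper's actual interface: `llm_propose_plan` and `llm_predict_transition` are stubs for LLM calls, and the rule format and Minecraft-flavored state are invented for the example.

```python
"""Minimal sketch of neurosymbolic MPC: an LLM planner whose rollouts
are vetoed by executable action rules learned from exploration.
All names and the rule format are illustrative assumptions."""

from typing import Callable

# Learned symbolic knowledge: action rules compiled to executable predicates.
# Each rule returns True if the action is feasible in the given state.
ACTION_RULES: list[Callable[[dict, str], bool]] = [
    # e.g., a rule induced from failed trajectories: mining iron needs a stone pickaxe
    lambda s, a: not (a == "mine_iron" and s.get("pickaxe") != "stone"),
    lambda s, a: not (a == "craft_furnace" and s.get("cobblestone", 0) < 8),
]

def rules_allow(state: dict, action: str) -> bool:
    """An action is feasible only if every learned rule permits it."""
    return all(rule(state, action) for rule in ACTION_RULES)

def llm_propose_plan(state: dict, goal: str, feedback: str) -> list[str]:
    """Stub for prompting the LLM planner (hypothetical interface)."""
    return ["craft_furnace", "mine_iron"]

def llm_predict_transition(state: dict, action: str) -> dict:
    """Stub for the LLM world model's next-state prediction."""
    return dict(state)

def mpc_step(state: dict, goal: str, max_replans: int = 3) -> str | None:
    """Look-ahead: re-prompt until the simulated rollout satisfies all
    symbolic rules, then return only the first action (receding horizon)."""
    feedback = ""
    for _ in range(max_replans):
        plan = llm_propose_plan(state, goal, feedback)
        sim, ok = dict(state), True
        for action in plan:
            if not rules_allow(sim, action):
                feedback = f"rule violated by {action!r}; revise the plan"
                ok = False
                break
            sim = llm_predict_transition(sim, action)
        if ok and plan:
            return plan[0]
    return None

print(mpc_step({"pickaxe": "stone", "cobblestone": 8}, "obtain iron"))
```

Replanning on rule violations is what lets the symbolic knowledge "secure" the LLM's plans without any gradient updates.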
Related papers
- ALU: Agentic LLM Unlearning [9.934258340998047]
Information removal or suppression in large language models (LLMs) is a desired functionality, useful in AI regulation, legal compliance, safety, and privacy.
Current LLM unlearning methods struggle to balance the unlearning efficacy and utility due to the competing nature of these objectives.
We present the first agentic LLM unlearning (ALU) method, a multi-agent, retrain-free, model-agnostic approach to LLM unlearning.
arXiv Detail & Related papers (2025-02-01T11:45:44Z)
- WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents [55.64361927346957]
We propose a neurosymbolic approach that learns rules gradient-free through large language models (LLMs).
Our embodied LLM agent "WALL-E" is built upon model-predictive control (MPC).
On open-world challenges in Minecraft and ALFWorld, WALL-E achieves higher success rates than existing methods.
arXiv Detail & Related papers (2024-10-09T23:37:36Z)
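As a rough illustration of the rule-learning idea above (an assumed flow, not the paper's exact algorithm): transitions the world model mispredicts are handed to an LLM, which induces candidate rules, and only rules that actually explain mismatches are kept.

```python
"""Sketch of gradient-free rule learning from exploration trajectories.
The LLM interface and rule representation are illustrative stubs."""

def llm_induce_rule(mismatches: list[dict]) -> str:
    """Stub: ask an LLM to summarize mispredicted transitions as a rule
    (hypothetical interface; real rules would be executable code)."""
    return "mine_iron requires pickaxe == 'stone'"

def rule_covers(rule: str, transition: dict) -> bool:
    """Stub: does this candidate rule explain the mismatch?"""
    return transition["action"] == "mine_iron"

def learn_rules(trajectory: list[dict], predictions: list[dict]) -> list[str]:
    # 1. Collect transitions the current world model got wrong.
    mismatches = [t for t, p in zip(trajectory, predictions)
                  if t["next_state"] != p]
    rules = []
    while mismatches:
        rule = llm_induce_rule(mismatches)
        covered = [t for t in mismatches if rule_covers(rule, t)]
        if not covered:        # rule explains nothing: stop
            break
        rules.append(rule)     # keep only rules that reduce prediction error
        mismatches = [t for t in mismatches if t not in covered]
    return rules

traj = [{"action": "mine_iron", "next_state": {"iron": 0}}]
preds = [{"iron": 1}]
print(learn_rules(traj, preds))
```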
- Mixture-of-Agents Enhances Large Language Model Capabilities [34.68610100315386]
We propose a new approach to harness the collective expertise of multiple large language models (LLMs) through a Mixture-of-Agents (MoA) methodology.
In our approach, we construct a layered MoA architecture wherein each layer comprises multiple LLM agents.
MoA models achieve state-of-the-art performance on AlpacaEval 2.0, MT-Bench, and FLASK, surpassing GPT-4 Omni.
arXiv Detail & Related papers (2024-06-07T07:04:10Z)
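A minimal sketch of the layered architecture just described: each layer's agents see the question plus all answers from the previous layer, and a final aggregator synthesizes the result. `call_model` stands in for any chat-completion API, and the model names are placeholders.

```python
"""Minimal sketch of a layered Mixture-of-Agents pipeline."""

def call_model(model: str, prompt: str) -> str:
    return f"[{model} answer]"  # stub for an LLM API call

def mixture_of_agents(question: str,
                      layers: list[list[str]],
                      aggregator: str) -> str:
    """Each layer conditions on ALL answers from the previous layer;
    a final aggregator model synthesizes the last layer's outputs."""
    prev_answers: list[str] = []
    for layer in layers:
        prompt = question
        if prev_answers:
            refs = "\n".join(f"- {a}" for a in prev_answers)
            prompt = f"{question}\n\nPrevious responses:\n{refs}\nImprove on them."
        prev_answers = [call_model(m, prompt) for m in layer]
    refs = "\n".join(f"- {a}" for a in prev_answers)
    return call_model(aggregator, f"{question}\n\nSynthesize:\n{refs}")

print(mixture_of_agents(
    "Explain MPC in one sentence.",
    layers=[["llm-a", "llm-b"], ["llm-a", "llm-c"]],
    aggregator="llm-d",
))
```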
- From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems [59.40480894948944]
Large language model (LLM) empowered agents are able to solve decision-making problems in the physical world.
Under this model, the LLM Planner navigates a partially observable Markov decision process (POMDP) by iteratively generating language-based subgoals via prompting.
We prove that the pretrained LLM Planner effectively performs Bayesian aggregated imitation learning (BAIL) through in-context learning.
arXiv Detail & Related papers (2024-05-30T09:42:54Z)
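A toy rendering of the planner-actor loop the analysis formalizes: the LLM is prompted with the interaction history and emits a language subgoal, an actor executes it, and the partially observed feedback grows the in-context prompt. All components are stubs; the BAIL claim lives in the comments, not the code.

```python
"""Sketch of the LLM Planner / Actor loop in a POMDP.
Stubs stand in for the LLM, the actor policy, and the environment."""

def llm_planner(history: list[tuple[str, str]]) -> str:
    """Stub: prompt an LLM with the (observation, subgoal) history.
    The paper argues this in-context step approximates Bayesian
    aggregated imitation learning over latent task variables."""
    return "go to the kitchen"

def actor(observation: str, subgoal: str) -> str:
    return f"step toward: {subgoal}"  # stub low-level policy

def pomdp_step(action: str) -> str:
    return "partial observation"  # stub environment transition

history: list[tuple[str, str]] = []
obs = "initial observation"
for _ in range(3):
    subgoal = llm_planner(history)   # language subgoal via prompting
    action = actor(obs, subgoal)     # low-level execution
    obs = pomdp_step(action)         # partially observed feedback
    history.append((obs, subgoal))   # grows the in-context prompt
print(history)
```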
- Knowledge Fusion of Large Language Models [73.28202188100646]
This paper introduces the notion of knowledge fusion for large language models (LLMs).
We externalize their collective knowledge and unique strengths, thereby elevating the capabilities of the target model beyond those of any individual source LLM.
Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation.
arXiv Detail & Related papers (2024-01-19T05:02:46Z)
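A sketch of the core idea: fuse the source models' per-token next-token distributions and distill the result into the target model with a KL-style loss. The weighted-average fusion below is an assumption in the spirit of the paper, not its exact recipe.

```python
"""Sketch of LLM knowledge fusion as distribution distillation
over a toy 3-token vocabulary (pure Python, no frameworks)."""

import math

def fuse(source_dists: list[list[float]], weights: list[float]) -> list[float]:
    """Weighted average of per-token distributions, renormalized."""
    fused = [sum(w * d[i] for d, w in zip(source_dists, weights))
             for i in range(len(source_dists[0]))]
    z = sum(fused)
    return [p / z for p in fused]

def kl(p: list[float], q: list[float]) -> float:
    """KL(p || q): distillation loss pulling target q toward fused p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Two source models' distributions, weighted toward the source that
# fits this text better (e.g., lower perplexity).
src_a = [0.7, 0.2, 0.1]
src_b = [0.5, 0.4, 0.1]
target = [0.4, 0.4, 0.2]
fused = fuse([src_a, src_b], weights=[0.6, 0.4])
print(fused, kl(fused, target))  # loss to minimize when training the target
```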
- Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models [52.98743860365194]
We propose a new fine-tuning method called Self-Play fIne-tuNing (SPIN).
At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself.
This sheds light on the promise of self-play, enabling the achievement of human-level performance in LLMs without the need for expert opponents.
arXiv Detail & Related papers (2024-01-02T18:53:13Z)
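The self-play mechanism reduces, per training pair, to a DPO-style logistic objective: the new model is trained to rank the human-written response above a response generated by its own previous iteration. A minimal sketch with toy log-probabilities:

```python
"""Sketch of the SPIN per-pair objective; numbers are toy values."""

import math

def spin_loss(new_logp_human: float, new_logp_self: float,
              old_logp_human: float, old_logp_self: float,
              beta: float = 0.1) -> float:
    """Logistic loss pushing the new model to prefer the human response
    over the previous iteration's self-generated one."""
    margin = beta * ((new_logp_human - old_logp_human)
                     - (new_logp_self - old_logp_self))
    return math.log1p(math.exp(-margin))  # == -log(sigmoid(margin))

# One training pair: a human-written response vs. a response sampled
# from the previous-iteration model (the "opponent") on the same prompt.
print(spin_loss(new_logp_human=-12.0, new_logp_self=-15.0,
                old_logp_human=-14.0, old_logp_self=-14.0))
```

Iterating this loop lets the opponent grow stronger alongside the model, which is what removes the need for an external expert.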
- LLM Augmented Hierarchical Agents [4.574041097539858]
Solving long-horizon, temporally-extended tasks using Reinforcement Learning (RL) is challenging, a difficulty compounded by the common practice of learning without prior knowledge (tabula rasa learning).
In this paper we exploit the planning capabilities of LLMs while using RL to provide learning from the environment, resulting in a hierarchical agent that uses LLMs to solve long-horizon tasks.
This approach is evaluated in simulation environments such as MiniGrid, SkillHack, and Crafter, and on a real robot arm in block manipulation tasks.
arXiv Detail & Related papers (2023-11-09T18:54:28Z)
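One plausible way to wire this up (an illustrative sketch, not the paper's implementation): the LLM scores candidate subtasks for the high-level policy, biasing selection, while RL-learned values remain free to override the prior as experience accumulates.

```python
"""Sketch of an LLM-guided hierarchical agent: the LLM prior biases
high-level skill choice; low-level skills are RL-trained. All
interfaces are illustrative stubs."""

import random

SKILLS = ["pick_up_block", "open_drawer", "go_to_table"]

def llm_suggest(task: str, skills: list[str]) -> dict[str, float]:
    """Stub: LLM scores how plausible each skill is for the task."""
    return {s: (0.8 if "block" in s else 0.1) for s in skills}

def q_value(skill: str) -> float:
    return random.random()  # stub for learned Q-values

def choose_skill(task: str) -> str:
    """Combine learned values with the LLM prior, so planning
    knowledge guides exploration without fixing the policy."""
    prior = llm_suggest(task, SKILLS)
    return max(SKILLS, key=lambda s: q_value(s) + prior[s])

print(choose_skill("stack the red block on the table"))
```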
- Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling [101.59430768507997]
Reinforcement learning (RL) agents typically learn tabula rasa, without prior knowledge of the world.
We propose using few-shot large language models (LLMs) to hypothesize an Abstract World Model (AWM).
Our method of hypothesizing an AWM with LLMs and then verifying the AWM based on agent experience increases sample efficiency over contemporary methods by an order of magnitude.
arXiv Detail & Related papers (2023-01-28T02:04:07Z)
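A toy hypothesize-then-verify loop for the entry above: the LLM proposes an Abstract World Model as a subgoal dependency graph, the agent tests edges against the environment, and contradicted edges are corrected. The graph contents and correction step are invented for illustration.

```python
"""Sketch of verifying an LLM-hypothesized Abstract World Model (AWM)
against agent experience. All contents are illustrative assumptions."""

def llm_hypothesize_awm(task: str) -> dict[str, list[str]]:
    """Stub: LLM guesses which items each subgoal depends on."""
    return {"stone_pickaxe": ["wood_pickaxe", "stone"],
            "iron_ore": ["wood_pickaxe"]}      # second edge is wrong

def try_subgoal(subgoal: str, inventory: set[str]) -> bool:
    """Stub environment: the true dynamics say iron needs a STONE
    pickaxe, so the LLM's hypothesis for iron_ore will fail."""
    true_deps = {"stone_pickaxe": {"wood_pickaxe", "stone"},
                 "iron_ore": {"stone_pickaxe"}}
    return true_deps[subgoal] <= inventory

awm = llm_hypothesize_awm("collect iron")
inventory = {"wood_pickaxe", "stone"}
for subgoal, deps in list(awm.items()):
    if set(deps) <= inventory and not try_subgoal(subgoal, inventory):
        # Experience contradicts the hypothesized edge: correct the AWM.
        awm[subgoal] = deps + ["stone_pickaxe"]
print(awm)  # verified/corrected world model then guides the RL agent
```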