Making Large Language Models into World Models with Precondition and Effect Knowledge
- URL: http://arxiv.org/abs/2409.12278v2
- Date: Wed, 2 Oct 2024 23:37:50 GMT
- Title: Making Large Language Models into World Models with Precondition and Effect Knowledge
- Authors: Kaige Xie, Ian Yang, John Gunerli, Mark Riedl
- Abstract summary: We show that Large Language Models (LLMs) can be induced to perform two critical world model functions. We validate that the precondition and effect knowledge generated by our models aligns with human understanding of world dynamics.
- Score: 1.8561812622368763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: World models, which encapsulate the dynamics of how actions affect environments, are foundational to the functioning of intelligent agents. In this work, we explore the potential of Large Language Models (LLMs) to operate as world models. Although LLMs are not inherently designed to model real-world dynamics, we show that they can be induced to perform two critical world model functions: determining the applicability of an action based on a given world state, and predicting the resulting world state upon action execution. This is achieved by fine-tuning two separate LLMs (one for precondition prediction and another for effect prediction) while leveraging synthetic data generation techniques. Through human-participant studies, we validate that the precondition and effect knowledge generated by our models aligns with human understanding of world dynamics. We also analyze the extent to which the world model trained on our synthetic data results in an inferred state space that supports the creation of action chains, a necessary property for planning.
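The two world model functions named in the abstract (checking whether an action is applicable in a state, and predicting the state after it executes) compose naturally into a check of whether an action chain is executable. The sketch below is a minimal illustration under loudly stated assumptions: it represents the world state as a set of symbolic facts with STRIPS-style add/delete effects, whereas the paper's fine-tuned LLMs generate natural-language precondition and effect knowledge. Here those outputs are simply hard-coded sets, and every name (`Action`, `is_applicable`, `predict_next_state`, `chain_is_executable`, and the example facts) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical action record; the fields stand in for what the
    precondition and effect models would predict (assumption: symbolic facts)."""
    name: str
    preconditions: frozenset  # facts that must hold (precondition-model stand-in)
    add_effects: frozenset = frozenset()  # facts made true (effect-model stand-in)
    del_effects: frozenset = frozenset()  # facts made false (effect-model stand-in)


def is_applicable(state: frozenset, action: Action) -> bool:
    """Function 1: the action applies if all its preconditions hold in the state."""
    return action.preconditions <= state


def predict_next_state(state: frozenset, action: Action) -> frozenset:
    """Function 2: remove deleted facts, then add new ones."""
    return (state - action.del_effects) | action.add_effects


def chain_is_executable(state: frozenset, actions: list) -> bool:
    """An action chain is executable if each action applies to the
    state produced by the previous one."""
    for action in actions:
        if not is_applicable(state, action):
            return False
        state = predict_next_state(state, action)
    return True


# Hypothetical example facts and actions.
pick_up = Action(
    "pick up key",
    preconditions=frozenset({"key on table", "hand empty"}),
    add_effects=frozenset({"holding key"}),
    del_effects=frozenset({"key on table", "hand empty"}),
)
unlock = Action(
    "unlock door",
    preconditions=frozenset({"holding key", "door locked"}),
    add_effects=frozenset({"door unlocked"}),
    del_effects=frozenset({"door locked"}),
)
start = frozenset({"key on table", "hand empty", "door locked"})

print(chain_is_executable(start, [pick_up, unlock]))  # True
print(chain_is_executable(start, [unlock, pick_up]))  # False
```

The order matters because `unlock` requires a fact ("holding key") that only the effect of `pick_up` makes true; this dependency between one action's effects and the next action's preconditions is the kind of linkage an inferred state space needs in order to support action chains.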
Related papers
- WHALE: Towards Generalizable and Scalable World Models for Embodied Decision-making [40.53824201182517]
This paper introduces WHALE, a framework for learning generalizable world models.
We present Whale-ST, a scalable spatial-temporal transformer-based world model with enhanced generalizability.
We also propose Whale-X, a 414M parameter world model trained on 970K trajectories from Open X-Embodiment datasets.
Date: 2024-11-08T15:01:27Z
- On the Modeling Capabilities of Large Language Models for Sequential Decision Making [52.128546842746246]
Large pretrained models are showing increasingly better performance in reasoning and planning tasks.
We evaluate their ability to produce decision-making policies, either directly, by generating actions, or indirectly.
In environments with unfamiliar dynamics, we explore how fine-tuning LLMs with synthetic data can significantly improve their reward modeling capabilities.
Date: 2024-10-08T03:12:57Z
- WorldGPT: Empowering LLM as Multimodal World Model [51.243464216500975]
We introduce WorldGPT, a generalist world model built upon a Multimodal Large Language Model (MLLM).
WorldGPT acquires an understanding of world dynamics through analyzing millions of videos across various domains.
We conduct evaluations on WorldNet, a multimodal state transition prediction benchmark.
Date: 2024-04-28T14:42:02Z
- Learning World Models With Hierarchical Temporal Abstractions: A Probabilistic Perspective [2.61072980439312]
Devising formalisms to develop internal world models is a critical research challenge in the domains of artificial intelligence and machine learning.
This thesis identifies several limitations with the prevalent use of state space models as internal world models.
The structure of the models in these formalisms facilitates exact probabilistic inference using belief propagation, as well as end-to-end learning via backpropagation through time.
These formalisms integrate the concept of uncertainty in world states, thus improving the system's capacity to emulate the nature of the real world and quantify the confidence in its predictions.
Date: 2024-04-24T12:41:04Z
- Simplifying Latent Dynamics with Softly State-Invariant World Models [10.722955763425228]
We introduce the Parsimonious Latent Space Model (PLSM), a world model that regularizes the latent dynamics to make the effect of the agent's actions more predictable.
We find that our regularization improves accuracy, generalization, and performance in downstream tasks.
Date: 2024-01-31T13:52:11Z
- Learning the Effects of Physical Actions in a Multi-modal Environment [17.757831697284498]
Large Language Models (LLMs) handle physical commonsense information inadequately.
We introduce the multi-modal task of predicting the outcomes of actions solely from realistic sensory inputs.
We show that multi-modal models can capture physical commonsense when augmented with visual information.
Date: 2023-01-27T16:49:52Z
- Predictive World Models from Real-World Partial Observations [66.80340484148931]
We present a framework for learning a probabilistic predictive world model for real-world road environments.
While prior methods require complete states as ground truth for learning, we present a novel sequential training method to allow HVAEs to learn to predict complete states from partially observed states only.
Date: 2023-01-12T02:07:26Z
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the vast amount of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
Date: 2022-11-09T18:58:29Z
- Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
Date: 2020-07-14T16:42:59Z
- Context-aware Dynamics Model for Generalization in Model-Based Reinforcement Learning [124.9856253431878]
We decompose the task of learning a global dynamics model into two stages: (a) learning a context latent vector that captures the local dynamics, then (b) predicting the next state conditioned on it.
In order to encode dynamics-specific information into the context latent vector, we introduce a novel loss function that encourages the context latent vector to be useful for predicting both forward and backward dynamics.
The proposed method achieves superior generalization ability across various simulated robotics and control tasks, compared to existing RL schemes.
Date: 2020-05-14T08:10:54Z