World Models Unlock Optimal Foraging Strategies in Reinforcement Learning Agents
- URL: http://arxiv.org/abs/2512.12548v1
- Date: Sun, 14 Dec 2025 04:36:06 GMT
- Title: World Models Unlock Optimal Foraging Strategies in Reinforcement Learning Agents
- Authors: Yesid Fonseca, Manuel S. Ríos, Nicanor Quijano, Luis F. Giraldo
- Abstract summary: We show that artificial foragers equipped with learned world models naturally converge to MVT-aligned strategies. Compared with standard model-free RL agents, these model-based agents exhibit decision patterns similar to many of their biological counterparts.
- Score: 0.9332987715848716
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Patch foraging involves the deliberate and planned process of determining the optimal time to depart from a resource-rich region and investigate potentially more beneficial alternatives. The Marginal Value Theorem (MVT) is frequently used to characterize this process, offering an optimality model for such foraging behaviors. Although this model has been widely used to make predictions in behavioral ecology, discovering the computational mechanisms that facilitate the emergence of optimal patch-foraging decisions in biological foragers remains under investigation. Here, we show that artificial foragers equipped with learned world models naturally converge to MVT-aligned strategies. Using a model-based reinforcement learning agent that acquires a parsimonious predictive representation of its environment, we demonstrate that anticipatory capabilities, rather than reward maximization alone, drive efficient patch-leaving behavior. Compared with standard model-free RL agents, these model-based agents exhibit decision patterns similar to many of their biological counterparts, suggesting that predictive world models can serve as a foundation for more explainable and biologically grounded decision-making in AI systems. Overall, our findings highlight the value of ecological optimality principles for advancing interpretable and adaptive AI.
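The abstract uses the Marginal Value Theorem (MVT) as its optimality benchmark. The Python sketch below illustrates the MVT criterion itself, not the authors' model-based agent. It assumes two things that do not come from the paper: a patch whose cumulative gain follows g(t) = R(1 - e^(-t/tau)), and a fixed travel time T between patches. Under MVT, the forager should leave at the residence time t* where the marginal gain rate g'(t*) equals the long-run average rate g(t*)/(t* + T). The function names and parameter values are hypothetical.

```python
import math


def gain(t, R, tau):
    """Cumulative gain after staying t time units in a patch.
    Assumption: exponential depletion with capacity R and time constant tau."""
    return R * (1.0 - math.exp(-t / tau))


def gain_rate(t, R, tau):
    """Instantaneous (marginal) gain rate g'(t) of the assumed patch."""
    return (R / tau) * math.exp(-t / tau)


def long_run_rate(t, R, tau, travel_time):
    """Average intake rate over one visit plus the trip to the next patch."""
    return gain(t, R, tau) / (t + travel_time)


def mvt_residence_time(R, tau, travel_time, tol=1e-10):
    """Find t* with g'(t*) * (t* + T) - g(t*) = 0, i.e. the MVT leaving time.

    For a concave gain curve this function is positive at t = 0, negative for
    large t, and strictly decreasing, so bisection finds the unique root."""
    def f(t):
        return gain_rate(t, R, tau) * (t + travel_time) - gain(t, R, tau)

    lo, hi = 0.0, 1.0
    while f(hi) > 0:  # widen the bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    R, tau = 10.0, 2.0  # hypothetical patch parameters
    for T in (3.0, 6.0):
        t_star = mvt_residence_time(R, tau, T)
        print(f"T={T}: leave after t*={t_star:.3f}, "
              f"marginal={gain_rate(t_star, R, tau):.4f}, "
              f"long-run={long_run_rate(t_star, R, tau, T):.4f}")
```

Running it shows the classic MVT prediction. A longer travel time T raises t*, so foragers in sparser environments should stay longer in each patch. At t*, the printed marginal and long-run rates coincide.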
Related papers
- Optimistic World Models: Efficient Exploration in Model-Based Deep Reinforcement Learning [12.864604506942294]
We introduce Optimistic World Models (OWMs), a principled and scalable framework for optimistic exploration. OWMs incorporate optimism directly into model learning by augmentation with an optimistic dynamics loss. We instantiate OWMs within two state-of-the-art world model architectures, leading to Optimistic DreamerV3 and Optimistic STORM.
Date: 2026-02-10T18:11:00Z
- Active inference and artificial reasoning [36.949648744325046]
This technical note considers the sampling of outcomes that provide the greatest amount of information about the structure of underlying world models. We focus on the sample efficiency afforded by seeking outcomes that resolve the greatest uncertainty about the world model.
Date: 2025-12-24T11:59:36Z
- Divergence Minimization Preference Optimization for Diffusion Model Alignment [66.31417479052774]
Divergence Minimization Preference Optimization (DMPO) is a principled method for aligning diffusion models by minimizing reverse KL divergence. DMPO can consistently outperform or match existing techniques across different base models and test sets.
Date: 2025-07-10T07:57:30Z
- Delphos: A reinforcement learning framework for assisting discrete choice model specification [0.0]
We introduce Delphos, a deep reinforcement learning framework for assisting the discrete choice model specification process. In this setting, an agent learns to specify well-performing model candidates by choosing a sequence of modelling actions. We evaluate Delphos on both simulated and empirical datasets, varying the size of the modelling space and the reward function.
Date: 2025-06-06T15:40:16Z
- AI in a vat: Fundamental limits of efficient world modelling for agent sandboxing and interpretability [84.52205243353761]
Recent work proposes using world models to generate controlled virtual environments in which AI agents can be tested before deployment. We investigate ways of simplifying world models that remain agnostic to the AI agent under evaluation.
Date: 2025-04-06T20:35:44Z
- On the Reasoning Capacity of AI Models and How to Quantify It [0.0]
Large Language Models (LLMs) have intensified the debate surrounding the fundamental nature of their reasoning capabilities. While achieving high performance on benchmarks such as GPQA and MMLU, these models exhibit limitations in more complex reasoning tasks. We propose a novel phenomenological approach that goes beyond traditional accuracy metrics to probe the underlying mechanisms of model behavior.
Date: 2025-01-23T16:58:18Z
- Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of the latent variable models for state-action value functions, which allows both tractable variational learning algorithm and effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
Date: 2022-12-17T00:26:31Z
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our subsequently derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically varying number of explorations benefits the eventual returns.
Date: 2022-10-15T17:57:43Z
- Planning with Diffusion for Flexible Behavior Synthesis [125.24438991142573]
We consider what it would look like to fold as much of the trajectory optimization pipeline as possible into the modeling problem.
The core of our technical approach lies in a diffusion probabilistic model that plans by iteratively denoising trajectories.
Date: 2022-05-20T07:02:03Z
- Non-Markovian Reinforcement Learning using Fractional Dynamics [3.000697999889031]
Reinforcement learning (RL) is a technique to learn the control policy for an agent that interacts with an environment.
In this paper, we propose a model-based RL technique for a system that has non-Markovian dynamics.
Such environments are common in many real-world applications, such as human physiology, biological systems, material science, and population dynamics.
Date: 2021-07-29T07:35:13Z
- On the model-based stochastic value gradient for continuous reinforcement learning [50.085645237597056]
We show that simple model-based agents can outperform state-of-the-art model-free agents in terms of both sample-efficiency and final reward.
Our findings suggest that model-based policy evaluation deserves closer attention.
Date: 2020-08-28T17:58:29Z
- Goal-Directed Planning for Habituated Agents by Active Inference Using a Variational Recurrent Neural Network [5.000272778136268]
This study shows that the predictive coding (PC) and active inference (AIF) frameworks can develop better generalization by learning a prior distribution in a low dimensional latent state space.
In our proposed model, learning is carried out by inferring optimal latent variables as well as synaptic weights for maximizing the evidence lower bound.
Our proposed model was evaluated with both simple and complex robotic tasks in simulation, which demonstrated sufficient generalization in learning with limited training data.
Date: 2020-05-27T06:43:59Z