Boosting MCTS with Free Energy Minimization
- URL: http://arxiv.org/abs/2501.13083v1
- Date: Wed, 22 Jan 2025 18:45:15 GMT
- Title: Boosting MCTS with Free Energy Minimization
- Authors: Mawaba Pascal Dao, Adrian Peter
- Abstract summary: We propose a new planning framework that integrates Monte Carlo Tree Search (MCTS) with active inference objectives.
MCTS can be naturally extended to incorporate free energy minimization by blending expected rewards with information gain.
This synergy allows our planner to maintain coherent estimates of value and uncertainty throughout planning, without sacrificing computational tractability.
- Abstract: Active Inference, grounded in the Free Energy Principle, provides a powerful lens for understanding how agents balance exploration and goal-directed behavior in uncertain environments. Here, we propose a new planning framework that integrates Monte Carlo Tree Search (MCTS) with active inference objectives to systematically reduce epistemic uncertainty while pursuing extrinsic rewards. Our key insight is that MCTS, already renowned for its search efficiency, can be naturally extended to incorporate free energy minimization by blending expected rewards with information gain. Concretely, the Cross-Entropy Method (CEM) is used to optimize action proposals at the root node, while tree expansions leverage reward modeling alongside intrinsic exploration bonuses. This synergy allows our planner to maintain coherent estimates of value and uncertainty throughout planning, without sacrificing computational tractability. Empirically, we benchmark our planner on a diverse set of continuous control tasks, where it demonstrates performance gains over both standalone CEM and MCTS with random rollouts.
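A minimal sketch of the two ingredients the abstract describes, as they might be combined: a node score that blends expected reward with an information-gain (epistemic) bonus, and a CEM loop that refines action proposals at the root. All function names, signatures, and hyperparameters (e.g. `beta`, `n_elite`) are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only, inferred from the abstract; not the paper's code.
import numpy as np

def node_score(expected_reward, info_gain, visit_count, parent_visits,
               beta=0.5, c_uct=1.4):
    """Blend extrinsic value with an epistemic (information-gain) bonus,
    plus a standard UCT exploration term."""
    epistemic = beta * info_gain  # free-energy-style intrinsic bonus
    uct = c_uct * np.sqrt(np.log(parent_visits + 1) / (visit_count + 1))
    return expected_reward + epistemic + uct

def cem_root_proposals(score_fn, action_dim, n_iters=5, n_samples=64, n_elite=8):
    """Cross-Entropy Method at the root: fit a Gaussian to the highest-scoring
    sampled actions and return the final mean as the root proposal."""
    mu, sigma = np.zeros(action_dim), np.ones(action_dim)
    for _ in range(n_iters):
        samples = np.random.normal(mu, sigma, size=(n_samples, action_dim))
        scores = np.array([score_fn(a) for a in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu
```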
Related papers
- Monte Carlo Tree Diffusion for System 2 Planning [57.50512800900167]
We introduce Monte Carlo Tree Diffusion (MCTD), a novel framework that integrates the generative strength of diffusion models with the adaptive search capabilities of Monte Carlo Tree Search (MCTS)
MCTD achieves the benefits of MCTS such as controlling exploration-exploitation trade-offs within the diffusion framework.
arXiv Detail & Related papers (2025-02-11T02:51:42Z)
- Lipschitz Lifelong Monte Carlo Tree Search for Mastering Non-Stationary Tasks [19.42056439537988]
This paper presents LiZero for Lipschitz lifelong planning using Monte Carlo Tree Search (MCTS)
We propose a novel concept of adaptive UCT (aUCT) to transfer knowledge from a source task to the exploration/exploitation of a new task.
Experimental results show that LiZero significantly outperforms existing MCTS and lifelong learning baselines, converging to optimal rewards much faster.
arXiv Detail & Related papers (2025-02-02T02:45:20Z)
- Generalizing in Net-Zero Microgrids: A Study with Federated PPO and TRPO [5.195669033269619]
This work addresses the challenge of optimal energy management in microgrids through a collaborative and privacy-preserving framework.
We propose the FedTRPO methodology, which integrates Federated Learning (FL) and Trust Region Policy Optimization (TRPO) to manage distributed energy resources efficiently.
arXiv Detail & Related papers (2024-12-30T13:38:31Z) - Monte Carlo Planning for Stochastic Control on Constrained Markov Decision Processes [1.445706856497821]
This work defines an MDP framework, the SD-MDP, where we disentangle the causal structure of MDPs' transition and reward dynamics.
We derive theoretical guarantees on the estimation error of the value function under an optimal policy by allowing independent value estimation from Monte Carlo sampling.
arXiv Detail & Related papers (2024-06-23T16:22:40Z)
- Sample-efficient Real-time Planning with Curiosity Cross-Entropy Method and Contrastive Learning [21.995159117991278]
We propose Curiosity CEM, an improved version of the Cross-Entropy Method (CEM) algorithm for encouraging exploration via curiosity.
Our proposed method maximizes the sum of state-action Q values over the planning horizon, in which these Q values estimate the future extrinsic and intrinsic reward.
Experiments on image-based continuous control tasks from the DeepMind Control Suite show that CCEM is more sample-efficient than previous MBRL algorithms by a large margin.
arXiv Detail & Related papers (2023-03-07T10:48:20Z)
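A rough sketch of the planning objective this summary describes, namely scoring a candidate action sequence by the sum of Q-values that capture extrinsic plus intrinsic (curiosity) reward; the function names, the `dynamics` model, and the weight `eta` are assumptions for illustration only.

```python
# Hedged illustration of a CCEM-style objective; not the paper's code.
# `q_extrinsic`, `q_intrinsic`, `dynamics`, and `eta` are hypothetical
# stand-ins for the learned components.
def ccem_objective(state, action_seq, dynamics, q_extrinsic, q_intrinsic, eta=0.1):
    """Return the sum of blended Q-values over the planning horizon."""
    total, s = 0.0, state
    for a in action_seq:
        total += q_extrinsic(s, a) + eta * q_intrinsic(s, a)
        s = dynamics(s, a)  # learned (latent) dynamics rolls the state forward
    return total
```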
- When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have uses in safety-sensitive domains such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z)
- Exploiting Submodular Value Functions For Scaling Up Active Perception [60.81276437097671]
In active perception tasks, the agent aims to select sensory actions that reduce uncertainty about one or more hidden variables.
Partially observable Markov decision processes (POMDPs) provide a natural model for such problems.
As the number of sensors available to the agent grows, the computational cost of POMDP planning grows exponentially.
arXiv Detail & Related papers (2020-09-21T09:11:36Z)
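For context, the standard way submodularity is exploited for scale is greedy selection of sensing actions, which carries a (1 - 1/e) approximation guarantee for monotone submodular objectives; the sketch below is that generic recipe rather than the paper's specific algorithm, and `info_gain` is a hypothetical placeholder.

```python
# Generic greedy maximization of a submodular objective; illustrative only.
def greedy_sensor_subset(candidates, k, info_gain):
    """Pick k sensors by greedily maximizing the marginal information gain."""
    selected = []
    for _ in range(k):
        best = max((c for c in candidates if c not in selected),
                   key=lambda c: info_gain(selected + [c]) - info_gain(selected))
        selected.append(best)
    return selected
```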
- Reinforcement Learning through Active Inference [62.997667081978825]
We show how ideas from active inference can augment traditional reinforcement learning approaches.
We develop and implement a novel objective for decision making, which we term the free energy of the expected future.
We demonstrate that the resulting algorithm successfully balances exploration and exploitation, achieving robust performance on several challenging RL benchmarks with sparse, well-shaped, and no rewards.
arXiv Detail & Related papers (2020-02-28T10:28:21Z)
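As a reference point, the expected free energy that such objectives build on is usually written as below; this is the standard active-inference form rather than the paper's exact "free energy of the expected future", so treat the notation as an assumption.

```latex
% Standard expected free energy of a policy \pi at time \tau (illustrative form):
G(\pi,\tau) = \mathbb{E}_{q(o_\tau, s_\tau \mid \pi)}
              \big[\ln q(s_\tau \mid \pi) - \ln p(o_\tau, s_\tau \mid \pi)\big]
            \;\approx\;
            \underbrace{-\,\mathbb{E}_{q(o_\tau \mid \pi)}\big[\ln p(o_\tau)\big]}_{\text{extrinsic (preference) value}}
            \;-\;
            \underbrace{\mathbb{E}_{q(o_\tau \mid \pi)}\Big[D_{\mathrm{KL}}\big(q(s_\tau \mid o_\tau, \pi)\,\big\|\,q(s_\tau \mid \pi)\big)\Big]}_{\text{epistemic value (information gain)}}
```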
- Risk-Aware Energy Scheduling for Edge Computing with Microgrid: A Multi-Agent Deep Reinforcement Learning Approach [82.6692222294594]
We study a risk-aware energy scheduling problem for a microgrid-powered MEC network.
We derive the solution by applying a multi-agent deep reinforcement learning (MADRL)-based advantage actor-critic (A3C) algorithm with shared neural networks.
arXiv Detail & Related papers (2020-02-21T02:14:38Z)
- Reward Tweaking: Maximizing the Total Reward While Planning for Short Horizons [66.43848057122311]
Reward tweaking learns a surrogate reward function that induces optimal behavior on the original finite-horizon total reward task.
We show that reward tweaking guides the agent towards better long-horizon returns although it plans for short horizons.
arXiv Detail & Related papers (2020-02-09T09:50:07Z)