Predictive Coding Enhances Meta-RL To Achieve Interpretable Bayes-Optimal Belief Representation Under Partial Observability
- URL: http://arxiv.org/abs/2510.22039v1
- Date: Fri, 24 Oct 2025 21:45:56 GMT
- Title: Predictive Coding Enhances Meta-RL To Achieve Interpretable Bayes-Optimal Belief Representation Under Partial Observability
- Authors: Po-Chen Kuo, Han Hou, Will Dabney, Edgar Y. Walker
- Abstract summary: Learning a compact representation of history is critical for planning and generalization in partially observable environments. We show that meta-reinforcement learning (RL) agents can attain near Bayes-optimal policies, but often fail to learn the compact, interpretable Bayes-optimal belief states. We investigate whether integrating self-supervised predictive coding modules into meta-RL can facilitate learning of Bayes-optimal representations.
- Score: 10.548824172738227
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning a compact representation of history is critical for planning and generalization in partially observable environments. While meta-reinforcement learning (RL) agents can attain near Bayes-optimal policies, they often fail to learn the compact, interpretable Bayes-optimal belief states. This representational inefficiency potentially limits the agent's adaptability and generalization capacity. Inspired by predictive coding in neuroscience--which suggests that the brain predicts sensory inputs as a neural implementation of Bayesian inference--and by auxiliary predictive objectives in deep RL, we investigate whether integrating self-supervised predictive coding modules into meta-RL can facilitate learning of Bayes-optimal representations. Through state machine simulation, we show that meta-RL with predictive modules consistently generates more interpretable representations that better approximate Bayes-optimal belief states compared to conventional meta-RL across a wide variety of tasks, even when both achieve optimal policies. In challenging tasks requiring active information seeking, only meta-RL with predictive modules successfully learns optimal representations and policies, whereas conventional meta-RL struggles with inadequate representation learning. Finally, we demonstrate that better representation learning leads to improved generalization. Our results strongly suggest the role of predictive learning as a guiding principle for effective representation learning in agents navigating partial observability.
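To make the phrase "compact, Bayes-optimal belief state" concrete, here is a minimal sketch for a two-armed Bernoulli bandit. The task, the uniform Beta(1, 1) prior, and the names `BetaBelief` and `belief_after` are illustrative assumptions, not taken from the paper. The point it shows is that the exact posterior depends only on per-arm success and failure counts, so histories that differ in order or interleaving collapse to the same belief.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(alpha, beta) posterior over one arm's reward probability (hypothetical helper)."""
    alpha: float = 1.0  # 1 + number of observed rewards of 1 (uniform prior assumed)
    beta: float = 1.0   # 1 + number of observed rewards of 0

    def update(self, reward: int) -> "BetaBelief":
        # Conjugate Bayesian update for a Bernoulli likelihood.
        if reward not in (0, 1):
            raise ValueError("reward must be 0 or 1")
        return BetaBelief(self.alpha + reward, self.beta + (1 - reward))

    @property
    def mean(self) -> float:
        # Posterior mean of the arm's reward probability.
        return self.alpha / (self.alpha + self.beta)


def belief_after(history):
    """Fold a list of (arm, reward) pairs into per-arm beliefs for a two-armed bandit."""
    beliefs = [BetaBelief(), BetaBelief()]
    for arm, reward in history:
        beliefs[arm] = beliefs[arm].update(reward)
    return beliefs


# Two histories with the same per-arm counts but different orderings.
h1 = [(0, 1), (0, 1), (1, 0)]
h2 = [(1, 0), (0, 1), (0, 1)]
b1, b2 = belief_after(h1), belief_after(h2)
print(b1 == b2, b1[0].mean)
```

In this toy setting the belief is four numbers regardless of history length, which is the kind of compact sufficient statistic that a recurrent agent's hidden state could, in principle, be compared against.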
Related papers
- Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective [52.38531288378491]
Reinforcement learning (RL) methods have substantially enhanced the planning capabilities of Large Language Models (LLMs). In this work, we investigate RL's benefits and limitations through a tractable graph-based abstraction. Our theoretical analyses reveal that supervised fine-tuning (SFT) may introduce co-occurrence-based spurious solutions, whereas RL achieves correct planning primarily through exploration.
arXiv Detail & Related papers (2025-09-26T17:39:48Z) - Meta-Reinforcement Learning with Universal Policy Adaptation: Provable Near-Optimality under All-task Optimum Comparator [9.900800253949512]
We develop a bilevel optimization framework for meta-RL (BO-MRL) to learn the meta-prior for task-specific policy adaptation.
We empirically validate the correctness of the derived upper bounds and demonstrate the superior effectiveness of the proposed algorithm over benchmarks.
arXiv Detail & Related papers (2024-10-13T05:17:58Z) - ContraBAR: Contrastive Bayes-Adaptive Deep RL [22.649531458557206]
In meta reinforcement learning (meta RL), an agent seeks a Bayes-optimal policy -- the optimal policy when facing an unknown task.
We investigate whether contrastive methods can be used for learning Bayes-optimal behavior.
We propose a simple meta RL algorithm that uses contrastive predictive coding (CPC) in lieu of variational belief inference.
arXiv Detail & Related papers (2023-06-04T17:50:20Z) - Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework where exploratory trajectories that enable accurate learning of hidden reward functions are acquired.
arXiv Detail & Related papers (2023-05-29T15:00:09Z) - Scalable PAC-Bayesian Meta-Learning via the PAC-Optimal Hyper-Posterior: From Theory to Practice [54.03076395748459]
A central question in the meta-learning literature is how to regularize to ensure generalization to unseen tasks.
We present a generalization bound for meta-learning, which was first derived by Rothfuss et al.
We provide a theoretical analysis and empirical case study under which conditions and to what extent these guarantees for meta-learning improve upon PAC-Bayesian per-task learning bounds.
arXiv Detail & Related papers (2022-11-14T08:51:04Z) - INFOrmation Prioritization through EmPOWERment in Visual Model-Based RL [90.06845886194235]
We propose a modified objective for model-based reinforcement learning (RL).
We integrate a term inspired by variational empowerment into a state-space model based on mutual information.
We evaluate the approach on a suite of vision-based robot control tasks with natural video backgrounds.
arXiv Detail & Related papers (2022-04-18T23:09:23Z) - Which Mutual-Information Representation Learning Objectives are Sufficient for Control? [80.2534918595143]
Mutual information provides an appealing formalism for learning representations of data.
This paper formalizes the sufficiency of a state representation for learning and representing the optimal policy.
Surprisingly, we find that two of these objectives can yield insufficient representations given mild and common assumptions on the structure of the MDP.
arXiv Detail & Related papers (2021-06-14T10:12:34Z) - Variational Empowerment as Representation Learning for Goal-Based Reinforcement Learning [114.07623388322048]
We discuss how the standard goal-conditioned RL (GCRL) is encapsulated by the objective variational empowerment.
Our work lays a novel foundation from which to evaluate, analyze, and develop representation learning techniques in goal-based RL.
arXiv Detail & Related papers (2021-06-02T18:12:26Z) - Meta-trained agents implement Bayes-optimal agents [13.572630988699572]
Our results suggest that memory-based meta-learning might serve as a general technique for numerically approximating Bayes-optimal agents.
arXiv Detail & Related papers (2020-10-21T18:05:21Z)