Information-Theoretic Policy Pre-Training with Empowerment
- URL: http://arxiv.org/abs/2510.05996v1
- Date: Tue, 07 Oct 2025 14:57:58 GMT
- Title: Information-Theoretic Policy Pre-Training with Empowerment
- Authors: Moritz Schneider, Robert Krug, Narunas Vaskevicius, Luigi Palmieri, Michael Volpp, Joschka Boedecker
- Abstract summary: We show that empowerment can be used as a pre-training signal for data-efficient downstream task adaptation. We propose a novel pre-training paradigm that initializes policies to maximize discounted empowerment. Our findings pave the way for future research to scale this framework to high-dimensional and complex tasks.
- Score: 11.104497840016414
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Empowerment, an information-theoretic measure of an agent's potential influence on its environment, has emerged as a powerful intrinsic motivation and exploration framework for reinforcement learning (RL). Beyond unsupervised RL and skill-learning algorithms, however, the specific use of empowerment as a pre-training signal has received limited attention in the literature. We show that empowerment can be used as a pre-training signal for data-efficient downstream task adaptation. To this end, we extend the traditional notion of empowerment by introducing discounted empowerment, which balances the agent's control over the environment across short- and long-term horizons. Leveraging this formulation, we propose a novel pre-training paradigm that initializes policies to maximize discounted empowerment, enabling agents to acquire a robust understanding of environmental dynamics. We analyze empowerment-based pre-training for various existing RL algorithms and empirically demonstrate its potential as a general-purpose initialization strategy: empowerment-maximizing policies with long horizons are data-efficient and effective, leading to improved adaptability in downstream tasks. Our findings pave the way for future research to scale this framework to high-dimensional and complex tasks, further advancing the field of RL.
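For context, the standard finite-horizon empowerment of a state s is the channel capacity between an open-loop sequence of T actions and the state they lead to,

$$\mathfrak{E}_T(s) \;=\; \max_{\omega(a_{1:T}\mid s)} I\!\left(a_{1:T};\, s_{T+1} \mid s_1 = s\right).$$

The discounted empowerment described in the abstract balances control over short and long horizons; one illustrative way to write such a quantity (a hedged sketch only, since the paper's exact definition is not reproduced in this summary) is a discounted mixture over horizons,

$$\mathfrak{E}_\gamma(s) \;=\; (1-\gamma)\sum_{T \ge 1} \gamma^{\,T-1}\, \mathfrak{E}_T(s), \qquad \gamma \in [0, 1).$$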
Related papers
- Curriculum Design for Trajectory-Constrained Agent: Compressing Chain-of-Thought Tokens in LLMs [26.165537937650413]
Training agents to operate under strict constraints during deployment presents significant challenges. We propose a curriculum learning strategy that gradually tightens constraints during training, enabling the agent to incrementally master the deployment requirements.
arXiv Detail & Related papers (2025-11-04T16:14:56Z) - Retaining by Doing: The Role of On-Policy Data in Mitigating Forgetting [40.80967570661867]
Adapting language models to new tasks via post-training carries the risk of degrading existing capabilities. We compare the forgetting patterns of two widely adopted post-training methods: supervised fine-tuning (SFT) and reinforcement learning (RL). RL leads to less forgetting than SFT while achieving comparable or higher target-task performance.
arXiv Detail & Related papers (2025-10-21T17:59:41Z) - TD-JEPA: Latent-predictive Representations for Zero-Shot Reinforcement Learning [63.73629127832652]
We introduce TD-JEPA, which brings TD-based latent-predictive representations to unsupervised RL. TD-JEPA trains explicit state and task encoders, a policy-conditioned multi-step predictor, and a set of parameterized policies directly in latent space. Empirically, TD-JEPA matches or outperforms state-of-the-art baselines on locomotion, navigation, and manipulation tasks across 13 datasets.
arXiv Detail & Related papers (2025-10-01T10:21:18Z) - Reinforcement Learning on Pre-Training Data [55.570379963147424]
We introduce Reinforcement Learning on Pre-Training data (R), a new training-time scaling paradigm for optimizing large language models (LLMs). R enables the policy to autonomously explore meaningful trajectories to learn from pre-training data and improve its capability through reinforcement learning (RL). Extensive experiments on both general-domain and mathematical reasoning benchmarks across multiple models validate the effectiveness of R.
arXiv Detail & Related papers (2025-09-23T17:10:40Z) - UAS Visual Navigation in Large and Unseen Environments via a Meta Agent [0.13654846342364302]
We propose a meta-curriculum training scheme to efficiently learn to navigate in large-scale urban environments. We organize the training curriculum hierarchically so that the agent is guided from coarse to fine towards the target task. In contrast to traditional reinforcement learning (RL), which focuses on acquiring a policy for a specific task, meta reinforcement learning (MRL) aims to learn a policy that transfers quickly to novel tasks.
arXiv Detail & Related papers (2025-03-20T01:44:59Z) - Adaptive Rank, Reduced Forgetting: Knowledge Retention in Continual Learning Vision-Language Models with Dynamic Rank-Selective LoRA [19.982853959240497]
Pre-trained vision-language embedding models such as CLIP have been widely adopted and validated in Continual Learning (CL). Existing CL methods primarily focus on continual downstream adaptation using components isolated from the pre-trained model (PTM). We propose a universal and efficient CL approach for CLIP based on Dynamic Rank-Selective LoRA (CoDyRA).
arXiv Detail & Related papers (2024-12-01T23:41:42Z) - From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise rewards to optimize the agent's reinforcement learning process. We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z) - Latent-Predictive Empowerment: Measuring Empowerment without a Simulator [56.53777237504011]
We present Latent-Predictive Empowerment (LPE), an algorithm that can compute empowerment in a more practical manner.
LPE learns large skillsets by maximizing an objective that is a principled replacement for the mutual information between skills and states (a generic sketch of this type of objective is given after this list).
arXiv Detail & Related papers (2024-10-15T00:41:18Z) - On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness [47.09873295916592]
Generalization in Reinforcement Learning (RL) aims to learn an agent during training that generalizes to the target environment.
This paper studies RL generalization from a theoretical aspect: how much can we expect pre-training over training environments to be helpful?
When interaction with the target environment is not allowed, we certify that the best we can obtain is a near-optimal policy in an average sense, and we design an algorithm that achieves this goal.
arXiv Detail & Related papers (2022-10-19T10:58:24Z) - Variational Empowerment as Representation Learning for Goal-Based Reinforcement Learning [114.07623388322048]
We discuss how standard goal-conditioned RL (GCRL) is encapsulated by the objective of variational empowerment.
Our work lays a novel foundation from which to evaluate, analyze, and develop representation learning techniques in goal-based RL.
arXiv Detail & Related papers (2021-06-02T18:12:26Z) - Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning [90.93035276307239]
We propose an information-theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z)
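As a brief illustration of the skill-state mutual information objective referenced in the Latent-Predictive Empowerment and Variational Empowerment entries above (a generic sketch of the standard variational treatment, not the specific estimators used by either paper): empowerment-style skill learning maximizes the mutual information between a latent skill z and the state s_T it reaches, typically through the Barber-Agakov lower bound

$$I(z;\, s_T) \;\ge\; \mathbb{E}_{p(z)\,\pi(\cdot\mid z)}\!\left[\log q_\phi(z \mid s_T)\right] + \mathcal{H}(z),$$

where $q_\phi$ is a learned skill discriminator. Goal-conditioned RL recovers this form when the latent $z$ is a goal and $\log q_\phi$ plays the role of a goal-achievement reward, which is the connection discussed in the Variational Empowerment entry.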