Related papers: Zero-Shot Context Generalization in Reinforcement Learning from Few Training Contexts

URL: http://arxiv.org/abs/2507.07348v1
Date: Thu, 10 Jul 2025 00:23:13 GMT
Title: Zero-Shot Context Generalization in Reinforcement Learning from Few Training Contexts
Authors: James Chapman, Kedar Karhadkar, Guido Montufar
Abstract summary: We introduce the context-enhanced Bellman equation (CEBE) to improve generalization when training on a single context. We derive context sample enhancement (CSE) as an efficient data augmentation method for approximating the CEBE in deterministic control environments.
Score: 1.5020330976600738
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deep reinforcement learning (DRL) has achieved remarkable success across multiple domains, including competitive games, natural language processing, and robotics. Despite these advancements, policies trained via DRL often struggle to generalize to evaluation environments with different parameters. This challenge is typically addressed by training with multiple contexts and/or by leveraging additional structure in the problem. However, obtaining sufficient training data across diverse contexts can be impractical in real-world applications. In this work, we consider contextual Markov decision processes (CMDPs) with transition and reward functions that exhibit regularity in context parameters. We introduce the context-enhanced Bellman equation (CEBE) to improve generalization when training on a single context. We prove both analytically and empirically that the CEBE yields a first-order approximation to the Q-function trained across multiple contexts. We then derive context sample enhancement (CSE) as an efficient data augmentation method for approximating the CEBE in deterministic control environments. We numerically validate the performance of CSE in simulation environments, showcasing its potential to improve generalization in DRL.
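
Illustrative sketch (not the authors' implementation): the abstract does not spell out how CSE is computed, so the following assumes one plausible reading, namely that a transition collected in a single training context is relabeled for a nearby context using a first-order expansion of the deterministic transition and reward functions in the context parameter. The toy dynamics, the reward, and the names `transition`, `reward`, `d_dcontext`, and `augment` are hypothetical and chosen only for illustration.

```python
import random

def transition(state, action, context):
    """Toy deterministic dynamics (assumed): context scales the action's effect."""
    return state + context * action

def reward(state, action, context):
    """Toy reward (assumed): penalize distance from the origin and context-weighted effort."""
    return -(state ** 2) - context * action ** 2

def d_dcontext(fn, state, action, context, eps=1e-4):
    """Central finite-difference derivative of fn with respect to the context."""
    return (fn(state, action, context + eps) - fn(state, action, context - eps)) / (2 * eps)

def augment(sample, base_context, radius, rng):
    """Relabel one base-context sample for a nearby context via a first-order expansion."""
    state, action, rew, next_state = sample
    delta = rng.uniform(-radius, radius)  # context perturbation
    new_next = next_state + d_dcontext(transition, state, action, base_context) * delta
    new_rew = rew + d_dcontext(reward, state, action, base_context) * delta
    return (state, action, new_rew, new_next, base_context + delta)

rng = random.Random(0)
c0 = 1.0                      # the single training context
s, a = 0.5, -0.2
sample = (s, a, reward(s, a, c0), transition(s, a, c0))
aug = augment(sample, c0, radius=0.1, rng=rng)
print(aug)
```

Because the toy dynamics and reward are linear in the context, the relabeled sample here matches the true transition at the perturbed context up to floating-point error; for nonlinear dependence the relabeling would be accurate only to first order in the perturbation.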

Related papers

Scalable In-Context Q-Learning [42.80296905313835]
We propose Scalable In-Context Q-Learning (SICQL) to steer in-context reinforcement learning. SICQL harnesses dynamic programming and world modeling to steer ICRL toward efficient reward and task generalization.
arXiv Detail & Related papers (2025-06-02T04:21:56Z)
A Controlled Study on Long Context Extension and Generalization in LLMs [85.4758128256142]
Broad textual understanding and in-context learning require language models that utilize full document contexts. Due to the implementation challenges associated with directly training long-context models, many methods have been proposed for extending models to handle long contexts. We implement a controlled protocol for extension methods with a standardized evaluation, utilizing consistent base models and extension data.
arXiv Detail & Related papers (2024-09-18T17:53:17Z)
Inferring Behavior-Specific Context Improves Zero-Shot Generalization in Reinforcement Learning [4.902544998453533]
We argue that understanding and utilizing contextual cues, such as the gravity level of the environment, is critical for robust generalization. Our algorithm demonstrates improved generalization on various simulated domains, outperforming prior context-learning techniques in zero-shot settings.
arXiv Detail & Related papers (2024-04-15T07:31:48Z)
Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts [58.220879689376744]
Reinforcement learning (RL) is a powerful approach for acquiring a good-performing policy. We propose Diverse Skill Learning (Di-SkilL) for learning diverse skills. We show on challenging robot simulation tasks that Di-SkilL can learn diverse and performant skills.
arXiv Detail & Related papers (2024-03-11T17:49:18Z)
Dynamics-Adaptive Continual Reinforcement Learning via Progressive Contextualization [29.61829620717385]
A key challenge of continual reinforcement learning (CRL) in dynamic environments is to promptly adapt the RL agent's behavior as the environment changes over its lifetime. DaCoRL learns a context-conditioned policy using progressive contextualization. DaCoRL shows consistent superiority over existing methods in terms of stability, overall performance, and generalization ability.
arXiv Detail & Related papers (2022-09-01T10:26:58Z)
AACC: Asymmetric Actor-Critic in Contextual Reinforcement Learning [13.167123175701802]
This paper formalizes the task of adapting to changing environmental dynamics in Reinforcement Learning (RL). We then propose the Asymmetric Actor-Critic in Contextual RL (AACC) as an end-to-end actor-critic method to deal with such generalization tasks. We demonstrate experimentally, in a range of simulated environments, that AACC substantially improves performance over existing baselines.
arXiv Detail & Related papers (2022-08-03T22:52:26Z)
Contextualize Me -- The Case for Context in Reinforcement Learning [49.794253971446416]
Contextual Reinforcement Learning (cRL) provides a framework to model such changes in a principled manner. We show how cRL contributes to improving zero-shot generalization in RL through meaningful benchmarks and structured reasoning about generalization tasks.
arXiv Detail & Related papers (2022-02-09T15:01:59Z)
Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning [83.66080019570461]
We propose two environment-agnostic, algorithm-agnostic quantitative metrics for task difficulty. We show that these metrics have higher correlations with normalized task solvability scores than a variety of alternatives. These metrics can also be used for fast and compute-efficient optimizations of key design parameters.
arXiv Detail & Related papers (2021-03-23T17:49:50Z)
Instance based Generalization in Reinforcement Learning [24.485597364200824]
We analyze policy learning in the context of Partially Observable Markov Decision Processes (POMDPs). We prove that, independently of the exploration strategy, reusing instances introduces significant changes in the effective Markov dynamics the agent observes during training. We propose training a shared belief representation over an ensemble of specialized policies, from which we compute a consensus policy that is used for data collection, disallowing instance-specific exploitation.
arXiv Detail & Related papers (2020-11-02T16:19:44Z)
Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents. We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks. This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z)
Unshuffling Data for Improved Generalization [65.57124325257409]
Generalization beyond the training distribution is a core challenge in machine learning. We show that partitioning the data into well-chosen, non-i.i.d. subsets treated as multiple training environments can guide the learning of models with better out-of-distribution generalization.
arXiv Detail & Related papers (2020-02-27T03:07:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.