Instance based Generalization in Reinforcement Learning
- URL: http://arxiv.org/abs/2011.01089v1
- Date: Mon, 2 Nov 2020 16:19:44 GMT
- Title: Instance based Generalization in Reinforcement Learning
- Authors: Martin Bertran, Natalia Martinez, Mariano Phielipp, Guillermo Sapiro
- Abstract summary: We analyze policy learning in the context of Partially Observable Markov Decision Processes (POMDPs).
We prove that, independently of the exploration strategy, reusing instances introduces significant changes in the effective Markov dynamics the agent observes during training.
We propose training a shared belief representation over an ensemble of specialized policies, from which we compute a consensus policy that is used for data collection, disallowing instance-specific exploitation.
- Score: 24.485597364200824
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Agents trained via deep reinforcement learning (RL) routinely fail to
generalize to unseen environments, even when these share the same underlying
dynamics as the training levels. Understanding the generalization properties of
RL is one of the challenges of modern machine learning. Towards this goal, we
analyze policy learning in the context of Partially Observable Markov Decision
Processes (POMDPs) and formalize the dynamics of training levels as instances.
We prove that, independently of the exploration strategy, reusing instances
introduces significant changes in the effective Markov dynamics the agent
observes during training. Maximizing expected rewards impacts the learned
belief state of the agent by inducing undesired instance-specific speedrunning
policies instead of generalizable ones, which are suboptimal on the training
set. We provide generalization bounds on the value gap between train and test
environments based on the number of training instances, and use insights from
these bounds to improve performance on unseen levels. We propose training a
shared belief representation over an ensemble of specialized policies, from
which we compute a consensus policy that is used for data collection,
disallowing instance-specific exploitation. We experimentally validate our
theory, observations, and the proposed computational solution on the CoinRun
benchmark.
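The proposed approach lends itself to a compact sketch: a shared belief encoder feeds an ensemble of instance-specialized policy heads, and their action distributions are averaged into a consensus policy used for data collection. The snippet below is a minimal illustrative sketch in PyTorch, not the authors' released code; the class name BeliefEnsemble, the GRU belief encoder, the uniform averaging of head distributions, and all dimensions are assumptions made for illustration only.
```python
# Illustrative sketch only (assumed names and shapes), not the paper's reference implementation.
import torch
import torch.nn as nn


class BeliefEnsemble(nn.Module):
    """Shared belief encoder with per-instance policy heads and a consensus policy."""

    def __init__(self, obs_dim, belief_dim, n_actions, n_instances):
        super().__init__()
        # Shared recurrent belief representation over the observation history.
        self.belief = nn.GRU(obs_dim, belief_dim, batch_first=True)
        # One specialized policy head per training instance (level).
        self.heads = nn.ModuleList(
            [nn.Linear(belief_dim, n_actions) for _ in range(n_instances)]
        )

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim) -> belief: (batch, time, belief_dim)
        belief, hidden = self.belief(obs_seq, hidden)
        # Action distribution of each specialized head.
        probs = torch.stack(
            [torch.softmax(head(belief), dim=-1) for head in self.heads], dim=0
        )  # (n_instances, batch, time, n_actions)
        # Consensus policy: average the heads' distributions, so data collection
        # cannot rely on any single head's instance-specific shortcuts.
        consensus = probs.mean(dim=0)  # (batch, time, n_actions)
        return probs, consensus, hidden


if __name__ == "__main__":
    model = BeliefEnsemble(obs_dim=8, belief_dim=32, n_actions=4, n_instances=3)
    obs = torch.randn(2, 5, 8)  # 2 trajectories, 5 time steps
    _, consensus, _ = model(obs)
    action = torch.distributions.Categorical(probs=consensus[:, -1]).sample()
    print(action)
```
Averaging over heads means trajectories are gathered by a policy that no single specialized head controls, which is one plausible way to realize the "disallowing instance-specific exploitation" described in the abstract.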
Related papers
- Instance Selection for Dynamic Algorithm Configuration with Reinforcement Learning: Improving Generalization [16.49696895887536]
Dynamic Algorithm Configuration (DAC) addresses the challenge of dynamically setting hyperparameters of an algorithm for a diverse set of instances.
Agents trained with Deep Reinforcement Learning (RL) offer a pathway to solving such settings.
We take a step towards mitigating this by selecting a representative subset of training instances to overcome overrepresentation, and then retraining the agent on this subset to improve its generalization performance.
arXiv Detail & Related papers (2024-07-18T13:44:43Z) - Assessing the Impact of Distribution Shift on Reinforcement Learning Performance [0.0]
Reinforcement learning (RL) faces its own set of unique challenges.
Comparison of point estimates, and plots that show successful convergence to the optimal policy during training, may obfuscate overfitting or dependence on the experimental setup.
We propose a set of evaluation methods that measure the robustness of RL algorithms under distribution shifts.
arXiv Detail & Related papers (2024-02-05T23:50:55Z) - Invariant Causal Imitation Learning for Generalizable Policies [87.51882102248395]
We propose Invariant Causal Imitation Learning (ICIL) to learn an imitation policy.
ICIL learns a representation of causal features that is disentangled from the specific representations of noise variables.
We show that ICIL is effective in learning imitation policies capable of generalizing to unseen environments.
arXiv Detail & Related papers (2023-11-02T16:52:36Z) - Quantifying Agent Interaction in Multi-agent Reinforcement Learning for Cost-efficient Generalization [63.554226552130054]
Generalization poses a significant challenge in Multi-agent Reinforcement Learning (MARL).
The extent to which an agent is influenced by unseen co-players depends on the agent's policy and the specific scenario.
We present the Level of Influence (LoI), a metric quantifying the interaction intensity among agents within a given scenario and environment.
arXiv Detail & Related papers (2023-10-11T06:09:26Z) - The Role of Diverse Replay for Generalisation in Reinforcement Learning [7.399291598113285]
We investigate the impact of the exploration strategy and replay buffer on generalisation in reinforcement learning.
We show that collecting and training on more diverse data from the training environments will improve zero-shot generalisation to new tasks.
arXiv Detail & Related papers (2023-06-09T07:48:36Z) - On the Importance of Exploration for Generalization in Reinforcement Learning [89.63074327328765]
We propose EDE: Exploration via Distributional Ensemble, a method that encourages exploration of states with high uncertainty.
Our algorithm is the first value-based approach to achieve state-of-the-art on both Procgen and Crafter.
arXiv Detail & Related papers (2023-06-08T18:07:02Z) - Generalization Across Observation Shifts in Reinforcement Learning [13.136140831757189]
We extend the bisimulation framework to account for context-dependent observation shifts.
Specifically, we focus on the simulator based learning setting and use alternate observations to learn a representation space.
This allows us to deploy the agent to varying observation settings during test time and generalize to unseen scenarios.
arXiv Detail & Related papers (2023-06-07T16:49:03Z) - Learning Dynamics and Generalization in Reinforcement Learning [59.530058000689884]
We show theoretically that temporal difference learning encourages agents to fit non-smooth components of the value function early in training.
We show that neural networks trained using temporal difference algorithms on dense reward tasks exhibit weaker generalization between states than randomly initialized networks and networks trained with policy gradient methods.
arXiv Detail & Related papers (2022-06-05T08:49:16Z) - Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z) - Learning Adaptive Exploration Strategies in Dynamic Environments Through Informed Policy Regularization [100.72335252255989]
We study the problem of learning exploration-exploitation strategies that effectively adapt to dynamic environments.
We propose a novel algorithm that regularizes the training of an RNN-based policy using informed policies trained to maximize the reward in each task.
arXiv Detail & Related papers (2020-05-06T16:14:48Z)