Exploration Implies Data Augmentation: Reachability and Generalisation in Contextual MDPs
- URL: http://arxiv.org/abs/2410.03565v2
- Date: Wed, 05 Mar 2025 10:47:17 GMT
- Title: Exploration Implies Data Augmentation: Reachability and Generalisation in Contextual MDPs
- Authors: Max Weltevrede, Caroline Horsch, Matthijs T. J. Spaan, Wendelin Böhmer
- Abstract summary: We show that training on more states can indeed improve generalisation, but can come at the cost of reducing the accuracy of the learned value function. We propose Explore-Go, a method that implements an exploration phase at the beginning of each episode.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the zero-shot policy transfer (ZSPT) setting for contextual Markov decision processes (MDPs), agents train on a fixed set of contexts and must generalise to new ones. Recent work has argued and demonstrated that increased exploration can improve this generalisation by training the agent on more states in the training contexts. In this paper, we demonstrate that training on more states can indeed improve generalisation, but can come at the cost of reducing the accuracy of the learned value function, which should not benefit generalisation. We introduce reachability in the ZSPT setting to define which states/contexts require generalisation, and explain why exploration can improve it. We hypothesise and demonstrate that using exploration to increase the agent's coverage, while also increasing the accuracy of its value function, improves generalisation even more. Inspired by this, we propose Explore-Go, a method that implements an exploration phase at the beginning of each episode. It can be combined with existing on- and off-policy RL algorithms and significantly improves generalisation even in partially observable MDPs. We demonstrate the effectiveness of Explore-Go when combined with several popular algorithms, showing an increase in generalisation performance across several environments. With this, we hope to provide practitioners with a simple modification that can improve the generalisation of their agents.
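Based only on the abstract, the following is a minimal sketch of what an Explore-Go-style episode could look like: a pure-exploration phase of random length at the start of each episode, followed by a normal rollout that feeds the base RL algorithm. The Gym-style environment API, the step budget, and the `explore_policy` are assumptions rather than details from the paper.

```python
import random

def collect_episode(env, agent, explore_policy, max_explore_steps=50):
    """Roll out one episode with an Explore-Go-style exploration phase.

    A hypothetical sketch based on the abstract: pure exploration runs for a
    random number of steps at the start of the episode, so the agent's own
    rollout effectively begins from a more diverse set of states.
    """
    obs = env.reset()

    # Phase 1: pure exploration for a random number of steps. The abstract
    # frames this as a form of data augmentation over the training contexts.
    for _ in range(random.randint(0, max_explore_steps)):
        obs, _, done, _ = env.step(explore_policy(obs))
        if done:
            obs = env.reset()

    # Phase 2: normal rollout with the agent's own policy; these transitions
    # feed the usual update of the base on- or off-policy RL algorithm.
    # (Whether the exploration transitions are also used for training is a
    # design choice this sketch leaves open.)
    transitions, done = [], False
    while not done:
        action = agent.act(obs)
        next_obs, reward, done, _ = env.step(action)
        transitions.append((obs, action, reward, next_obs, done))
        obs = next_obs
    return transitions
```

Because only the rollout after the exploration phase needs to come from the agent's own policy, a wrapper like this can sit in front of most existing training loops, which matches the abstract's claim that Explore-Go combines with both on- and off-policy algorithms.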
Related papers
- Good Actions Succeed, Bad Actions Generalize: A Case Study on Why RL Generalizes Better
Supervised learning (SL) and reinforcement learning (RL) are widely used to train general-purpose agents for complex tasks.
This paper provides a direct comparison between SL and RL in terms of zero-shot generalization.
arXiv Detail & Related papers (2025-03-19T21:03:27Z)
- Towards Modality Generalization: A Benchmark and Prospective Analysis
This paper introduces Modality Generalization (MG), which focuses on enabling models to generalize to unseen modalities.
We propose a comprehensive benchmark featuring multi-modal algorithms and adapt existing methods that focus on generalization.
Our work provides a foundation for advancing robust and adaptable multi-modal models, enabling them to handle unseen modalities in realistic scenarios.
arXiv Detail & Related papers (2024-12-24T08:38:35Z)
- Doubly Mild Generalization for Offline Reinforcement Learning
We show that mild generalization beyond the dataset can be trusted and leveraged to improve performance under certain conditions.
We propose Doubly Mild Generalization (DMG) comprising (i) mild action generalization and (ii) mild generalization propagation.
DMG achieves state-of-the-art performance across Gym-MuJoCo tasks and challenging AntMaze tasks.
arXiv Detail & Related papers (2024-11-12T17:04:56Z)
- Explore-Go: Leveraging Exploration for Generalisation in Deep Reinforcement Learning
We show that increased exploration during training can be leveraged to increase the generalisation performance of the agent.
We propose a novel method Explore-Go that exploits this intuition by increasing the number of states on which the agent trains.
arXiv Detail & Related papers (2024-06-12T10:39:31Z)
- The Role of Diverse Replay for Generalisation in Reinforcement Learning
We investigate the impact of the exploration strategy and replay buffer on generalisation in reinforcement learning.
We show that collecting and training on more diverse data from the training environments will improve zero-shot generalisation to new tasks.
arXiv Detail & Related papers (2023-06-09T07:48:36Z)
- On the Importance of Exploration for Generalization in Reinforcement Learning
We propose EDE: Exploration via Distributional Ensemble, a method that encourages exploration of states with high uncertainty.
Our algorithm is the first value-based approach to achieve state-of-the-art on both Procgen and Crafter.
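As a rough illustration of the idea in this entry (not EDE's actual algorithm): a value-based agent can direct exploration toward uncertain states by acting optimistically with respect to an ensemble of Q-networks. The ensemble size, the bonus coefficient `c`, and all names below are illustrative assumptions.

```python
import torch

def ucb_action(q_ensemble, obs, c=1.0):
    """Hypothetical sketch of uncertainty-driven action selection: actions
    the Q-ensemble disagrees on receive an exploration bonus."""
    with torch.no_grad():
        # One vector of Q-values per ensemble member: shape (members, actions).
        qs = torch.stack([q(obs) for q in q_ensemble])
    mean, std = qs.mean(dim=0), qs.std(dim=0)
    # Optimism in the face of epistemic uncertainty.
    return int(torch.argmax(mean + c * std))
```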
arXiv Detail & Related papers (2023-06-08T18:07:02Z)
- DART: Diversify-Aggregate-Repeat Training Improves Generalization of Neural Networks
Generalization of neural networks is crucial for deploying them safely in the real world.
In this work, we first establish a surprisingly simple but strong benchmark for generalization which utilizes diverse augmentations within a training minibatch.
We then propose the Diversify-Aggregate-Repeat Training (DART) strategy, which first trains diverse models using different augmentations (or domains) to explore the loss basin.
We find that repeating the aggregation step throughout training improves the overall optimization trajectory and also ensures that the individual models have a sufficiently low loss barrier, yielding improved generalization when they are combined.
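A minimal sketch of the Diversify-Aggregate-Repeat loop as the summary above describes it, assuming plain weight averaging as the aggregation step; the function names, round counts, and training details are illustrative assumptions, not DART's actual implementation.

```python
import copy
import torch

def dart_train(model, make_loader, augmentations, optimiser_fn,
               loss_fn, rounds=5, steps_per_round=1000):
    """Sketch of a Diversify-Aggregate-Repeat loop (assumptions: one model
    copy per augmentation, simple weight averaging as aggregation)."""
    for _ in range(rounds):
        # Diversify: train one copy of the model per augmentation.
        copies = [copy.deepcopy(model) for _ in augmentations]
        for m, aug in zip(copies, augmentations):
            opt = optimiser_fn(m.parameters())
            for step, (x, y) in enumerate(make_loader(aug)):
                if step >= steps_per_round:
                    break
                opt.zero_grad()
                loss_fn(m(x), y).backward()
                opt.step()
        # Aggregate: average the diverged weights back into a single model.
        with torch.no_grad():
            for name, param in model.named_parameters():
                param.copy_(torch.stack(
                    [dict(m.named_parameters())[name] for m in copies]
                ).mean(dim=0))
        # Repeat: the next round starts from the aggregated model.
    return model
```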
arXiv Detail & Related papers (2023-02-28T15:54:47Z)
- On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning
We show that multi-task pretraining with fine-tuning on new tasks performs equally as well, or better, than meta-pretraining with meta test-time adaptation.
This is encouraging for future research, as multi-task pretraining tends to be simpler and computationally cheaper than meta-RL.
arXiv Detail & Related papers (2022-06-07T13:24:00Z)
- Generalizing to New Tasks via One-Shot Compositional Subgoals
The ability to generalize to previously unseen tasks with little to no supervision is a key challenge in modern machine learning research.
We introduce CASE, which attempts to address this challenge by training an Imitation Learning agent with adaptive "near future" subgoals.
Our experiments show that the proposed approach consistently outperforms the previous state-of-the-art compositional Imitation Learning approach by 30%.
arXiv Detail & Related papers (2022-05-16T14:30:11Z)
- Contextualize Me -- The Case for Context in Reinforcement Learning
Contextual Reinforcement Learning (cRL) provides a framework to model such changes in a principled manner.
We show how cRL contributes to improving zero-shot generalization in RL through meaningful benchmarks and structured reasoning about generalization tasks.
arXiv Detail & Related papers (2022-02-09T15:01:59Z)
- CoMPS: Continual Meta Policy Search
We develop a new continual meta-learning method to address challenges in sequential multi-task learning.
We find that CoMPS outperforms prior continual learning and off-policy meta-reinforcement learning methods on several sequences of challenging continuous control tasks.
arXiv Detail & Related papers (2021-12-08T18:53:08Z)
- Towards More Generalizable One-shot Visual Imitation Learning
A general-purpose robot should be able to master a wide range of tasks and quickly learn a novel one by leveraging past experiences.
One-shot imitation learning (OSIL) approaches this goal by training an agent with (pairs of) expert demonstrations.
We push for a higher level of generalization ability by investigating a more ambitious multi-task setup.
arXiv Detail & Related papers (2021-10-26T05:49:46Z)
- Hierarchical Skills for Efficient Exploration
In reinforcement learning, pre-trained low-level skills have the potential to greatly facilitate exploration.
Prior knowledge of the downstream task is required to strike the right balance between generality (fine-grained control) and specificity (faster learning) in skill design.
We propose a hierarchical skill learning framework that acquires skills of varying complexity in an unsupervised manner.
arXiv Detail & Related papers (2021-10-20T22:29:32Z)
- Predicting Deep Neural Network Generalization with Perturbation Response Curves
We propose a new framework for evaluating the generalization capabilities of trained networks.
Specifically, we introduce two new measures for accurately predicting generalization gaps.
We attain better predictive scores than the current state-of-the-art measures on a majority of tasks in the Predicting Generalization in Deep Learning (PGDL) NeurIPS 2020 competition.
arXiv Detail & Related papers (2021-06-09T01:37:36Z)
- Parrot: Data-Driven Behavioral Priors for Reinforcement Learning
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
arXiv Detail & Related papers (2020-11-19T18:47:40Z)
- Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z)
- Planning to Explore via Self-Supervised World Models
Plan2Explore is a self-supervised reinforcement learning agent.
We present a new approach to self-supervised exploration and fast adaptation to new tasks.
Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods.
arXiv Detail & Related papers (2020-05-12T17:59:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.