Unsupervised Reinforcement Learning in Multiple Environments
- URL: http://arxiv.org/abs/2112.08746v1
- Date: Thu, 16 Dec 2021 09:54:37 GMT
- Title: Unsupervised Reinforcement Learning in Multiple Environments
- Authors: Mirco Mutti, Mattia Mancassola, Marcello Restelli
- Abstract summary: We address the problem of unsupervised reinforcement learning in a class of multiple environments.
We present a policy gradient algorithm, $\alpha$MEPOL, to optimize the introduced objective through mediated interactions with the class.
We show that reinforcement learning greatly benefits from the pre-trained exploration strategy.
- Score: 37.5349071806395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Several recent works have been dedicated to unsupervised reinforcement
learning in a single environment, in which a policy is first pre-trained with
unsupervised interactions, and then fine-tuned towards the optimal policy for
several downstream supervised tasks defined over the same environment. Along
this line, we address the problem of unsupervised reinforcement learning in a
class of multiple environments, in which the policy is pre-trained with
interactions from the whole class, and then fine-tuned for several tasks in any
environment of the class. Notably, the problem is inherently multi-objective as
we can trade off the pre-training objective between environments in many ways.
In this work, we foster an exploration strategy that is sensitive to the most
adverse cases within the class. Hence, we cast the exploration problem as the
maximization of the mean of a critical percentile of the state visitation
entropy induced by the exploration strategy over the class of environments.
Then, we present a policy gradient algorithm, $\alpha$MEPOL, to optimize the
introduced objective through mediated interactions with the class. Finally, we
empirically demonstrate the ability of the algorithm to learn to explore
challenging classes of continuous environments, and we show that reinforcement
learning greatly benefits from the pre-trained exploration strategy compared to
learning from scratch.
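As a concrete illustration of the objective above, the sketch below is a minimal, assumed implementation (not the authors' code): it scores exploration in each environment with a rough k-nearest-neighbour estimate of the state-visitation entropy and then averages only the lowest $\alpha$-fraction of those scores, i.e. the mean of the critical percentile that $\alpha$MEPOL maximizes with a policy gradient. The estimator, the function names, and the toy data are assumptions made for illustration only.

```python
import numpy as np

def knn_entropy(states, k=4):
    """Rough k-nearest-neighbour estimate (up to additive constants) of the
    state-visitation entropy from a batch of visited states."""
    states = np.asarray(states, dtype=float)
    n, d = states.shape
    dists = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)                 # exclude self-distances
    kth = np.sort(dists, axis=1)[:, k - 1]          # distance to k-th neighbour
    return d * np.mean(np.log(kth + 1e-12)) + np.log(n)

def percentile_objective(entropies, alpha=0.2):
    """Mean entropy over the worst alpha-fraction of environments:
    the critical-percentile (CVaR-style) objective described in the abstract."""
    entropies = np.sort(np.asarray(entropies, dtype=float))
    k = max(1, int(np.ceil(alpha * len(entropies))))
    return entropies[:k].mean()

# Toy usage: one batch of visited states per environment in the class.
rng = np.random.default_rng(0)
batches = [rng.normal(scale=s, size=(64, 2)) for s in (0.1, 1.0, 2.0, 5.0)]
print(percentile_objective([knn_entropy(b) for b in batches], alpha=0.25))
```

Averaging over the worst $\alpha$-fraction, rather than over the whole class, is what makes the resulting exploration strategy sensitive to the most adverse environments; the paper ascends this percentile objective with a policy gradient, which the sketch does not reproduce.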
Related papers
- Intrinsically Motivated Hierarchical Policy Learning in Multi-objective Markov Decision Processes [15.50007257943931]
We propose a novel dual-phase intrinsically motivated reinforcement learning method to address this limitation.
We show experimentally that the proposed method significantly outperforms state-of-the-art multi-objective reinforcement methods in a dynamic robotics environment.
arXiv Detail & Related papers (2023-08-18T02:10:45Z)
- Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction [29.32859058651654]
We study the problem of learning goal-conditioned policies in Minecraft, a popular, widely accessible yet challenging open-ended environment.
We first identify two main challenges of learning such policies: 1) the indistinguishability of tasks from the state distribution, due to the vast scene diversity, and 2) the non-stationary nature of environment dynamics caused by partial observability.
We propose Goal-Sensitive Backbone (GSB) for the policy to encourage the emergence of goal-relevant visual state representations.
arXiv Detail & Related papers (2023-01-21T08:15:38Z)
- Exploration via Planning for Information about the Optimal Trajectory [67.33886176127578]
We develop a method that allows us to plan for exploration while taking the task and the current knowledge into account.
We demonstrate that our method learns strong policies with 2x fewer samples than strong exploration baselines.
arXiv Detail & Related papers (2022-10-06T20:28:55Z)
- Fast Model-based Policy Search for Universal Policy Networks [45.44896435487879]
Adapting an agent's behaviour to new environments has been one of the primary focus areas of physics-based reinforcement learning.
We propose a Gaussian Process-based prior learned in simulation, that captures the likely performance of a policy when transferred to a previously unseen environment.
We integrate this prior with a Bayesian optimisation-based policy search process to improve the efficiency of identifying the most appropriate policy from the universal policy network.
arXiv Detail & Related papers (2022-02-11T18:08:02Z)
- Q-Mixing Network for Multi-Agent Pathfinding in Partially Observable Grid Environments [62.997667081978825]
We consider the problem of multi-agent navigation in partially observable grid environments.
We suggest a reinforcement learning approach in which the agents first learn policies that map observations to actions and then follow these policies to reach their goals.
arXiv Detail & Related papers (2021-08-13T09:44:47Z)
- Stay Alive with Many Options: A Reinforcement Learning Approach for Autonomous Navigation [5.811502603310248]
We introduce an alternative approach to sequentially learn such skills without using an overarching hierarchical policy.
We demonstrate the utility of our approach in a simulated 3D navigation environment which we have built.
arXiv Detail & Related papers (2021-01-30T06:55:35Z)
- Importance Weighted Policy Learning and Adaptation [89.46467771037054]
We study a complementary approach which is conceptually simple, general, modular and built on top of recent improvements in off-policy learning.
The framework is inspired by ideas from the probabilistic inference literature and combines robust off-policy learning with a behavior prior.
Our approach achieves competitive adaptation performance on hold-out tasks compared to meta reinforcement learning baselines and can scale to complex sparse-reward scenarios.
arXiv Detail & Related papers (2020-09-10T14:16:58Z)
- Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity.
Our method leverages latent variable models to learn a representation of the environment from current and past experiences.
We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
arXiv Detail & Related papers (2020-06-18T17:34:50Z)
- Learning Adaptive Exploration Strategies in Dynamic Environments Through Informed Policy Regularization [100.72335252255989]
We study the problem of learning exploration-exploitation strategies that effectively adapt to dynamic environments.
We propose a novel algorithm that regularizes the training of an RNN-based policy using informed policies trained to maximize the reward in each task.
arXiv Detail & Related papers (2020-05-06T16:14:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.