Learning Adaptive Exploration Strategies in Dynamic Environments Through
Informed Policy Regularization
- URL: http://arxiv.org/abs/2005.02934v1
- Date: Wed, 6 May 2020 16:14:48 GMT
- Title: Learning Adaptive Exploration Strategies in Dynamic Environments Through
Informed Policy Regularization
- Authors: Pierre-Alexandre Kamienny, Matteo Pirotta, Alessandro Lazaric,
Thibault Lavril, Nicolas Usunier, Ludovic Denoyer
- Abstract summary: We study the problem of learning exploration-exploitation strategies that effectively adapt to dynamic environments.
We propose a novel algorithm that regularizes the training of an RNN-based policy using informed policies trained to maximize the reward in each task.
- Score: 100.72335252255989
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of learning exploration-exploitation strategies that
effectively adapt to dynamic environments, where the task may change over time.
While RNN-based policies could in principle represent such strategies, in
practice their training time is prohibitive and the learning process often
converges to poor solutions. In this paper, we consider the case where the
agent has access to a description of the task (e.g., a task id or task
parameters) at training time, but not at test time. We propose a novel
algorithm that regularizes the training of an RNN-based policy using informed
policies trained to maximize the reward in each task. This dramatically reduces
the sample complexity of training RNN-based policies, without losing their
representational power. As a result, our method learns exploration strategies
that efficiently balance between gathering information about the unknown and
changing task and maximizing the reward over time. We test the performance of
our algorithm in a variety of environments where tasks may vary within each
episode.
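The abstract describes training per-task "informed" policies and using them to regularize an RNN-based, history-conditioned policy. Below is a minimal sketch of one way such a regularizer could be wired up, assuming a discrete action space, PyTorch, and a KL penalty between the two policies; the architectures, the KL direction, and the coefficient `beta` are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): regularize an RNN exploration policy
# toward per-task informed policies with a KL term added to the policy loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RNNPolicy(nn.Module):
    """History-conditioned policy: sees (obs, prev_action, prev_reward), not the task."""
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(obs_dim + n_actions + 1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, trajectory, h0=None):
        out, h = self.rnn(trajectory, h0)   # trajectory: (B, T, obs+act+reward)
        return self.head(out), h            # per-step action logits

class InformedPolicy(nn.Module):
    """Task-conditioned policy, trained separately to maximize reward in each task."""
    def __init__(self, obs_dim, task_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + task_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs, task):
        return self.net(torch.cat([obs, task], dim=-1))

def regularized_policy_loss(rnn_logits, informed_logits, log_probs, advantages, beta=0.1):
    """Policy-gradient loss plus a KL penalty pulling the RNN policy toward the
    informed (task-aware) policy on the states visited during training."""
    pg_loss = -(log_probs * advantages.detach()).mean()
    kl = F.kl_div(F.log_softmax(rnn_logits, dim=-1),
                  F.softmax(informed_logits.detach(), dim=-1),
                  reduction="batchmean")
    return pg_loss + beta * kl
```

At test time the task description is unavailable, so only the RNN policy is used; the informed policies and the KL term act purely as a training-time shaping signal in this sketch.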
Related papers
- Active Fine-Tuning of Generalist Policies [54.65568433408307]
We propose AMF (Active Multi-task Fine-tuning) to maximize multi-task policy performance under a limited demonstration budget.
We derive performance guarantees for AMF under regularity assumptions and demonstrate its empirical effectiveness in complex and high-dimensional environments.
arXiv Detail & Related papers (2024-10-07T13:26:36Z)
- Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Structured Bandit Learning [12.608461657195367]
We study the multi-task structured bandit problem, where the goal is to learn a near-optimal algorithm that minimizes cumulative regret.
We use a transformer as a decision-making algorithm to learn this shared structure so as to generalize to the test task.
We show that our algorithm, without the knowledge of the underlying problem structure, can learn a near-optimal policy in-context.
arXiv Detail & Related papers (2024-06-07T16:34:31Z)
- Curriculum Learning in Job Shop Scheduling using Reinforcement Learning [0.3867363075280544]
Deep Reinforcement Learning (DRL) dynamically adjusts an agent's planning strategy in response to difficult instances.
We further improve DRL as the underlying method by actively incorporating the variability of instance difficulty within the same problem size into the design of the learning process.
arXiv Detail & Related papers (2023-05-17T13:15:27Z)
- Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning [83.66080019570461]
We propose two environment-agnostic, algorithm-agnostic quantitative metrics for task difficulty.
We show that these metrics have higher correlations with normalized task solvability scores than a variety of alternatives.
These metrics can also be used for fast and compute-efficient optimizations of key design parameters.
arXiv Detail & Related papers (2021-03-23T17:49:50Z)
- One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL [142.36621929739707]
We show that learning diverse behaviors for accomplishing a task can lead to behavior that generalizes to varying environments.
By identifying multiple solutions for the task in a single environment during training, our approach can generalize to new situations.
arXiv Detail & Related papers (2020-10-27T17:41:57Z)
- Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity.
Our method leverages latent variable models to learn a representation of the environment from current and past experiences.
We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
arXiv Detail & Related papers (2020-06-18T17:34:50Z)
- Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
arXiv Detail & Related papers (2020-06-12T13:34:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.