One Solution is Not All You Need: Few-Shot Extrapolation via Structured
MaxEnt RL
- URL: http://arxiv.org/abs/2010.14484v2
- Date: Mon, 7 Dec 2020 22:33:16 GMT
- Title: One Solution is Not All You Need: Few-Shot Extrapolation via Structured
MaxEnt RL
- Authors: Saurabh Kumar, Aviral Kumar, Sergey Levine, Chelsea Finn
- Abstract summary: We show that learning diverse behaviors for accomplishing a task can lead to behavior that generalizes to varying environments.
By identifying multiple solutions for the task in a single environment during training, our approach can generalize to new situations.
- Score: 142.36621929739707
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While reinforcement learning algorithms can learn effective policies for
complex tasks, these policies are often brittle to even minor task variations,
especially when variations are not explicitly provided during training. One
natural approach to this problem is to train agents with manually specified
variation in the training task or environment. However, this may be infeasible
in practical situations, either because making perturbations is not possible,
or because it is unclear how to choose suitable perturbation strategies without
sacrificing performance. The key insight of this work is that learning diverse
behaviors for accomplishing a task can directly lead to behavior that
generalizes to varying environments, without needing to perform explicit
perturbations during training. By identifying multiple solutions for the task
in a single environment during training, our approach can generalize to new
situations by abandoning solutions that are no longer effective and adopting
those that are. We theoretically characterize a robustness set of environments
that arises from our algorithm and empirically find that our diversity-driven
approach can extrapolate to various changes in the environment and task.
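The test-time behavior described above (abandoning solutions that no longer work and adopting ones that do) can be illustrated with a short, hedged sketch. This is not the authors' released implementation; it assumes a gym-style environment test_env whose step() returns (obs, reward, done, info), and a list policies of pre-trained callables mapping observations to actions, one per solution discovered during training.

import numpy as np

def evaluate_policy(policy, env, episodes=3):
    # Average return of one candidate solution over a few trial episodes.
    returns = []
    for _ in range(episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(policy(obs))
            total += reward
        returns.append(total)
    return float(np.mean(returns))

def select_best_solution(policies, test_env, episodes=3):
    # Few-shot adaptation: try each previously learned solution briefly in
    # the perturbed environment and keep whichever still achieves the
    # highest return, discarding solutions that have become ineffective.
    scores = [evaluate_policy(pi, test_env, episodes) for pi in policies]
    return policies[int(np.argmax(scores))], scores

In the paper's setting the candidate solutions come from a diversity-driven training procedure run in a single training environment; the sketch shows only the selection step, treating the solutions as an arbitrary list of callables.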
Related papers
- No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery [53.08822154199948]
Unsupervised Environment Design (UED) methods have gained recent attention as their adaptive curricula promise to enable agents to be robust to in- and out-of-distribution tasks.
This work investigates how existing UED methods select training environments, focusing on task prioritisation metrics.
We develop a method that directly trains on scenarios with high learnability.
arXiv Detail & Related papers (2024-08-27T14:31:54Z)
- Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks [101.40633115037983]
Instruction tuning (IT) achieves impressive zero-shot generalization results by training large language models (LLMs) on a large set of diverse tasks with instructions.
How to select new tasks to improve the performance and generalizability of IT models remains an open question.
We propose active instruction tuning based on prompt uncertainty, a novel framework to identify informative tasks, and then actively tune the models on the selected tasks.
arXiv Detail & Related papers (2023-11-01T04:40:05Z)
- Diversity for Contingency: Learning Diverse Behaviors for Efficient Adaptation and Transfer [0.0]
We propose a simple method for discovering all possible solutions of a given task.
Unlike prior methods, our approach does not require learning additional models for novelty detection.
arXiv Detail & Related papers (2023-10-11T13:39:35Z)
- Stabilizing Unsupervised Environment Design with a Learned Adversary [28.426666219969555]
A key challenge in training generally capable agents is the design of training tasks that facilitate broad generalization and robustness to environment variations.
A pioneering approach for Unsupervised Environment Design (UED) is PAIRED, which uses reinforcement learning to train a teacher policy to design tasks from scratch.
Despite its strong theoretical backing, PAIRED suffers from a variety of challenges that hinder its practical performance.
We make it possible for PAIRED to match or exceed state-of-the-art methods, producing robust agents in several established challenging procedurally-generated environments.
arXiv Detail & Related papers (2023-08-21T15:42:56Z)
- Intrinsically Motivated Hierarchical Policy Learning in Multi-objective Markov Decision Processes [15.50007257943931]
We propose a novel dual-phase intrinsically motivated reinforcement learning method to address this limitation.
We show experimentally that the proposed method significantly outperforms state-of-the-art multi-objective reinforcement learning methods in a dynamic robotics environment.
arXiv Detail & Related papers (2023-08-18T02:10:45Z)
- Discovering Diverse Solutions in Deep Reinforcement Learning [84.45686627019408]
Reinforcement learning algorithms are typically limited to learning a single solution of a specified task.
We propose an RL method that can learn infinitely many solutions by training a policy conditioned on a continuous or discrete low-dimensional latent variable (see the sketch after this list).
arXiv Detail & Related papers (2021-03-12T04:54:31Z)
- Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity.
Our method leverages latent variable models to learn a representation of the environment from current and past experiences.
We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
arXiv Detail & Related papers (2020-06-18T17:34:50Z)
- Learning Adaptive Exploration Strategies in Dynamic Environments Through Informed Policy Regularization [100.72335252255989]
We study the problem of learning exploration-exploitation strategies that effectively adapt to dynamic environments.
We propose a novel algorithm that regularizes the training of an RNN-based policy using informed policies trained to maximize the reward in each task.
arXiv Detail & Related papers (2020-05-06T16:14:48Z)
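As a companion to the selection sketch above, the latent-conditioned policy idea mentioned in the "Discovering Diverse Solutions in Deep Reinforcement Learning" entry can be sketched as follows. This is an illustrative assumption rather than code from that paper: the class name LatentConditionedPolicy, the network sizes, and the use of PyTorch are hypothetical choices made only for concreteness.

import torch
import torch.nn as nn

class LatentConditionedPolicy(nn.Module):
    # One network represents many solutions: each fixed latent code z
    # indexes a distinct behavior for the same task.
    def __init__(self, obs_dim, act_dim, latent_dim=4, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs, z):
        # Concatenate observation and latent code; sweeping z recovers a
        # family of solutions from a single set of weights.
        return torch.tanh(self.net(torch.cat([obs, z], dim=-1)))

# Example: sample a few latent codes, each acting as one candidate solution.
policy = LatentConditionedPolicy(obs_dim=17, act_dim=6)
obs = torch.zeros(1, 17)
candidate_actions = [policy(obs, z.unsqueeze(0)) for z in torch.randn(5, 4)]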