Self-Paced Context Evaluation for Contextual Reinforcement Learning
- URL: http://arxiv.org/abs/2106.05110v1
- Date: Wed, 9 Jun 2021 14:39:09 GMT
- Title: Self-Paced Context Evaluation for Contextual Reinforcement Learning
- Authors: Theresa Eimer, Andr\'e Biedenkapp, Frank Hutter, Marius Lindauer
- Abstract summary: Self-Paced Context Evaluation (SPaCE)
Based on self-paced learning, spc automatically generates task curricula online with little computational overhead.
We demonstrate SPaCE's ability to speed up learning of different value-based RL agents on two environments.
- Score: 34.8990562818147
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) has made a lot of advances for solving a single
problem in a given environment; but learning policies that generalize to unseen
variations of a problem remains challenging. To improve sample efficiency for
learning on such instances of a problem domain, we present Self-Paced Context
Evaluation (SPaCE). Based on self-paced learning, \spc automatically generates
\task curricula online with little computational overhead. To this end, SPaCE
leverages information contained in state values during training to accelerate
and improve training performance as well as generalization capabilities to new
instances from the same problem domain. Nevertheless, SPaCE is independent of
the problem domain at hand and can be applied on top of any RL agent with
state-value function approximation. We demonstrate SPaCE's ability to speed up
learning of different value-based RL agents on two environments, showing better
generalization capabilities and up to 10x faster learning compared to naive
approaches such as round robin or SPDRL, as the closest state-of-the-art
approach.
Related papers
- RL + Transformer = A General-Purpose Problem Solver [0.0]
In this study, we demonstrate that a pre-trained transformer fine-tuned with reinforcement learning over multiple episodes develops the ability to solve problems that it has never encountered before.
This powerful meta-learner not only excels in solving unseen in-distribution environments with remarkable sample efficiency, but also shows strong performance in out-of-distribution environments.
arXiv Detail & Related papers (2025-01-24T01:55:20Z) - Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models [79.28821338925947]
Domain-Class Incremental Learning is a realistic but challenging continual learning scenario.
To handle these diverse tasks, pre-trained Vision-Language Models (VLMs) are introduced for their strong generalizability.
This incurs a new problem: the knowledge encoded in the pre-trained VLMs may be disturbed when adapting to new tasks, compromising their inherent zero-shot ability.
Existing methods tackle it by tuning VLMs with knowledge distillation on extra datasets, which demands heavy overhead.
We propose the Distribution-aware Interference-free Knowledge Integration (DIKI) framework, retaining pre-trained knowledge of
arXiv Detail & Related papers (2024-07-07T12:19:37Z) - Efficient Imitation Learning with Conservative World Models [54.52140201148341]
We tackle the problem of policy learning from expert demonstrations without a reward function.
We re-frame imitation learning as a fine-tuning problem, rather than a pure reinforcement learning one.
arXiv Detail & Related papers (2024-05-21T20:53:18Z) - Projected Off-Policy Q-Learning (POP-QL) for Stabilizing Offline
Reinforcement Learning [57.83919813698673]
Projected Off-Policy Q-Learning (POP-QL) is a novel actor-critic algorithm that simultaneously reweights off-policy samples and constrains the policy to prevent divergence and reduce value-approximation error.
In our experiments, POP-QL not only shows competitive performance on standard benchmarks, but also out-performs competing methods in tasks where the data-collection policy is significantly sub-optimal.
arXiv Detail & Related papers (2023-11-25T00:30:58Z) - End-to-end Lidar-Driven Reinforcement Learning for Autonomous Racing [0.0]
Reinforcement Learning (RL) has emerged as a transformative approach in the domains of automation and robotics.
This study develops and trains an RL agent to navigate a racing environment solely using feedforward raw lidar and velocity data.
The agent's performance is then experimentally evaluated in a real-world racing scenario.
arXiv Detail & Related papers (2023-09-01T07:03:05Z) - Addressing the issue of stochastic environments and local
decision-making in multi-objective reinforcement learning [0.0]
Multi-objective reinforcement learning (MORL) is a relatively new field which builds on conventional Reinforcement Learning (RL)
This thesis focuses on what factors influence the frequency with which value-based MORL Q-learning algorithms learn the optimal policy for an environment.
arXiv Detail & Related papers (2022-11-16T04:56:42Z) - CCLF: A Contrastive-Curiosity-Driven Learning Framework for
Sample-Efficient Reinforcement Learning [56.20123080771364]
We develop a model-agnostic Contrastive-Curiosity-Driven Learning Framework (CCLF) for reinforcement learning.
CCLF fully exploit sample importance and improve learning efficiency in a self-supervised manner.
We evaluate this approach on the DeepMind Control Suite, Atari, and MiniGrid benchmarks.
arXiv Detail & Related papers (2022-05-02T14:42:05Z) - Autonomous Reinforcement Learning: Formalism and Benchmarking [106.25788536376007]
Real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world.
Common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts.
This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms.
arXiv Detail & Related papers (2021-12-17T16:28:06Z) - Balancing Value Underestimation and Overestimation with Realistic
Actor-Critic [6.205681604290727]
This paper introduces a novel model-free algorithm, Realistic Actor-Critic(RAC), which can be incorporated with any off-policy RL algorithms to improve sample efficiency.
RAC employs Universal Value Function Approximators (UVFA) to simultaneously learn a policy family with the same neural network, each with different trade-offs between underestimation and overestimation.
We evaluate RAC on the MuJoCo benchmark, achieving 10x sample efficiency and 25% performance improvement on the most challenging Humanoid environment compared to SAC.
arXiv Detail & Related papers (2021-10-19T03:35:01Z) - Persistent Reinforcement Learning via Subgoal Curricula [114.83989499740193]
Value-accelerated Persistent Reinforcement Learning (VaPRL) generates a curriculum of initial states.
VaPRL reduces the interventions required by three orders of magnitude compared to episodic reinforcement learning.
arXiv Detail & Related papers (2021-07-27T16:39:45Z) - Deep RL With Information Constrained Policies: Generalization in
Continuous Control [21.46148507577606]
We show that a natural constraint on information flow might confer onto artificial agents in continuous control tasks.
We implement a novel Capacity-Limited Actor-Critic (CLAC) algorithm.
Our experiments show that compared to alternative approaches, CLAC offers improvements in generalization between training and modified test environments.
arXiv Detail & Related papers (2020-10-09T15:42:21Z) - Dynamics Generalization via Information Bottleneck in Deep Reinforcement
Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z) - Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity.
Our method leverages latent variable models to learn a representation of the environment from current and past experiences.
We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
arXiv Detail & Related papers (2020-06-18T17:34:50Z) - Self-Paced Deep Reinforcement Learning [42.467323141301826]
Curriculum reinforcement learning (CRL) improves the learning speed and stability of an agent by exposing it to a tailored series of tasks throughout learning.
Despite empirical successes, an open question in CRL is how to automatically generate a curriculum for a given reinforcement learning (RL) agent, avoiding manual design.
We propose an answer by interpreting the curriculum generation as an inference problem, where distributions over tasks are progressively learned to approach the target task.
This approach leads to an automatic curriculum generation, whose pace is controlled by the agent, with solid theoretical motivation and easily integrated with deep RL algorithms.
arXiv Detail & Related papers (2020-04-24T15:48:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.