Actively Learning Costly Reward Functions for Reinforcement Learning
- URL: http://arxiv.org/abs/2211.13260v1
- Date: Wed, 23 Nov 2022 19:17:20 GMT
- Title: Actively Learning Costly Reward Functions for Reinforcement Learning
- Authors: André Eberhard, Houssam Metni, Georg Fahland, Alexander Stroh, Pascal Friederich
- Abstract summary: We show that it is possible to train agents in complex real-world environments orders of magnitude faster.
By enabling the application of reinforcement learning methods to new domains, we show that we can find interesting and non-trivial solutions.
- Score: 56.34005280792013
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transfer of recent advances in deep reinforcement learning to real-world
applications is hindered by high data demands and thus low efficiency and
scalability. Through independent improvements of components such as replay
buffers or more stable learning algorithms, and through massively distributed
systems, training time could be reduced from several days to several hours for
standard benchmark tasks. However, while rewards in simulated environments are
well-defined and easy to compute, reward evaluation becomes the bottleneck in
many real-world environments, e.g., in molecular optimization tasks, where
computationally demanding simulations or even experiments are required to
evaluate states and to quantify rewards. Therefore, training might become
prohibitively expensive without an extensive amount of computational resources
and time. We propose to alleviate this problem by replacing costly ground-truth
rewards with rewards modeled by neural networks, counteracting non-stationarity
of state and reward distributions during training with an active learning
component. We demonstrate that using our proposed ACRL method (Actively
learning Costly rewards for Reinforcement Learning), it is possible to train
agents in complex real-world environments orders of magnitude faster. By
enabling the application of reinforcement learning methods to new domains, we
show that we can find interesting and non-trivial solutions to real-world
optimization problems in chemistry, materials science and engineering.
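The sketch below illustrates the core loop described in the abstract: a cheap learned reward model stands in for the costly ground-truth reward, and an active learning step queries the true reward only for states the model is uncertain about, so the model can track the non-stationary state and reward distributions. The ensemble-disagreement acquisition rule, the random-feature regressors, and all names (ground_truth_reward, RewardEnsemble) are illustrative assumptions, not the paper's exact ACRL implementation.
```python
import numpy as np

rng = np.random.default_rng(0)

def ground_truth_reward(state):
    # Stand-in for an expensive simulation or experiment (assumption).
    return float(np.sin(state).sum())

class RewardEnsemble:
    """Small ensemble of random-feature ridge regressors as a cheap reward model."""
    def __init__(self, n_models=5, dim=8, n_features=64):
        self.proj = [rng.normal(size=(dim, n_features)) for _ in range(n_models)]
        self.weights = [np.zeros(n_features) for _ in range(n_models)]

    def _phi(self, proj, states):
        return np.tanh(states @ proj)

    def fit(self, states, rewards):
        for i, proj in enumerate(self.proj):
            phi = self._phi(proj, states)
            # Ridge regression on the labeled buffer.
            self.weights[i] = np.linalg.solve(
                phi.T @ phi + 1e-3 * np.eye(phi.shape[1]), phi.T @ rewards)

    def predict(self, states):
        preds = np.stack([self._phi(p, states) @ w
                          for p, w in zip(self.proj, self.weights)])
        return preds.mean(axis=0), preds.std(axis=0)  # surrogate reward and disagreement

labeled_states, labeled_rewards = [], []
model = RewardEnsemble()
for step in range(200):
    state = rng.normal(size=8)              # stand-in for a state visited by the agent
    if len(labeled_states) < 10:
        query = True                        # bootstrap phase: always query the true reward
    else:
        _, std = model.predict(state[None])
        query = std[0] > 0.1                # active learning: query only when the ensemble disagrees
    if query:
        labeled_states.append(state)
        labeled_rewards.append(ground_truth_reward(state))
        model.fit(np.array(labeled_states), np.array(labeled_rewards))
    # Otherwise the agent would train on the cheap surrogate prediction.
```
In the full method the states would come from the agent's own rollouts rather than random draws, and the agent would be trained against the surrogate's predictions, with the expensive reward evaluated only for the actively selected states.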
Related papers
- Comprehensive Overview of Reward Engineering and Shaping in Advancing Reinforcement Learning Applications [0.0]
This paper emphasizes the importance of reward engineering and reward shaping in enhancing the efficiency and effectiveness of reinforcement learning algorithms.
Despite significant advancements in reinforcement learning, several limitations persist.
One key challenge is the sparse and delayed nature of rewards in many real-world scenarios.
The complexity of accurately modeling real-world environments and the computational demands of reinforcement learning algorithms remain substantial obstacles.
arXiv Detail & Related papers (2024-07-22T09:28:12Z)
- Offline Reinforcement Learning with Imputed Rewards [8.856568375969848]
We propose a Reward Model that can estimate the reward signal from a very limited sample of environment transitions annotated with rewards.
Our results show that, using only 1% of reward-labeled transitions from the original datasets, our learned reward model is able to impute rewards for the remaining 99% of the transitions.
arXiv Detail & Related papers (2024-07-15T15:53:13Z)
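A minimal sketch of the imputation idea summarized above, assuming a standard supervised regressor from transition features to reward; the synthetic data and the choice of GradientBoostingRegressor are illustrative assumptions, and only the roughly 1% labeled split mirrors the summary.
```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n, d = 10_000, 6
features = rng.normal(size=(n, d))                     # (state, action) features of offline transitions
rewards = features[:, 0] * features[:, 1]              # synthetic "true" rewards, mostly unobserved in practice

labeled = rng.choice(n, size=n // 100, replace=False)  # roughly 1% of transitions carry reward labels
reward_model = GradientBoostingRegressor().fit(features[labeled], rewards[labeled])

imputed = reward_model.predict(features)               # imputed rewards for the remaining 99%
# An offline RL algorithm would then train on transitions relabeled with `imputed`.
```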
- REBOOT: Reuse Data for Bootstrapping Efficient Real-World Dexterous Manipulation [61.7171775202833]
We introduce an efficient system for learning dexterous manipulation skills with reinforcement learning.
The main idea of our approach is the integration of recent advances in sample-efficient RL and replay buffer bootstrapping.
Our system completes the real-world training cycle by incorporating learned resets via an imitation-based pickup policy.
arXiv Detail & Related papers (2023-09-06T19:05:31Z)
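A toy sketch of the replay-buffer bootstrapping ingredient mentioned above: the new task's buffer is pre-filled with transitions reused from earlier runs so learning does not start from an empty buffer. The tuple layout and synthetic data are assumptions, not the REBOOT system.
```python
import random
from collections import deque

rng = random.Random(0)
# Synthetic (state, action, reward, next_state, done) tuples from earlier tasks.
prior_task_data = [(rng.random(), rng.randint(0, 3), 0.0, rng.random(), False)
                   for _ in range(1000)]

replay_buffer = deque(maxlen=50_000)
replay_buffer.extend(prior_task_data)            # bootstrap the buffer before any new-task steps

# New-task transitions are appended as usual; updates sample from the mixed buffer.
replay_buffer.append((0.5, 1, 1.0, 0.6, False))
batch = rng.sample(list(replay_buffer), k=32)
```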
- Hindsight States: Blending Sim and Real Task Elements for Efficient Reinforcement Learning [61.3506230781327]
In robotics, one approach to generate training data builds on simulations based on dynamics models derived from first principles.
Here, we leverage the imbalance in complexity of the dynamics to learn more sample-efficiently.
We validate our method on several challenging simulated tasks and demonstrate that it improves learning both alone and when combined with an existing hindsight algorithm.
arXiv Detail & Related papers (2023-03-03T21:55:04Z)
- FIRE: A Failure-Adaptive Reinforcement Learning Framework for Edge Computing Migrations [52.85536740465277]
FIRE is a framework that adapts to rare events by training a RL policy in an edge computing digital twin environment.
We propose ImRE, an importance sampling-based Q-learning algorithm, which samples rare events proportionally to their impact on the value function.
We show that FIRE reduces costs compared to vanilla RL and the greedy baseline in the event of failures.
arXiv Detail & Related papers (2022-09-28T19:49:39Z)
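The snippet below shows, in generic tabular form, how deliberately oversampled rare failures can be reweighted in a Q-learning update; it sketches the general importance-sampling principle only, not the ImRE algorithm, and all probabilities and rewards are made up.
```python
import numpy as np

rng = np.random.default_rng(0)
Q = np.zeros((4, 2))                    # toy table: 4 states x 2 actions
alpha, gamma = 0.1, 0.9
p_true, p_sim = 0.01, 0.5               # real vs. deliberately oversampled failure rate

for _ in range(5000):
    s, a = rng.integers(4), rng.integers(2)
    failure = rng.random() < p_sim      # the digital twin injects failures far more often than reality
    r = -10.0 if failure else 1.0
    s_next = rng.integers(4)
    w = p_true / p_sim if failure else (1 - p_true) / (1 - p_sim)
    td = r + gamma * Q[s_next].max() - Q[s, a]
    Q[s, a] += alpha * w * td           # importance weight removes the oversampling bias
```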
- Accelerated Policy Learning with Parallel Differentiable Simulation [59.665651562534755]
We present a differentiable simulator and a new policy learning algorithm (SHAC).
Our algorithm alleviates problems with local minima through a smooth critic function.
We show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms.
arXiv Detail & Related papers (2022-04-14T17:46:26Z)
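To convey the flavor of first-order policy learning through a differentiable simulator (only the general idea, not SHAC itself, which additionally uses a smooth learned critic and massively parallel simulation), here is a hand-differentiated one-step example with made-up dynamics.
```python
theta = 0.0                              # linear policy: action = theta * x
dt, lr = 0.1, 0.5
for _ in range(1000):
    x = 1.0                              # fixed start state
    a = theta * x
    x_next = x + dt * a                  # differentiable dynamics step
    cost = x_next ** 2                   # quadratic state cost after the step
    dcost_dtheta = 2.0 * x_next * dt * x # gradient propagated through the dynamics by hand
    theta -= lr * dcost_dtheta           # first-order policy update
print(theta)                             # approaches -1/dt = -10, driving x_next toward 0
```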
- A Distributed Deep Reinforcement Learning Technique for Application Placement in Edge and Fog Computing Environments [31.326505188936746]
Several Deep Reinforcement Learning (DRL)-based placement techniques have been proposed in fog/edge computing environments.
We propose an actor-critic-based distributed application placement technique built on the IMPortance weighted Actor-Learner Architectures (IMPALA).
arXiv Detail & Related papers (2021-10-24T11:25:03Z)
- Online reinforcement learning with sparse rewards through an active inference capsule [62.997667081978825]
This paper introduces an active inference agent which minimizes the novel free energy of the expected future.
Our model is capable of solving sparse-reward problems with a very high sample efficiency.
We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives.
arXiv Detail & Related papers (2021-06-04T10:03:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.