Data Efficient Training for Reinforcement Learning with Adaptive
Behavior Policy Sharing
- URL: http://arxiv.org/abs/2002.05229v1
- Date: Wed, 12 Feb 2020 20:35:31 GMT
- Title: Data Efficient Training for Reinforcement Learning with Adaptive
Behavior Policy Sharing
- Authors: Ge Liu, Rui Wu, Heng-Tze Cheng, Jing Wang, Jayden Ooi, Lihong Li, Ang
Li, Wai Lok Sibon Li, Craig Boutilier, Ed Chi
- Abstract summary: Training deep RL models is challenging in real-world applications such as production-scale health-care or recommender systems.
We propose Adaptive Behavior Policy Sharing (ABPS), a data-efficient training algorithm that shares experience collected by an adaptively selected behavior policy.
- Score: 29.283554268767805
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Deep Reinforcement Learning (RL) has proven powerful for decision
making in simulated environments. However, training a deep RL model is
challenging in real-world applications such as production-scale health-care or
recommender systems, because interaction is expensive and the deployment
budget is limited. One source of data inefficiency is the expensive
hyper-parameter tuning required when optimizing deep neural networks. We
propose Adaptive Behavior Policy Sharing (ABPS), a data-efficient training
algorithm that shares the experience collected by a behavior policy adaptively
selected from a pool of agents trained with an ensemble of hyper-parameters.
We further extend ABPS to evolve hyper-parameters during training by
hybridizing ABPS with an adapted version of Population Based Training
(ABPS-PBT). We conduct experiments on multiple Atari games with up to 16
hyper-parameter/architecture setups. ABPS achieves superior overall
performance, reduced variance among the top 25% of agents, and best-agent
performance equivalent to conventional hyper-parameter tuning with independent
training, while requiring only as many environment interactions as training a
single agent. We also show that ABPS-PBT further improves convergence speed
and reduces variance.
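As a concrete illustration of the training loop the abstract describes, here is a minimal sketch of ABPS in Python. The epsilon-greedy selector over recent returns is one possible instantiation of the adaptive behavior-policy selection (the paper's exact selector may differ), and the `agent.act`, `agent.update`, and gym-style `env` interfaces are hypothetical.

```python
import random
from collections import deque

class ABPSTrainer:
    """Minimal sketch of Adaptive Behavior Policy Sharing (ABPS).

    A pool of agents, one per hyper-parameter setup, shares a single
    replay buffer. Each iteration, one agent is adaptively selected as
    the behavior policy, so the whole pool consumes only as many
    environment interactions as training a single agent.
    """

    def __init__(self, agents, env, epsilon=0.1, buffer_size=100_000):
        self.agents = agents                                # ensemble of hyper-parameter setups
        self.env = env                                      # assumed gym-style environment
        self.epsilon = epsilon
        self.replay = deque(maxlen=buffer_size)             # shared experience
        self.returns = [deque(maxlen=10) for _ in agents]   # recent returns per agent

    def select_behavior_agent(self):
        # Illustrative selector: epsilon-greedy over mean recent return;
        # agents with no history yet are tried first.
        if random.random() < self.epsilon:
            return random.randrange(len(self.agents))
        means = [sum(r) / len(r) if r else float("inf") for r in self.returns]
        return max(range(len(self.agents)), key=means.__getitem__)

    def run_iteration(self):
        idx = self.select_behavior_agent()
        behavior = self.agents[idx]

        # Collect one episode with the selected behavior policy only.
        obs, done, episode_return = self.env.reset(), False, 0.0
        while not done:
            action = behavior.act(obs)                      # hypothetical agent API
            next_obs, reward, done = self.env.step(action)
            self.replay.append((obs, action, reward, next_obs, done))
            episode_return += reward
            obs = next_obs
        self.returns[idx].append(episode_return)

        # Every agent in the pool learns off-policy from the shared buffer.
        for agent in self.agents:
            agent.update(self.replay)                       # hypothetical agent API
```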
Related papers
- Simultaneous Training of First- and Second-Order Optimizers in Population-Based Reinforcement Learning [0.0]
Population-based training (PBT) provides a method to achieve this by continuously tuning hyperparameters throughout training.
We propose an enhancement to PBT by simultaneously utilizing both first- and second-order optimizers within a single population.
arXiv Detail & Related papers (2024-08-27T21:54:26Z)
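The entry above, like ABPS-PBT in the abstract, builds on PBT's exploit-and-explore loop. A minimal sketch of that loop follows; the member layout, bottom fraction, and perturbation factors are illustrative, not taken from the paper.

```python
import copy
import random

def pbt_exploit_explore(population, bottom_frac=0.25, perturb=(0.8, 1.2)):
    """One PBT exploit/explore step over a population of workers.

    Each member is a dict with trainable 'weights', numeric 'hparams',
    and a fitness 'score'. Bottom performers copy a top performer
    (exploit), then randomly perturb the copied hyper-parameters
    (explore) before training resumes.
    """
    ranked = sorted(population, key=lambda m: m["score"])
    cutoff = max(1, int(len(ranked) * bottom_frac))
    bottom, top = ranked[:cutoff], ranked[-cutoff:]

    for member in bottom:
        source = random.choice(top)
        member["weights"] = copy.deepcopy(source["weights"])   # exploit
        member["hparams"] = {k: v * random.choice(perturb)     # explore
                             for k, v in source["hparams"].items()}
    return population
```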
- Generalized Population-Based Training for Hyperparameter Optimization in Reinforcement Learning [10.164982368785854]
This work introduces Generalized Population-Based Training (GPBT) and Pairwise Learning (PL).
PL employs a comprehensive pairwise strategy to identify performance differentials and provide holistic guidance to underperforming agents.
arXiv Detail & Related papers (2024-04-12T04:23:20Z)
- PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation [61.57833648734164]
We propose a novel Parallel Yielding Re-Activation (PYRA) method for training-inference efficient task adaptation.
PYRA outperforms all competing methods under both low and high compression rates.
arXiv Detail & Related papers (2024-03-14T09:06:49Z)
- Hybrid Reinforcement Learning for Optimizing Pump Sustainability in Real-World Water Distribution Networks [55.591662978280894]
This article addresses the pump-scheduling optimization problem to enhance real-time control of real-world water distribution networks (WDNs).
Our primary objectives are to adhere to physical operational constraints while reducing energy consumption and operational costs.
Traditional optimization techniques, such as evolution-based and genetic algorithms, often fall short due to their lack of convergence guarantees.
arXiv Detail & Related papers (2023-10-13T21:26:16Z)
- Efficient and Effective Augmentation Strategy for Adversarial Training [48.735220353660324]
Adversarial training of Deep Neural Networks is known to be significantly more data-hungry than standard training.
We propose Diverse Augmentation-based Joint Adversarial Training (DAJAT) to use data augmentations effectively in adversarial training.
arXiv Detail & Related papers (2022-10-27T10:59:55Z)
- Online Weighted Q-Ensembles for Reduced Hyperparameter Tuning in Reinforcement Learning [0.38073142980732994]
Reinforcement learning is a promising paradigm for learning robot control, allowing complex control policies to be learned without requiring a dynamics model.
We propose employing an ensemble of multiple reinforcement learning agents, each with a different set of hyperparameters, along with a mechanism for choosing the best performing set.
The online weighted Q-ensemble presented lower overall variance and superior results compared with Q-average ensembles.
arXiv Detail & Related papers (2022-09-29T19:57:43Z)
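The weighted Q-ensemble idea above can be illustrated with a simple performance-weighted average over member Q-functions. The weighting below is an illustrative choice, not the paper's exact online scheme.

```python
import numpy as np

def weighted_ensemble_action(q_functions, weights, obs):
    """Greedy action from a performance-weighted Q-ensemble.

    q_functions: callables mapping an observation to a per-action
                 Q-value array, one per ensemble member.
    weights:     non-negative score per member, e.g. recent returns.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                      # normalize weights
    q = np.stack([qf(obs) for qf in q_functions])        # (members, actions)
    return int(np.argmax((w[:, None] * q).sum(axis=0)))  # weighted average
```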
- Distributed Adversarial Training to Robustify Deep Neural Networks at Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, an effective approach, known as adversarial training (AT), has been shown to improve robustness.
We propose a large-batch adversarial training framework implemented over multiple machines.
arXiv Detail & Related papers (2022-06-13T15:39:43Z)
- Online Convolutional Re-parameterization [51.97831675242173]
We present Online Convolutional Re-parameterization (OREPA), a two-stage pipeline aiming to reduce the huge training overhead by squeezing the complex training-time block into a single convolution.
Compared with the state-of-the-art re-param models, OREPA is able to save the training-time memory cost by about 70% and accelerate the training speed by around 2x.
We also conduct experiments on object detection and semantic segmentation and show consistent improvements on the downstream tasks.
arXiv Detail & Related papers (2022-04-02T09:50:19Z)
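Re-parameterization methods like OREPA rest on the linearity of convolution: parallel linear branches of identical shape can be folded into one kernel. A minimal sketch of that folding step (plain branch merging only, not OREPA's full online two-stage pipeline):

```python
import torch
import torch.nn.functional as F

def merge_parallel_convs(weights, biases):
    """Fold parallel conv branches into a single equivalent convolution.

    Convolution is linear, so sum_i (conv(x, W_i) + b_i) equals
    conv(x, sum_i W_i) + sum_i b_i when branch shapes match.
    """
    return torch.stack(weights).sum(dim=0), torch.stack(biases).sum(dim=0)

# Sanity check: two parallel 3x3 branches vs. the merged convolution.
x = torch.randn(1, 8, 16, 16)
w1, w2 = torch.randn(16, 8, 3, 3), torch.randn(16, 8, 3, 3)
b1, b2 = torch.randn(16), torch.randn(16)
w, b = merge_parallel_convs([w1, w2], [b1, b2])
two_branch = F.conv2d(x, w1, b1, padding=1) + F.conv2d(x, w2, b2, padding=1)
assert torch.allclose(two_branch, F.conv2d(x, w, b, padding=1), atol=1e-4)
```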
- Hybrid Learning for Orchestrating Deep Learning Inference in Multi-user Edge-cloud Networks [3.7630209350186807]
Collaborative end-edge-cloud computing for deep learning provides a range of performance and efficiency trade-offs.
The deep learning inference orchestration strategy employs reinforcement learning to find the optimal orchestration policy.
We demonstrate the efficacy of our hybrid learning (HL) strategy through experimental comparison with state-of-the-art RL-based inference orchestration.
arXiv Detail & Related papers (2022-02-21T21:50:50Z)
- APS: Active Pretraining with Successor Features [96.24533716878055]
We show that by reinterpreting and combining successor features (Hansen et al.) with nonparametric entropy maximization, the intractable mutual information can be efficiently optimized.
The proposed method, Active Pretraining with Successor Features (APS), explores the environment via nonparametric entropy maximization, and the explored data can be efficiently leveraged to learn behavior.
arXiv Detail & Related papers (2021-08-31T16:30:35Z)
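The exploration term in the APS entry above is a particle-based (k-nearest-neighbour) entropy estimate over state embeddings. A rough sketch of such a bonus follows; the exact functional form in the paper differs, and the constant inside the log is illustrative.

```python
import numpy as np

def knn_entropy_reward(features, k=12):
    """Particle-based entropy bonus over a batch of state embeddings.

    features: (N, d) array with N > k. States far from their k nearest
    neighbours (low local density) receive a larger intrinsic reward.
    """
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)            # ignore self-distances
    knn = np.sort(dists, axis=1)[:, :k]        # k nearest neighbours per state
    return np.log(1.0 + knn.mean(axis=1))      # smoothed entropy proxy
```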
- On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning [27.36718899899319]
Model-based Reinforcement Learning (MBRL) is a promising framework for learning control in a data-efficient manner.
MBRL typically requires significant human expertise before it can be applied to new problems and domains.
arXiv Detail & Related papers (2021-02-26T18:57:47Z)