Rethinking Population-assisted Off-policy Reinforcement Learning
- URL: http://arxiv.org/abs/2305.02949v1
- Date: Thu, 4 May 2023 15:53:00 GMT
- Title: Rethinking Population-assisted Off-policy Reinforcement Learning
- Authors: Bowen Zheng, Ran Cheng
- Abstract summary: Off-policy reinforcement learning algorithms struggle with convergence to local optima due to limited exploration.
Population-based algorithms offer a natural exploration strategy, but their black-box operators are inefficient.
Recent algorithms have integrated these two methods, connecting them through a shared replay buffer.
- Score: 7.837628433605179
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While off-policy reinforcement learning (RL) algorithms are sample efficient
due to gradient-based updates and data reuse in the replay buffer, they
struggle with convergence to local optima due to limited exploration. On the
other hand, population-based algorithms offer a natural exploration strategy,
but their heuristic black-box operators are inefficient. Recent algorithms have
integrated these two methods, connecting them through a shared replay buffer.
However, the effect of using diverse data from population optimization
iterations on off-policy RL algorithms has not been thoroughly investigated. In
this paper, we first analyze the use of off-policy RL algorithms in combination
with population-based algorithms, showing that the use of population data could
introduce an overlooked error and harm performance. To test this, we propose a
uniform and scalable training design and conduct experiments on our tailored
framework in robot locomotion tasks from the OpenAI gym. Our results
substantiate that using population data in off-policy RL can cause instability
during training and even degrade performance. To remedy this issue, we further
propose a double replay buffer design that provides more on-policy data and
show its effectiveness through experiments. Our results offer practical
insights for training these hybrid methods.
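The abstract describes the double replay buffer only at a high level, so the snippet below is a minimal sketch of one plausible reading rather than the authors' implementation: population-generated transitions land in a large shared buffer, the RL agent's own recent rollouts land in a second buffer, and each gradient update samples a fixed mixture of the two. The buffer names and the `agent_fraction` mixing ratio are illustrative assumptions.
```python
import random
from collections import deque

class ReplayBuffer:
    """Simple FIFO buffer of (state, action, reward, next_state, done) tuples."""
    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size):
        k = min(batch_size, len(self.storage))
        return random.sample(list(self.storage), k)

# Hypothetical double-buffer setup: one large buffer shared with the population,
# one smaller buffer holding only the RL agent's own recent (near on-policy) data.
population_buffer = ReplayBuffer(capacity=1_000_000)
agent_buffer = ReplayBuffer(capacity=100_000)

def sample_mixed_batch(batch_size=256, agent_fraction=0.5):
    """Draw a training batch that guarantees a share of near on-policy data.

    `agent_fraction` is an assumed hyperparameter controlling how much of each
    batch comes from the agent's own rollouts rather than population data.
    """
    n_agent = int(batch_size * agent_fraction)
    batch = agent_buffer.sample(n_agent)
    batch += population_buffer.sample(batch_size - len(batch))
    return batch
```
The sketch only shows where the extra on-policy data would enter the actor/critic updates; the actual buffer sizes, mixing ratio, and update rule are not specified in the abstract.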
Related papers
- Hybrid Inverse Reinforcement Learning [34.793570631021005]
The inverse reinforcement learning approach to imitation learning is a double-edged sword.
We propose using hybrid RL -- training on a mixture of online and expert data -- to curtail unnecessary exploration.
We derive both model-free and model-based hybrid inverse RL algorithms with strong policy performance guarantees.
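As a rough illustration of "training on a mixture of online and expert data", the sketch below draws each batch half from an expert demonstration set and half from the learner's own online replay data; the 50/50 split and the data interfaces are assumptions, not the paper's algorithm.
```python
import random

def hybrid_batch(expert_data, online_data, batch_size=128, expert_fraction=0.5):
    """Mix expert demonstrations with the learner's own online transitions.

    `expert_data` and `online_data` are assumed to be lists of transitions;
    anchoring part of every batch to expert data is what curtails
    unnecessary exploration in this reading.
    """
    n_expert = int(batch_size * expert_fraction)
    batch = random.sample(expert_data, min(n_expert, len(expert_data)))
    batch += random.sample(online_data, min(batch_size - len(batch), len(online_data)))
    return batch
```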
arXiv Detail & Related papers (2024-02-13T23:29:09Z)
- Personalized Federated Deep Reinforcement Learning-based Trajectory Optimization for Multi-UAV Assisted Edge Computing [22.09756306579992]
UAVs can serve as intelligent servers in edge computing environments, optimizing their flight trajectories to maximize communication system throughput.
Deep reinforcement learning (DRL)-based trajectory optimization algorithms may suffer from poor training performance due to intricate terrain features and inadequate training data.
This work proposes a novel solution, namely personalized federated deep reinforcement learning (PF-DRL), for multi-UAV trajectory optimization.
arXiv Detail & Related papers (2023-09-05T12:54:40Z)
- A Data-Centric Approach for Improving Adversarial Training Through the Lens of Out-of-Distribution Detection [0.4893345190925178]
We propose detecting and removing hard samples directly from the training procedure rather than applying complicated algorithms to mitigate their effects.
Our results on SVHN and CIFAR-10 datasets show the effectiveness of this method in improving the adversarial training without adding too much computational cost.
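The summary does not name a specific detector, so the sketch below simply assumes a per-sample OOD/hardness score is available and removes the highest-scoring samples before ordinary adversarial training; the score source and the 95% keep rate are assumptions.
```python
import numpy as np

def drop_hard_samples(images, labels, ood_scores, keep_fraction=0.95):
    """Remove the samples an OOD detector flags as hardest before training.

    `ood_scores` are assumed to come from any out-of-distribution detector
    (higher = harder / more OOD); the keep rate is an illustrative choice.
    """
    cutoff = np.quantile(ood_scores, keep_fraction)
    keep = ood_scores <= cutoff
    return images[keep], labels[keep]

# Usage sketch: filter once, then run standard adversarial training
# (e.g., PGD-based) on the reduced set:
# x_clean, y_clean = drop_hard_samples(x_train, y_train, scores)
```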
arXiv Detail & Related papers (2023-01-25T08:13:50Z)
- Boosting Offline Reinforcement Learning via Data Rebalancing [104.3767045977716]
Offline reinforcement learning (RL) is challenged by the distributional shift between learning policies and datasets.
We propose a simple yet effective method to boost offline RL algorithms based on the observation that resampling a dataset keeps the distribution support unchanged.
We dub our method ReD (Return-based Data Rebalance), which can be implemented with less than 10 lines of code change and adds negligible running time.
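The summary notes that ReD can be implemented in under 10 lines of code; the resampler below is a hedged guess at what such a return-based rebalance could look like (episode sampling weights proportional to min-max normalized return), not the paper's exact weighting.
```python
import numpy as np

def rebalance_by_return(episode_returns, rng=None):
    """Resample episode indices with probability increasing in episode return.

    Weights are min-max normalized returns plus a small floor so low-return
    episodes keep some probability mass; the scheme ReD actually uses may differ.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    r = np.asarray(episode_returns, dtype=np.float64)
    w = (r - r.min()) / (r.max() - r.min() + 1e-8) + 1e-3
    return rng.choice(len(r), size=len(r), replace=True, p=w / w.sum())
```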
arXiv Detail & Related papers (2022-10-17T16:34:01Z)
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward by measuring novelty based on the learned reward.
Our experiments show that an exploration bonus derived from uncertainty in the learned reward improves both the feedback- and sample-efficiency of preference-based RL algorithms.
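One common way to turn reward uncertainty into an exploration bonus is to train an ensemble of reward models and use their disagreement as the intrinsic reward; the sketch below follows that reading, and the ensemble interface, size, and scaling factor are assumptions.
```python
import numpy as np

def exploration_bonus(reward_ensemble, state, action, beta=1.0):
    """Intrinsic reward from disagreement among learned reward models.

    `reward_ensemble` is assumed to be a list of callables r_i(state, action);
    the bonus is beta times the standard deviation of their predictions, one
    plausible reading of "uncertainty in the learned reward".
    """
    preds = np.array([r(state, action) for r in reward_ensemble])
    return beta * preds.std()

# The agent would then optimize r_mean(s, a) + exploration_bonus(...),
# where r_mean is the ensemble's mean predicted reward.
```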
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
- Data-Driven Evaluation of Training Action Space for Reinforcement Learning [1.370633147306388]
This paper proposes a Shapley-inspired methodology for training action space categorization and ranking.
To avoid exponential-time Shapley computations, the methodology includes a Monte Carlo simulation.
The proposed data-driven methodology is applicable to different domains, use cases, and reinforcement learning algorithms.
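A generic permutation-sampling estimator is one way to realize the Monte Carlo Shapley idea summarized above; in the sketch below, `evaluate(subset)` is a placeholder for whatever performance metric (e.g., the return of a policy trained with only that action subset) the methodology would use, and is an assumption.
```python
import random

def monte_carlo_shapley(actions, evaluate, num_permutations=200, seed=0):
    """Estimate a Shapley-style contribution for each action in the action space.

    Averaging marginal contributions over random permutations avoids the
    exponential cost of exact Shapley values.
    """
    rng = random.Random(seed)
    contrib = {a: 0.0 for a in actions}
    for _ in range(num_permutations):
        order = list(actions)
        rng.shuffle(order)
        subset, prev_score = [], evaluate([])
        for a in order:
            subset.append(a)
            score = evaluate(list(subset))
            contrib[a] += score - prev_score
            prev_score = score
    return {a: v / num_permutations for a, v in contrib.items()}
```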
arXiv Detail & Related papers (2022-04-08T04:53:43Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
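The two-policy idea is commonly described as a guide policy controlling the first part of each episode and the learning policy taking over afterwards, with the switch point shrinking over training; the rollout sketch below reflects that reading, assumes the Gymnasium-style `step` API, and leaves the curriculum schedule to the caller.
```python
def jump_start_rollout(env, guide_policy, explore_policy, guide_steps):
    """Collect one episode: the guide policy acts for the first `guide_steps`
    steps, then the learning (exploration) policy takes over.

    A sketch of the two-policy idea; the schedule that shrinks `guide_steps`
    across training iterations is not shown here.
    """
    obs, _ = env.reset()
    trajectory, done, t = [], False, 0
    while not done:
        policy = guide_policy if t < guide_steps else explore_policy
        action = policy(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        trajectory.append((obs, action, reward, next_obs, done))
        obs, t = next_obs, t + 1
    return trajectory
```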
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Scalable Deep Reinforcement Learning Algorithms for Mean Field Games [60.550128966505625]
Mean Field Games (MFGs) have been introduced to efficiently approximate games with very large populations of strategic agents.
Recently, the question of learning equilibria in MFGs has gained momentum, particularly using model-free reinforcement learning (RL) methods.
Existing algorithms to solve MFGs require the mixing of approximated quantities such as strategies or $q$-values.
We propose two methods to address this shortcoming. The first one learns a mixed strategy from distillation of historical data into a neural network and is applied to the Fictitious Play algorithm.
The second one is an online mixing method based on
arXiv Detail & Related papers (2022-03-22T18:10:32Z)
- Reinforcement Learning in the Wild: Scalable RL Dispatching Algorithm Deployed in Ridehailing Marketplace [12.298997392937876]
This study proposes a real-time dispatching algorithm based on reinforcement learning.
It is deployed online in multiple cities under DiDi's operation for A/B testing and is launched in one of the major international markets.
The deployed algorithm shows over 1.3% improvement in total driver income from A/B testing.
arXiv Detail & Related papers (2022-02-10T16:07:17Z)
- Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement learning based zeroth-order algorithm (ZO-RL) that learns the sampling policy for generating perturbations in ZO optimization, instead of using random sampling.
Our results show that ZO-RL can effectively reduce the variance of the ZO gradient estimate by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios.
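For context, a standard two-point zeroth-order gradient estimator is sketched below; the point summarized above is that the perturbation directions can be drawn from a learned sampling policy rather than the random baseline shown here. The interface of that policy is an assumption.
```python
import numpy as np

_rng = np.random.default_rng(0)

def random_direction(x):
    """Baseline: a random unit direction, as in standard ZO optimization."""
    u = _rng.standard_normal(x.shape)
    return u / (np.linalg.norm(u) + 1e-12)

def zo_gradient(f, x, sample_direction=random_direction, mu=1e-2, num_samples=10):
    """Two-point zeroth-order gradient estimate of f at x.

    Swapping `sample_direction` for a learned sampling policy, instead of the
    random baseline above, is the idea attributed to ZO-RL in the summary.
    """
    d = x.size
    grad = np.zeros_like(x, dtype=float)
    for _ in range(num_samples):
        u = sample_direction(x)
        grad += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return d * grad / num_samples
```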
arXiv Detail & Related papers (2021-04-09T14:50:59Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm
arXiv Detail & Related papers (2021-03-31T23:46:32Z)