No-Regret Reinforcement Learning with Heavy-Tailed Rewards
- URL: http://arxiv.org/abs/2102.12769v1
- Date: Thu, 25 Feb 2021 10:25:57 GMT
- Title: No-Regret Reinforcement Learning with Heavy-Tailed Rewards
- Authors: Vincent Zhuang, Yanan Sui
- Abstract summary: We show that the difficulty of learning heavy-tailed rewards dominates the difficulty of learning transition probabilities.
Our algorithms naturally generalize to deep reinforcement learning applications.
All of our algorithms outperform baselines on both synthetic MDPs and standard RL benchmarks.
- Score: 11.715649997214125
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning algorithms typically assume rewards to be sampled from
light-tailed distributions, such as Gaussian or bounded. However, a wide
variety of real-world systems generate rewards that follow heavy-tailed
distributions. We consider such scenarios in the setting of undiscounted
reinforcement learning. By constructing a lower bound, we show that the
difficulty of learning heavy-tailed rewards asymptotically dominates the
difficulty of learning transition probabilities. Leveraging techniques from
robust mean estimation, we propose Heavy-UCRL2 and Heavy-Q-Learning, and show
that they achieve near-optimal regret bounds in this setting. Our algorithms
also naturally generalize to deep reinforcement learning applications; we
instantiate Heavy-DQN as an example of this. We demonstrate that all of our
algorithms outperform baselines on both synthetic MDPs and standard RL
benchmarks.
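To make the robust-mean-estimation idea concrete, below is a minimal Python sketch of a truncated empirical mean for rewards with a bounded (1 + eps)-th moment, paired with a UCB-style optimistic bonus. The truncation levels, constants, and function names are illustrative assumptions, not the exact Heavy-UCRL2 or Heavy-Q-Learning construction from the paper.

```python
# Illustrative sketch only (not the paper's exact algorithm): a truncated
# empirical mean for heavy-tailed rewards with a bounded (1 + eps)-th moment,
# plus a UCB-style optimistic bonus. Constants and thresholds are assumptions.
import numpy as np


def truncated_mean(samples, eps, delta, u=1.0):
    """Robust mean estimate: zero out samples above a growing threshold.

    `u` is an assumed bound on the (1 + eps)-th raw moment of the reward.
    """
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    if n == 0:
        return 0.0
    idx = np.arange(1, n + 1)
    # Per-sample truncation levels, growing with the sample index.
    levels = (u * idx / np.log(1.0 / delta)) ** (1.0 / (1.0 + eps))
    kept = samples * (np.abs(samples) <= levels)  # drop heavy-tail outliers
    return kept.mean()


def optimistic_reward(samples, eps, delta, u=1.0):
    """Truncated mean plus a bonus shrinking as n ** (-eps / (1 + eps))."""
    n = max(len(samples), 1)
    bonus = 4.0 * u ** (1.0 / (1.0 + eps)) * (
        np.log(1.0 / delta) / n
    ) ** (eps / (1.0 + eps))
    return truncated_mean(samples, eps, delta, u) + bonus


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Pareto rewards: finite mean but infinite variance for shape in (1, 2].
    rewards = rng.pareto(1.8, size=2000)
    print("raw empirical mean: ", rewards.mean())
    print("truncated mean:     ", truncated_mean(rewards, eps=0.5, delta=0.05))
    print("optimistic estimate:", optimistic_reward(rewards, eps=0.5, delta=0.05))
```

On heavy-tailed draws such as the Pareto samples above, the truncated mean concentrates around the true mean more tightly than the raw empirical mean; an optimistic estimate of this kind is the ingredient that robust-mean-based exploration bonuses rely on.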
Related papers
- Robust Stochastically-Descending Unrolled Networks [85.6993263983062]
Deep unrolling is an emerging learning-to-optimize method that unrolls a truncated iterative algorithm in the layers of a trainable neural network.
Convergence guarantees and the generalizability of unrolled networks, however, remain open theoretical problems.
We numerically assess unrolled architectures trained under the proposed constraints in two different applications.
arXiv Detail & Related papers (2023-12-25T18:51:23Z)
- ContainerGym: A Real-World Reinforcement Learning Benchmark for Resource Allocation [1.6058099298620425]
ContainerGym is a benchmark for reinforcement learning inspired by a real-world industrial resource allocation task.
The proposed benchmark encodes challenges commonly encountered in real-world sequential decision making problems.
It can be configured to instantiate problems of varying degrees of difficulty.
arXiv Detail & Related papers (2023-07-06T13:44:29Z)
- Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled networks and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z)
- Single-Trajectory Distributionally Robust Reinforcement Learning [21.955807398493334]
We propose Distributionally Robust RL (DRRL) to enhance performance across a range of environments.
Existing DRRL algorithms are either model-based or fail to learn from a single sample trajectory.
We design the first fully model-free DRRL algorithm, called distributionally robust Q-learning with single trajectory (DRQ).
arXiv Detail & Related papers (2023-01-27T14:08:09Z)
- Supervised Advantage Actor-Critic for Recommender Systems [76.7066594130961]
We propose a negative sampling strategy for training the RL component and combine it with supervised sequential learning.
Based on sampled (negative) actions (items), we can calculate the "advantage" of a positive action over the average case.
We instantiate SNQN and SA2C with four state-of-the-art sequential recommendation models and conduct experiments on two real-world datasets.
arXiv Detail & Related papers (2021-11-05T12:51:15Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
- Accelerated Convergence for Counterfactual Learning to Rank [65.63997193915257]
We show that the convergence rate of SGD approaches with IPS-weighted gradients suffers from the large variance introduced by the IPS weights.
We propose a novel learning algorithm, called CounterSample, that has provably better convergence than standard IPS-weighted gradient descent methods.
We prove that CounterSample converges faster and complement our theoretical findings with empirical results.
arXiv Detail & Related papers (2020-05-21T12:53:36Z)
- Unbiased Deep Reinforcement Learning: A General Training Framework for Existing and Future Algorithms [3.7050607140679026]
We propose a novel training framework that is conceptually comprehensible and can potentially be generalized to all feasible reinforcement learning algorithms.
We employ Monte Carlo sampling to obtain raw data inputs and train on them in batches to form Markov decision process sequences.
We propose several algorithms embedded with our new framework to deal with typical discrete and continuous scenarios.
arXiv Detail & Related papers (2020-05-12T01:51:08Z)
- Deep Reinforcement Learning with Weighted Q-Learning [43.823659028488876]
Reinforcement learning algorithms based on Q-learning are driving Deep Reinforcement Learning (DRL) research towards solving complex problems.
Q-Learning is known to be positively biased since it learns by using the maximum over noisy estimates of expected values.
We show how our novel Deep Weighted Q-Learning algorithm reduces the bias w.r.t. relevant baselines and provides empirical evidence of its advantages on representative benchmarks.
arXiv Detail & Related papers (2020-03-20T13:57:40Z)
- Nested-Wasserstein Self-Imitation Learning for Sequence Generation [158.19606942252284]
We propose the concept of nested-Wasserstein distance for distributional semantic matching.
A novel nested-Wasserstein self-imitation learning framework is developed, encouraging the model to exploit historical high-reward sequences.
arXiv Detail & Related papers (2020-01-20T02:19:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.