Single-Trajectory Distributionally Robust Reinforcement Learning
- URL: http://arxiv.org/abs/2301.11721v1
- Date: Fri, 27 Jan 2023 14:08:09 GMT
- Title: Single-Trajectory Distributionally Robust Reinforcement Learning
- Authors: Zhipeng Liang, Xiaoteng Ma, Jose Blanchet, Jiheng Zhang, Zhengyuan Zhou
- Abstract summary: Reinforcement Learning (RL) has been regarded as an essential component leading to Artificial General Intelligence (AGI).
However, RL is often criticized for assuming that the training environment is identical to the test one, which hinders its application in the real world.
To mitigate this problem, Distributionally Robust RL (DRRL) has been proposed to improve the worst-case performance over a set of environments that may contain the unknown test environment.
- Score: 13.013268095049236
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a framework for sequential decision-making, Reinforcement
Learning (RL) has been regarded as an essential component leading to Artificial
General Intelligence (AGI). However, RL is often criticized for assuming that
the training environment is identical to the test one, which hinders its
application in the real world. To mitigate this problem, Distributionally
Robust RL (DRRL) has been proposed to improve the worst-case performance over a
set of environments that may contain the unknown test environment. Due to the
nonlinearity of the robustness goal, most previous work resorts to model-based
approaches, learning either from an empirical distribution estimated from data
or from a simulator that can be sampled infinitely, which limits their
applicability to environments with simple dynamics. In contrast, we attempt to
design a DRRL algorithm that can be trained along a single trajectory, i.e.,
without repeated sampling from the same state. Based on standard Q-learning, we
propose distributionally robust Q-learning with a single trajectory (DRQ) and
its average-reward variant, differential DRQ. We provide asymptotic convergence
guarantees and experiments for both settings, demonstrating their superiority
over non-robust counterparts in perturbed environments.
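To make the single-trajectory idea concrete, below is a minimal tabular sketch in the spirit of DRQ. It assumes a KL-divergence uncertainty set, so the robust next-state value is computed through the KL dual form with a fixed dual variable beta and an auxiliary estimate updated on a faster timescale than the Q-table; the environment interface, step sizes, and radius rho are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def drq_sketch(env, n_states, n_actions, gamma=0.99, beta=1.0, rho=0.1,
               n_steps=100_000, eps=0.1, seed=0):
    """Hypothetical single-trajectory distributionally robust Q-learning."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    # Running estimate of E[exp(-max_a' Q(s', a') / beta)] for each (s, a),
    # maintained on the faster of the two timescales.
    Z = np.ones((n_states, n_actions))
    s = env.reset()
    for t in range(1, n_steps + 1):
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next, r, done = env.step(a)  # one observed transition, no resampling
        # Fast timescale: update the dual expectation estimate.
        Z[s, a] += (1.0 / t**0.6) * (np.exp(-Q[s_next].max() / beta) - Z[s, a])
        # KL-dual robust next-state value; beta is held fixed for simplicity,
        # whereas the full dual also optimizes over beta >= 0.
        robust_v = -beta * np.log(Z[s, a]) - beta * rho
        # Slow timescale: standard Q-learning step toward the robust target.
        Q[s, a] += (1.0 / t**0.8) * (r + gamma * robust_v - Q[s, a])
        s = env.reset() if done else s_next
    return Q
```

The point the sketch illustrates is that every quantity is updated from the single observed transition (s, a, r, s'), so no generative model or repeated sampling from a state is required.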
Related papers
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve model alignment across different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
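One plausible reading of the adaptive label-smoothing mechanism is sketched below; the entropy-based uncertainty proxy and the linear mapping to a smoothing value are assumptions for illustration, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def ual_loss(logits, targets, max_smooth=0.2):
    """Hedged sketch: per-sample label smoothing scaled by model uncertainty."""
    n_classes = logits.size(-1)
    probs = logits.softmax(dim=-1)
    # Uncertainty proxy: predictive entropy, normalized to [0, 1].
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    uncertainty = entropy / torch.log(torch.tensor(float(n_classes)))
    smooth = (max_smooth * uncertainty).detach()  # adaptive smoothing per sample
    one_hot = F.one_hot(targets, n_classes).float()
    soft = one_hot * (1 - smooth).unsqueeze(-1) + (smooth / n_classes).unsqueeze(-1)
    return -(soft * logits.log_softmax(dim=-1)).sum(dim=-1).mean()
```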
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
- Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach [2.3020018305241337]
This paper is the first to propose considering robust reinforcement learning (RRL) problems within positional differential game theory.
Namely, we prove that under Isaacs's condition, the same Q-function can be utilized as an approximate solution of both minimax and maximin Bellman equations.
We present the Isaacs Deep Q-Network algorithms and demonstrate their superiority compared to other baseline RRL and Multi-Agent RL algorithms in various environments.
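In discrete-time notation, the shared-Q claim can be sketched as one function satisfying both fixed-point equations; u is the agent's control, v the adversary's, and the deterministic transition f is an illustrative simplification:

```latex
% Hedged sketch: under Isaacs's condition the max and min interchange,
% so a single Q approximately solves both the maximin and minimax equations.
\[
Q(s) \;=\; \max_{u}\min_{v}\bigl[\,r(s,u,v) + \gamma\, Q\bigl(f(s,u,v)\bigr)\bigr]
      \;=\; \min_{v}\max_{u}\bigl[\,r(s,u,v) + \gamma\, Q\bigl(f(s,u,v)\bigr)\bigr].
\]
```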
arXiv Detail & Related papers (2024-05-03T12:21:43Z)
- Distributionally Robust Reinforcement Learning with Interactive Data Collection: Fundamental Hardness and Near-Optimal Algorithm [14.517103323409307]
The sim-to-real gap represents the disparity between training and testing environments.
A promising approach to addressing this challenge is distributionally robust RL.
We tackle robust RL via interactive data collection and present an algorithm with a provable sample complexity guarantee.
arXiv Detail & Related papers (2024-04-04T16:40:22Z)
- Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
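As a hedged sketch of what one instance-reweighted DRO step can look like: with a KL-regularized adversary, the worst-case instance weights over a batch take the closed form softmax(per-sample loss / tau), up-weighting hard samples; the temperature and plain cross-entropy loss are assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def irdro_loss(logits, targets, tau=1.0):
    """Hedged sketch of an instance-reweighted DRO objective."""
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    # KL-regularized worst case over batch weights has a softmax closed form.
    weights = torch.softmax(per_sample.detach() / tau, dim=0)
    return (weights * per_sample).sum()
```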
arXiv Detail & Related papers (2024-02-22T04:10:57Z)
- Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL [86.0987896274354]
We first identify a fundamental pattern, self-excitation, as the primary cause of Q-value estimation divergence in offline RL.
We then propose a novel Self-Excite Eigenvalue Measure (SEEM) metric to measure the evolving properties of the Q-network during training.
For the first time, our theory can reliably predict at an early stage whether training will diverge.
arXiv Detail & Related papers (2023-10-06T17:57:44Z)
- Reinforcement Learning from Diverse Human Preferences [68.4294547285359]
This paper develops a method for crowd-sourcing preference labels and learning from diverse human preferences.
The proposed method is tested on a variety of tasks in DMControl and Meta-world.
It has shown consistent and significant improvements over existing preference-based RL algorithms when learning from diverse feedback.
arXiv Detail & Related papers (2023-01-27T15:18:54Z)
- CostNet: An End-to-End Framework for Goal-Directed Reinforcement Learning [9.432068833600884]
Reinforcement Learning (RL) is a general framework concerned with an agent that seeks to maximize rewards in an environment.
Two approaches, model-based and model-free reinforcement learning, have shown concrete results in several disciplines.
This paper introduces a novel reinforcement learning algorithm for predicting the distance between two states in a Markov Decision Process.
arXiv Detail & Related papers (2022-10-03T21:16:14Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Replay-Guided Adversarial Environment Design [21.305857977725886]
We argue that by curating completely random levels, PLR can generate novel and complex levels for effective training.
We show that our new method, PLR$^\perp$, obtains better results on a suite of out-of-distribution, zero-shot transfer tasks.
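A hedged sketch of the replay-guided loop: keep a buffer of levels scored by estimated learning potential, train on replayed high-scoring levels, and, as in the $^\perp$ variant, only evaluate (never train on) freshly sampled random levels. The scoring callable, replay probability, and buffer policy are illustrative assumptions.

```python
import random

def plr_perp_step(buffer, sample_level, score_fn, train_fn, eval_fn,
                  p_replay=0.5, max_size=1000):
    """Hedged sketch of one step of replay-guided level curation."""
    if buffer and random.random() < p_replay:
        # Replay: train only on the curated, highest-scoring level.
        i = max(range(len(buffer)), key=lambda j: buffer[j][1])
        level = buffer[i][0]
        train_fn(level)
        buffer[i] = (level, score_fn(level))  # refresh its score after training
    else:
        # Explore: score a fresh random level without updating the policy on
        # it -- the stop-gradient rule that distinguishes the "perp" variant.
        level = sample_level()
        eval_fn(level)
        buffer.append((level, score_fn(level)))
        if len(buffer) > max_size:
            buffer.remove(min(buffer, key=lambda lv: lv[1]))
    return buffer
```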
arXiv Detail & Related papers (2021-10-06T01:01:39Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
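The two ingredients translate naturally into code; the sketch below assumes a discrete-action Q-ensemble, and the sigmoid temperature, lambda, and network interfaces are illustrative choices rather than the paper's exact hyperparameters.

```python
import torch

def weighted_bellman_targets(target_ensemble, r, s_next, gamma, temperature=10.0):
    """(a) Hedged sketch of ensemble-based weighted Bellman backups."""
    q_next = torch.stack([q(s_next) for q in target_ensemble])  # (E, B, A)
    v_next = q_next.max(dim=-1).values                          # (E, B)
    # Down-weight transitions whose target values disagree across the
    # ensemble, using the ensemble standard deviation as uncertainty.
    weights = torch.sigmoid(-v_next.std(dim=0) * temperature) + 0.5  # in (0.5, 1.5)
    return weights, r + gamma * v_next.mean(dim=0)

def ucb_action(ensemble, s, lam=1.0):
    """(b) Hedged sketch of UCB action selection for exploration."""
    q = torch.stack([qnet(s) for qnet in ensemble])              # (E, B, A)
    return (q.mean(dim=0) + lam * q.std(dim=0)).argmax(dim=-1)   # (B,)
```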
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.