Single-Trajectory Distributionally Robust Reinforcement Learning
- URL: http://arxiv.org/abs/2301.11721v1
- Date: Fri, 27 Jan 2023 14:08:09 GMT
- Title: Single-Trajectory Distributionally Robust Reinforcement Learning
- Authors: Zhipeng Liang, Xiaoteng Ma, Jose Blanchet, Jiheng Zhang, Zhengyuan
Zhou
- Abstract summary: Reinforcement Learning (RL) has been regarded as an essential component leading to Artificial General Intelligence (AGI).
However, RL is often criticized for requiring the training environment to be identical to the test one, which hinders its application in the real world.
To mitigate this problem, Distributionally Robust RL (DRRL) is proposed to improve the worst-case performance over a set of environments that may contain the unknown test environment.
- Score: 13.013268095049236
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a framework for sequential decision-making, Reinforcement Learning (RL)
has been regarded as an essential component leading to Artificial General
Intelligence (AGI). However, RL is often criticized for requiring the training
environment to be identical to the test one, which hinders its application in
the real world. To mitigate this problem, Distributionally Robust RL (DRRL) is
proposed to improve the worst-case performance over a set of environments that
may contain the unknown test environment. Due to the nonlinearity of the
robustness goal, most previous work resorts to model-based approaches, learning
with either an empirical distribution estimated from data or a simulator that
can be sampled infinitely often, which limits their applicability to
environments with simple dynamics. In contrast, we attempt to design a DRRL
algorithm that can be trained along a single trajectory, i.e., without repeated
sampling from the same state. Building on standard Q-learning, we propose
distributionally robust Q-learning with a single trajectory (DRQ) and its
average-reward variant, differential DRQ. We provide asymptotic convergence
guarantees and experiments for both settings, demonstrating their superiority
over non-robust baselines in perturbed environments.
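The abstract's central object is a Q-learning update that is driven by a single trajectory yet targets a distributionally robust value. The snippet below is a minimal tabular sketch of that idea, not the authors' DRQ algorithm: it uses the standard KL-duality bound inf_{P in KL-ball(delta)} E_P[V] >= -beta * log E[exp(-V/beta)] - beta * delta with a fixed dual variable, and tracks the exponential moment with a running per-(state, action) average updated from single transitions. The Gymnasium-style `env` interface and the values of `beta` and `delta` are assumptions made only for this example.

```python
import numpy as np

def drq_single_trajectory(env, n_states, n_actions, gamma=0.99, alpha=0.05,
                          beta=1.0, delta=0.1, epsilon=0.1, steps=200_000, seed=0):
    """Tabular distributionally robust Q-learning along one trajectory (sketch).

    The robust backup uses the KL-dual lower bound
        inf_{P in KL-ball(delta)} E_P[V] >= -beta * log E[exp(-V / beta)] - beta * delta
    with a fixed dual variable beta; Z[s, a] is a running single-sample
    estimate of E[exp(-V(s') / beta) | s, a].
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    Z = np.ones((n_states, n_actions))
    s, _ = env.reset(seed=seed)
    for _ in range(steps):
        # epsilon-greedy action from the current robust Q estimate
        a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next, r, terminated, truncated, _ = env.step(a)
        v_next = 0.0 if terminated else float(np.max(Q[s_next]))
        # single-sample stochastic approximation of the exponential moment
        Z[s, a] += alpha * (np.exp(-v_next / beta) - Z[s, a])
        # KL-dual robust value used in place of the plain expectation
        robust_v = -beta * np.log(max(Z[s, a], 1e-12)) - beta * delta
        Q[s, a] += alpha * (r + gamma * robust_v - Q[s, a])
        if terminated or truncated:
            s, _ = env.reset()
        else:
            s = s_next
    return Q
```

For instance, with a Gymnasium environment whose observation space is discrete (e.g. FrozenLake-v1 with 16 states and 4 actions), the greedy policy is `Q.argmax(axis=1)`; setting `delta = 0` and taking `beta` large makes the backup approach the usual non-robust Q-learning target.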
Related papers
- Latent Guided Sampling for Combinatorial Optimization [3.636090511738153]
Recent Combinatorial Optimization methods leverage deep learning to learn solution strategies, trained via Supervised or Reinforcement Learning (RL). While promising, these approaches often rely on task-specific augmentations, perform poorly on out-of-distribution instances, and lack robust inference mechanisms. In this work, we propose LGS-Net, a novel latent space model that conditions on problem instances, and introduce an efficient neural inference method, Latent Guided Sampling (LGS).
arXiv Detail & Related papers (2025-06-04T08:02:59Z) - Model-Free Robust Reinforcement Learning with Sample Complexity Analysis [16.477827600825428]
This paper proposes a model-free DR-RL algorithm leveraging the Multi-level Monte Carlo technique.
We develop algorithms for uncertainty sets defined by total variation, Chi-square divergence, and KL divergence.
Remarkably, our algorithms represent the first model-free DR-RL approach featuring finite sample complexity.
arXiv Detail & Related papers (2024-06-24T19:35:26Z) - Distributionally Robust Reinforcement Learning with Interactive Data Collection: Fundamental Hardness and Near-Optimal Algorithm [14.517103323409307]
The sim-to-real gap represents the disparity between training and testing environments.
A promising approach to addressing this challenge is distributionally robust RL.
We tackle robust RL via interactive data collection and present an algorithm with a provable sample complexity guarantee.
arXiv Detail & Related papers (2024-04-04T16:40:22Z) - Distributionally Robust Model-based Reinforcement Learning with Large
State Spaces [55.14361269378122]
Three major challenges in reinforcement learning are complex dynamical systems with large state spaces, costly data acquisition, and the deviation of real-world dynamics from the training environment at deployment.
We study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and total variation uncertainty sets.
We propose a model-based approach that utilizes Gaussian Processes and the maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics.
arXiv Detail & Related papers (2023-09-05T13:42:11Z) - Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level
Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z) - Maximize to Explore: One Objective Function Fusing Estimation, Planning,
and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called MEX.
MEX integrates estimation and planning components while automatically balancing exploration and exploitation.
It can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards.
arXiv Detail & Related papers (2023-05-29T17:25:26Z) - Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z) - Distributed Distributionally Robust Optimization with Non-Convex
Objectives [24.64654924173679]
An asynchronous distributed algorithm named Asynchronous Single-looP alternatIve gRadient projEction (ASPIRE) is proposed.
A new uncertainty set, the constrained D-norm uncertainty set, is developed to leverage the prior distribution and flexibly control the degree of robustness.
Empirical studies on real-world datasets demonstrate that the proposed method can not only achieve fast convergence, but also remain robust against data heterogeneity as well as malicious attacks.
arXiv Detail & Related papers (2022-10-14T07:39:13Z) - Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z) - Asynchronous Distributed Reinforcement Learning for LQR Control via Zeroth-Order Block Coordinate Descent [7.6860514640178]
We propose a novel zeroth-order optimization algorithm for distributed reinforcement learning.
It allows each agent to estimate its local gradient by cost evaluation independently, without use of any consensus protocol.
arXiv Detail & Related papers (2021-07-26T18:11:07Z) - Fast Distributionally Robust Learning with Variance Reduced Min-Max
Optimization [85.84019017587477]
Distributionally robust supervised learning is emerging as a key paradigm for building reliable machine learning systems for real-world applications.
Existing algorithms for solving Wasserstein DRSL involve solving complex subproblems or fail to make use of gradients.
We revisit Wasserstein DRSL through the lens of min-max optimization and derive scalable and efficiently implementable extra-gradient algorithms.
arXiv Detail & Related papers (2021-04-27T16:56:09Z)
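The last entry above casts Wasserstein distributionally robust supervised learning as a min-max problem solved with extra-gradient steps. Below is a generic extra-gradient sketch on a toy bilinear saddle-point objective; it illustrates only the optimizer template (a look-ahead step followed by a corrected step using the look-ahead gradients), not that paper's DRSL formulation or its variance-reduced variants, and the step size and toy objective are assumptions for the example.

```python
import numpy as np

def extragradient(grad_x, grad_y, x0, y0, eta=0.1, iters=2000):
    """Generic extra-gradient method for min_x max_y f(x, y).

    Each iteration takes a look-ahead (half) step, then updates from the
    gradients evaluated at the look-ahead point.
    """
    x, y = np.asarray(x0, dtype=float), np.asarray(y0, dtype=float)
    for _ in range(iters):
        x_half = x - eta * grad_x(x, y)            # look-ahead descent in x
        y_half = y + eta * grad_y(x, y)            # look-ahead ascent in y
        x = x - eta * grad_x(x_half, y_half)       # corrected step from look-ahead gradients
        y = y + eta * grad_y(x_half, y_half)
    return x, y

# Toy saddle point f(x, y) = x @ y, whose unique saddle is the origin;
# plain simultaneous gradient descent-ascent cycles outward here, while
# extra-gradient converges.
gx = lambda x, y: y
gy = lambda x, y: x
x_star, y_star = extragradient(gx, gy, x0=np.ones(3), y0=-np.ones(3))
print(x_star, y_star)  # both approach the saddle point at the origin
```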