Related papers: Distributionally Robust Model-based Reinforcement Learning with Large State Spaces

URL: http://arxiv.org/abs/2309.02236v1
Date: Tue, 5 Sep 2023 13:42:11 GMT
Title: Distributionally Robust Model-based Reinforcement Learning with Large State Spaces
Authors: Shyam Sundhar Ramesh, Pier Giuseppe Sessa, Yifan Hu, Andreas Krause, Ilija Bogunovic
Abstract summary: Three major challenges in reinforcement learning are the complex dynamical systems with large state spaces, the costly data acquisition processes, and the deviation of real-world dynamics from the training environment deployment. We study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and total variation uncertainty sets. We propose a model-based approach that utilizes Gaussian Processes and the maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics.
Score: 55.14361269378122
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Three major challenges in reinforcement learning are the complex dynamical systems with large state spaces, the costly data acquisition processes, and the deviation of real-world dynamics from the training environment deployment. To overcome these issues, we study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and total variation uncertainty sets. We propose a model-based approach that utilizes Gaussian Processes and the maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics, leveraging access to a generative model (i.e., simulator). We further demonstrate the statistical sample complexity of the proposed method for different uncertainty sets. These complexity bounds are independent of the number of states and extend beyond linear dynamics, ensuring the effectiveness of our approach in identifying near-optimal distributionally-robust policies. The proposed method can be further combined with other model-free distributionally robust reinforcement learning methods to obtain a near-optimal robust policy. Experimental results demonstrate the robustness of our algorithm to distributional shifts and its superior performance in terms of the number of samples needed.
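The learning step described in the abstract (fit Gaussian Processes to the nominal dynamics by repeatedly querying a generative model where the model is most uncertain) can be illustrated with a minimal sketch. This is not the authors' implementation: the RBF kernel, its lengthscale, the noise level, the toy `simulator`, and the candidate grid are all hypothetical choices made for illustration. It also assumes each output dimension is modeled by an independent GP sharing one kernel, so a single posterior variance serves every output.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.5):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)

def gp_posterior(X, Y, X_query, noise=1e-2):
    """Posterior mean (one column per output) and shared variance at X_query."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    K_q = rbf_kernel(X_query, X)
    mean = K_q @ np.linalg.solve(K, Y)
    # Prior variance of the RBF kernel is 1; subtract the explained part.
    var = 1.0 - np.einsum("ij,ji->i", K_q, np.linalg.solve(K, K_q.T))
    return mean, np.maximum(var, 0.0)

def simulator(state_action, rng):
    """Hypothetical generative model: noisy nonlinear 2-D next state."""
    s, a = state_action
    next_state = np.array([np.sin(3 * s) + a, np.cos(2 * s) * a])
    return next_state + 0.05 * rng.standard_normal(2)

def max_variance_reduction(candidates, n_rounds, rng):
    """Query the simulator at the candidate with the largest posterior variance."""
    X = candidates[[0]]  # arbitrary first query
    Y = simulator(candidates[0], rng)[None, :]
    for _ in range(n_rounds - 1):
        _, var = gp_posterior(X, Y, candidates)
        x_next = candidates[np.argmax(var)]
        X = np.vstack([X, x_next])
        Y = np.vstack([Y, simulator(x_next, rng)])
    return X, Y

rng = np.random.default_rng(0)
grid = np.linspace(-1.0, 1.0, 15)
candidates = np.array([[s, a] for s in grid for a in grid])  # (state, action) pairs
X, Y = max_variance_reduction(candidates, n_rounds=30, rng=rng)
mean, var = gp_posterior(X, Y, candidates)
```

Because the GP posterior variance depends only on where queries were made and not on the observed values, the selection rule spreads samples over the state-action space regardless of the simulator's outputs. The resulting `mean` would play the role of the learned nominal model around which an uncertainty set is placed.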

Related papers

Efficient Solution and Learning of Robust Factored MDPs [57.2416302384766]
Learning r-MDPs from interactions with an unknown environment enables the synthesis of robust policies with provable guarantees on performance. We propose novel methods for solving and learning r-MDPs based on factored state representations.
arXiv Detail & Related papers (2025-08-01T15:23:15Z)
Latent Guided Sampling for Combinatorial Optimization [3.636090511738153]
Recent Combinatorial Optimization methods leverage deep learning to learn solution strategies, trained via Supervised or Reinforcement Learning (RL). While promising, these approaches often rely on task-specific augmentations, perform poorly on out-of-distribution instances, and lack robust inference mechanisms. In this work, we propose LGS-Net, a novel latent space model that conditions on efficient problem instances, and introduce an efficient Neural inference method, Latent Guided Sampling (LGS).
arXiv Detail & Related papers (2025-06-04T08:02:59Z)
Learning Controlled Stochastic Differential Equations [61.82896036131116]
This work proposes a novel method for estimating both drift and diffusion coefficients of continuous, multidimensional, nonlinear controlled differential equations with non-uniform diffusion. We provide strong theoretical guarantees, including finite-sample bounds for $L^2$, $L^\infty$, and risk metrics, with learning rates adaptive to coefficients' regularity. Our method is available as an open-source Python library.
arXiv Detail & Related papers (2024-11-04T11:09:58Z)
Sample Complexity of Offline Distributionally Robust Linear Markov Decision Processes [37.15580574143281]
This paper considers the sample complexity of distributionally robust linear Markov decision processes (MDPs) in offline reinforcement learning (RL), with an uncertainty set characterized by the total variation distance and using offline data. We develop a pessimistic model-based algorithm and establish its sample complexity bound under minimal data coverage assumptions.
arXiv Detail & Related papers (2024-03-19T17:48:42Z)
The Risk of Federated Learning to Skew Fine-Tuning Features and Underperform Out-of-Distribution Robustness [50.52507648690234]
Federated learning has the risk of skewing fine-tuning features and compromising the robustness of the model. We introduce three robustness indicators and conduct experiments across diverse robust datasets. Our approach markedly enhances the robustness across diverse scenarios, encompassing various parameter-efficient fine-tuning methods.
arXiv Detail & Related papers (2024-01-25T09:18:51Z)
Free-Form Variational Inference for Gaussian Process State-Space Models [21.644570034208506]
We propose a new method for inference in Bayesian GPSSMs. Our method is based on free-form variational inference via inducing Hamiltonian Monte Carlo. We show that our approach can learn transition dynamics and latent states more accurately than competing methods.
arXiv Detail & Related papers (2023-02-20T11:34:16Z)
Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear, both theoretically and empirically, how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning. We provide a representation view of latent variable models for state-action value functions, which allows both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle. In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z)
Guaranteed Conservation of Momentum for Learning Particle-based Fluid Dynamics [96.9177297872723]
We present a novel method for guaranteeing linear momentum in learned physics simulations. We enforce conservation of momentum with a hard constraint, which we realize via antisymmetrical continuous convolutional layers. In combination, the proposed method allows us to increase the physical accuracy of the learned simulator substantially.
arXiv Detail & Related papers (2022-10-12T09:12:59Z)
Sample Complexity of Robust Reinforcement Learning with a Generative Model [0.0]
We propose a model-based reinforcement learning (RL) algorithm for learning an $\epsilon$-optimal robust policy. We consider three different forms of uncertainty sets, characterized by the total variation distance, chi-square divergence, and KL divergence. In addition to the sample complexity results, we also present a formal analytical argument on the benefit of using robust policies.
arXiv Detail & Related papers (2021-12-02T18:55:51Z)
Learning to reflect: A unifying approach for data-driven stochastic control strategies [0.0]
We show that developing efficient strategies for related singular control problems can essentially be reduced to finding rate-optimal estimators. We exploit the exponential $\beta$-mixing property as the common factor of both scenarios to drive the convergence analysis.
arXiv Detail & Related papers (2021-04-23T09:33:15Z)
Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments. We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data. Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
arXiv Detail & Related papers (2020-02-20T15:00:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.