Statistical Learning of Distributionally Robust Stochastic Control in Continuous State Spaces
- URL: http://arxiv.org/abs/2406.11281v1
- Date: Mon, 17 Jun 2024 07:37:36 GMT
- Title: Statistical Learning of Distributionally Robust Stochastic Control in Continuous State Spaces
- Authors: Shengbo Wang, Nian Si, Jose Blanchet, Zhengyuan Zhou,
- Abstract summary: We explore the control of systems with potentially continuous state and action spaces, characterized by the state dynamics $X_t+1 = f(X_t, A_t, W_t)$.
Here, $X$, $A$, and $W$ represent the state, action, and random noise processes, respectively, with $f$ denoting a known function that describes state transitions.
This paper introduces a distributionally robust control paradigm that accommodates possibly adversarial perturbation to the noise distribution within a prescribed ambiguity set.
- Score: 17.96094201655567
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We explore the control of stochastic systems with potentially continuous state and action spaces, characterized by the state dynamics $X_{t+1} = f(X_t, A_t, W_t)$. Here, $X$, $A$, and $W$ represent the state, action, and exogenous random noise processes, respectively, with $f$ denoting a known function that describes state transitions. Traditionally, the noise process $\{W_t, t \geq 0\}$ is assumed to be independent and identically distributed, with a distribution that is either fully known or can be consistently estimated. However, the occurrence of distributional shifts, typical in engineering settings, necessitates the consideration of the robustness of the policy. This paper introduces a distributionally robust stochastic control paradigm that accommodates possibly adaptive adversarial perturbation to the noise distribution within a prescribed ambiguity set. We examine two adversary models: current-action-aware and current-action-unaware, leading to different dynamic programming equations. Furthermore, we characterize the optimal finite sample minimax rates for achieving uniform learning of the robust value function across continuum states under both adversary types, considering ambiguity sets defined by $f_k$-divergence and Wasserstein distance. Finally, we demonstrate the applicability of our framework across various real-world settings.
Related papers
- Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level
Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z) - $K$-Nearest-Neighbor Resampling for Off-Policy Evaluation in Stochastic
Control [0.6906005491572401]
We propose a novel $K$-nearest neighbor reparametric procedure for estimating the performance of a policy from historical data.
Our analysis allows for the sampling of entire episodes, as is common practice in most applications.
Compared to other OPE methods, our algorithm does not require optimization, can be efficiently implemented via tree-based nearest neighbor search and parallelization, and does not explicitly assume a parametric model for the environment's dynamics.
arXiv Detail & Related papers (2023-06-07T23:55:12Z) - Robust Control for Dynamical Systems With Non-Gaussian Noise via Formal
Abstractions [59.605246463200736]
We present a novel controller synthesis method that does not rely on any explicit representation of the noise distributions.
First, we abstract the continuous control system into a finite-state model that captures noise by probabilistic transitions between discrete states.
We use state-of-the-art verification techniques to provide guarantees on the interval Markov decision process and compute a controller for which these guarantees carry over to the original control system.
arXiv Detail & Related papers (2023-01-04T10:40:30Z) - Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained via simple matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
arXiv Detail & Related papers (2022-11-30T05:33:29Z) - Dynamic mean field programming [1.2183405753834562]
A dynamic mean field theory is developed for finite state and action reinforcement learning in the large state space limit.
Under certain assumptions, the state-action values are statistically independent across state-action pairs in the state space limit.
The results hold in the finite and discounted infinite horizon settings, for both value iteration and policy evaluation.
arXiv Detail & Related papers (2022-06-10T15:57:23Z) - Sampling-Based Robust Control of Autonomous Systems with Non-Gaussian
Noise [59.47042225257565]
We present a novel planning method that does not rely on any explicit representation of the noise distributions.
First, we abstract the continuous system into a discrete-state model that captures noise by probabilistic transitions between states.
We capture these bounds in the transition probability intervals of a so-called interval Markov decision process (iMDP)
arXiv Detail & Related papers (2021-10-25T06:18:55Z) - Robust Value Iteration for Continuous Control Tasks [99.00362538261972]
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well.
We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain.
We show that robust value is more robust compared to deep reinforcement learning algorithm and the non-robust version of the algorithm.
arXiv Detail & Related papers (2021-05-25T19:48:35Z) - $(f,\Gamma)$-Divergences: Interpolating between $f$-Divergences and
Integral Probability Metrics [6.221019624345409]
We develop a framework for constructing information-theoretic divergences that subsume both $f$-divergences and integral probability metrics (IPMs)
We show that they can be expressed as a two-stage mass-redistribution/mass-transport process.
Using statistical learning as an example, we demonstrate their advantage in training generative adversarial networks (GANs) for heavy-tailed, not-absolutely continuous sample distributions.
arXiv Detail & Related papers (2020-11-11T18:17:09Z) - Implicit Distributional Reinforcement Learning [61.166030238490634]
implicit distributional actor-critic (IDAC) built on two deep generator networks (DGNs)
Semi-implicit actor (SIA) powered by a flexible policy distribution.
We observe IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.