Exploring the Training Robustness of Distributional Reinforcement
Learning against Noisy State Observations
- URL: http://arxiv.org/abs/2109.08776v5
- Date: Wed, 21 Jun 2023 23:34:18 GMT
- Title: Exploring the Training Robustness of Distributional Reinforcement
Learning against Noisy State Observations
- Authors: Ke Sun, Yingnan Zhao, Shangling Jui, Linglong Kong
- Abstract summary: State observations that an agent receives may contain measurement errors or adversarial noise, misleading the agent into taking suboptimal actions or even collapsing during training.
In this paper, we study the training robustness of distributional Reinforcement Learning (RL), a class of state-of-the-art methods that estimate the whole distribution, as opposed to only the expectation, of the total return.
- Score: 7.776010676090131
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In real scenarios, the state observations that an agent receives may
contain measurement errors or adversarial noise, misleading the agent into
taking suboptimal actions or even collapsing during training. In this paper, we
study the training robustness of distributional Reinforcement Learning (RL), a
class of state-of-the-art methods that estimate the whole distribution, as
opposed to only the expectation, of the total return. First, we validate the
contraction of distributional Bellman operators in the State-Noisy Markov
Decision Process (SN-MDP), a typical tabular case that incorporates both random
and adversarial state observation noise. In the noisy setting with function
approximation, we then analyze the vulnerability of the least squares loss in
expectation-based RL with either linear or nonlinear function approximation. By
contrast, we theoretically characterize the bounded gradient norm of the
distributional RL loss based on the categorical parameterization equipped with
the KL divergence. The resulting stable gradients during optimization account
for distributional RL's better training robustness against state observation
noise. Finally, extensive experiments on a suite of environments verify that
distributional RL is less vulnerable to both random and adversarial noisy state
observations than its expectation-based counterpart.
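The gradient-norm contrast at the heart of the analysis can be illustrated numerically. Below is a minimal numpy sketch (not the paper's implementation; the projected Bellman target is abstracted as a fixed probability vector over return atoms) showing that the KL/cross-entropy gradient with respect to categorical logits stays bounded, while the squared-loss gradient grows with the noise in the target.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl_grad_wrt_logits(z, p_target):
    # d/dz CE(p_target, softmax(z)) = softmax(z) - p_target: the difference
    # of two probability vectors, so its L2 norm is bounded (<= sqrt(2))
    # no matter how noisy the target distribution is.
    return softmax(z) - p_target

def mse_grad_wrt_q(q, y):
    # Expectation-based RL: d/dq (q - y)^2 = 2 (q - y) is unbounded, so a
    # noise-corrupted target y can produce arbitrarily large gradient steps.
    return 2.0 * (q - y)

rng = np.random.default_rng(0)
z = rng.normal(size=8)              # logits over 8 return atoms
p = softmax(rng.normal(size=8))     # stand-in for the projected target
print(np.linalg.norm(kl_grad_wrt_logits(z, p)))   # <= sqrt(2) ~ 1.414
print(abs(mse_grad_wrt_q(q=1.0, y=100.0)))        # grows with target noise
```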
Related papers
- Assessing the Impact of Distribution Shift on Reinforcement Learning
Performance [0.0]
Reinforcement learning (RL) faces its own set of unique challenges.
Comparison of point estimates, and plots that show successful convergence to the optimal policy during training, may obfuscate overfitting or dependence on the experimental setup.
We propose a set of evaluation methods that measure the robustness of RL algorithms under distribution shifts.
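As a hedged illustration of this kind of evaluation (the environment and shift parameter here are stand-ins, not the paper's benchmark), one can collect per-episode return distributions under perturbed dynamics instead of reporting a single point estimate:

```python
import numpy as np
import gymnasium as gym

def return_distribution(make_env, policy, n_episodes=20):
    """Per-episode returns, so robustness can be judged beyond the mean."""
    returns = []
    for _ in range(n_episodes):
        env = make_env()
        obs, _ = env.reset()
        done, total = False, 0.0
        while not done:
            obs, r, terminated, truncated, _ = env.step(policy(obs))
            total += r
            done = terminated or truncated
        env.close()
        returns.append(total)
    return np.asarray(returns)

def shifted_cartpole(length_scale):
    # Hypothetical shift: rescale the pole length at evaluation time.
    def make():
        env = gym.make("CartPole-v1")
        env.unwrapped.length *= length_scale
        return env
    return make

policy = lambda obs: np.random.randint(2)  # placeholder for a trained policy
for scale in (1.0, 1.5, 2.0):
    rets = return_distribution(shifted_cartpole(scale), policy)
    print(f"shift x{scale}: mean={rets.mean():.1f}, std={rets.std():.1f}")
```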
arXiv Detail & Related papers (2024-02-05T23:50:55Z)
- May the Noise be with you: Adversarial Training without Adversarial
Examples [3.4673556247932225]
We investigate the question: Can we obtain adversarially-trained models without training on adversarial examples?
Our proposed approach incorporates inherent stochasticity by embedding Gaussian noise within the layers of the NN model at training time.
Our work contributes adversarially trained networks using a completely different approach, with empirically similar robustness to adversarial training.
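A minimal sketch of the idea, assuming a PyTorch setup (the paper's exact noise placement and variance schedule may differ): inject zero-mean Gaussian noise into intermediate activations during training only.

```python
import torch
import torch.nn as nn

class GaussianNoise(nn.Module):
    """Adds zero-mean Gaussian noise to activations at training time only."""
    def __init__(self, sigma=0.1):
        super().__init__()
        self.sigma = sigma

    def forward(self, x):
        if self.training and self.sigma > 0:
            return x + self.sigma * torch.randn_like(x)
        return x

# Noise layers interleaved with ordinary layers; the model is trained with a
# standard clean-data loss, and no adversarial examples are ever generated.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 256), GaussianNoise(0.1), nn.ReLU(),
    nn.Linear(256, 10),
)
```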
arXiv Detail & Related papers (2023-12-12T08:22:28Z)
- Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level
Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z)
- Adversarial robustness of amortized Bayesian inference [3.308743964406687]
Amortized Bayesian inference invests computational cost up front by training an inference network on simulated data.
We show that almost unrecognizable, targeted perturbations of the observations can lead to drastic changes in the predicted posterior and highly unrealistic posterior predictive samples.
We propose a computationally efficient regularization scheme based on penalizing the Fisher information of the conditional density estimator.
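A sketch of such a regularizer, under the assumption that the trace of the observation-Fisher information is estimated by Monte Carlo from the squared gradient of the log conditional density (log_prob_fn and the sampling of theta are placeholders, not the paper's exact estimator):

```python
import torch

def fisher_penalty(log_prob_fn, theta, x):
    """Monte-Carlo estimate of E[||grad_x log q(theta | x)||^2], i.e. the
    trace of the observation-Fisher information when theta ~ q(. | x)."""
    x = x.clone().requires_grad_(True)
    log_q = log_prob_fn(theta, x).sum()
    (grad_x,) = torch.autograd.grad(log_q, x, create_graph=True)
    return (grad_x ** 2).sum(dim=-1).mean()

# Training-step sketch: loss = nll + lam * fisher_penalty(log_q, theta, x)
```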
arXiv Detail & Related papers (2023-05-24T10:18:45Z)
- Stochastic optimal well control in subsurface reservoirs using
reinforcement learning [0.0]
We present a case study of a model-free reinforcement learning framework to solve optimal control under a predefined parameter uncertainty distribution.
In principle, RL algorithms are capable of learning optimal action policies to maximize a numerical reward signal.
We present numerical results using two state-of-the-art RL algorithms, proximal policy optimization (PPO) and advantage actor-critic (A2C) on two subsurface flow test cases.
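For orientation only, this is how PPO and A2C are typically run via stable-baselines3 (assuming version >= 2.0 with Gymnasium support); Pendulum-v1 is a placeholder, since the subsurface-flow simulator is not publicly packaged:

```python
import gymnasium as gym
from stable_baselines3 import A2C, PPO

env = gym.make("Pendulum-v1")   # placeholder for the reservoir simulator

for Algo in (PPO, A2C):
    model = Algo("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=10_000)
```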
arXiv Detail & Related papers (2022-07-07T17:34:23Z)
- Optimal variance-reduced stochastic approximation in Banach spaces [114.8734960258221]
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space.
We establish non-asymptotic bounds for both the operator defect and the estimation error.
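For intuition, here is a generic stochastic-approximation sketch in R^d with Polyak-Ruppert averaging; it is plainly not the paper's variance-reduced scheme or Banach-space setting, just the baseline procedure such work improves on:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
A = 0.5 * np.eye(d)                              # contraction with factor 0.5
b = rng.normal(size=d)
theta_star = np.linalg.solve(np.eye(d) - A, b)   # true fixed point of F

theta = np.zeros(d)
avg = np.zeros(d)
for t in range(1, 10_001):
    noisy_F = A @ theta + b + 0.1 * rng.normal(size=d)  # noisy operator call
    theta += t ** -0.7 * (noisy_F - theta)              # SA update
    avg += (theta - avg) / t                            # Polyak-Ruppert average
print(np.linalg.norm(avg - theta_star))                 # estimation error
```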
arXiv Detail & Related papers (2022-01-21T02:46:57Z)
- Sample Efficient Deep Reinforcement Learning via Uncertainty Estimation [12.415463205960156]
In model-free deep reinforcement learning (RL) algorithms, using noisy value estimates to supervise policy evaluation and optimization is detrimental to the sample efficiency.
We provide a systematic analysis of the sources of uncertainty in the noisy supervision that occurs in RL.
We propose a method whereby two complementary uncertainty estimation methods account for both the Q-value and the environment stochasticity to better mitigate the negative impacts of noisy supervision.
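A hypothetical sketch of this idea (the paper's exact estimators differ): down-weight TD errors by inverse variance, using ensemble disagreement for Q-value uncertainty and target-sample variance for environment stochasticity.

```python
import torch

def uncertainty_weighted_td_loss(q_ensemble, target_samples):
    """q_ensemble: (n_heads, batch) Q-estimates; target_samples:
    (n_targets, batch) bootstrapped TD targets. TD errors are
    down-weighted wherever either uncertainty source is large."""
    epistemic = q_ensemble.var(dim=0)            # Q-value uncertainty
    aleatoric = target_samples.var(dim=0)        # environment stochasticity
    w = 1.0 / (1.0 + epistemic + aleatoric)      # inverse-variance weights
    td_err = q_ensemble.mean(dim=0) - target_samples.mean(dim=0).detach()
    return (w.detach() * td_err ** 2).mean()
```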
arXiv Detail & Related papers (2022-01-05T15:46:06Z)
- Regularization Guarantees Generalization in Bayesian Reinforcement
Learning through Algorithmic Stability [48.62272919754204]
We study generalization in Bayesian RL under the probably approximately correct (PAC) framework.
Our main contribution is showing that by adding regularization, the optimal policy becomes stable in an appropriate sense.
arXiv Detail & Related papers (2021-09-24T07:48:34Z)
- Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z)
- Evaluating probabilistic classifiers: Reliability diagrams and score
decompositions revisited [68.8204255655161]
We introduce the CORP approach, which generates provably statistically Consistent, Optimally binned, and Reproducible reliability diagrams in an automated way.
CORP is based on non-parametric isotonic regression and implemented via the pool-adjacent-violators (PAV) algorithm.
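A minimal sketch of the PAV-based recalibration that CORP builds on, using scikit-learn's IsotonicRegression (which implements PAV); the synthetic forecasts here are illustrative only:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
p = rng.uniform(size=1000)                              # forecast probabilities
y = (rng.uniform(size=1000) < p ** 1.3).astype(float)   # miscalibrated events

iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
cep = iso.fit_transform(p, y)   # PAV fit: conditional event probabilities
# Plotting (sorted p, cep) against the diagonal gives the reliability diagram.
```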
arXiv Detail & Related papers (2020-08-07T08:22:26Z)
- Distributional Reinforcement Learning via Moment Matching [54.16108052278444]
We formulate a method that learns a finite set of statistics from each return distribution via neural networks.
Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target.
Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines.
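A generic sample-based sketch of the moment-matching idea: a Gaussian-kernel MMD between predicted return samples and Bellman target samples implicitly matches all orders of moments (the paper's deterministic-sample formulation differs in detail).

```python
import torch

def mmd2(x, y, sigma=1.0):
    """Squared MMD with a Gaussian kernel between 1-D samples x and y;
    driving a smooth-kernel MMD to zero forces all moments to match."""
    def k(a, b):
        return torch.exp(-(a.unsqueeze(1) - b.unsqueeze(0)) ** 2
                         / (2.0 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

pred = torch.randn(32, requires_grad=True)   # predicted return samples
target = 0.5 + torch.randn(32)               # Bellman target samples
loss = mmd2(pred, target)
loss.backward()
```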
arXiv Detail & Related papers (2020-07-24T05:18:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.