Exploring the Training Robustness of Distributional Reinforcement
Learning against Noisy State Observations
- URL: http://arxiv.org/abs/2109.08776v5
- Date: Wed, 21 Jun 2023 23:34:18 GMT
- Title: Exploring the Training Robustness of Distributional Reinforcement
Learning against Noisy State Observations
- Authors: Ke Sun, Yingnan Zhao, Shangling Jui, Linglong Kong
- Abstract summary: State observations that an agent receives may contain measurement errors or adversarial noise, misleading the agent into taking suboptimal actions or even collapsing during training.
In this paper, we study the training robustness of distributional Reinforcement Learning (RL), a class of state-of-the-art methods that estimate the whole distribution, as opposed to only the expectation, of the total return.
- Score: 7.776010676090131
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In real scenarios, the state observations that an agent receives may
contain measurement errors or adversarial noise, misleading the agent into
taking suboptimal actions or even collapsing during training. In this paper, we
study the training robustness of distributional Reinforcement Learning (RL), a
class of state-of-the-art methods that estimate the whole distribution, as
opposed to only the expectation, of the total return. First, we validate the
contraction of distributional Bellman operators in the State-Noisy Markov
Decision Process (SN-MDP), a typical tabular case that incorporates both random
and adversarial state observation noise. In the noisy setting with function
approximation, we then analyze the vulnerability of the least squares loss in
expectation-based RL with either linear or nonlinear function approximation. By
contrast, we theoretically characterize the bounded gradient norm of the
distributional RL loss based on the categorical parameterization equipped with
the KL divergence. The resulting stable gradients during optimization account
for distributional RL's better training robustness against state observation
noise. Finally, extensive experiments on a suite of environments verify that
distributional RL is less vulnerable to both random and adversarial noisy state
observations than its expectation-based counterpart.
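The gradient-norm contrast at the heart of the analysis can be illustrated numerically. Below is a minimal numpy sketch (not the paper's implementation; the projected Bellman target is abstracted as a fixed probability vector over return atoms) showing that the KL/cross-entropy gradient with respect to categorical logits stays bounded, while the squared-loss gradient grows with the noise in the target.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl_grad_wrt_logits(z, p_target):
    # d/dz CE(p_target, softmax(z)) = softmax(z) - p_target: the difference
    # of two probability vectors, so its L2 norm is bounded (<= sqrt(2))
    # no matter how noisy the target distribution is.
    return softmax(z) - p_target

def mse_grad_wrt_q(q, y):
    # Expectation-based RL: d/dq (q - y)^2 = 2 (q - y) is unbounded, so a
    # noise-corrupted target y can produce arbitrarily large gradient steps.
    return 2.0 * (q - y)

rng = np.random.default_rng(0)
z = rng.normal(size=8)              # logits over 8 return atoms
p = softmax(rng.normal(size=8))     # stand-in for the projected target
print(np.linalg.norm(kl_grad_wrt_logits(z, p)))   # <= sqrt(2) ~ 1.414
print(abs(mse_grad_wrt_q(q=1.0, y=100.0)))        # grows with target noise
```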
Related papers
- Assessing the Impact of Distribution Shift on Reinforcement Learning
Performance [0.0]
Reinforcement learning (RL) faces its own set of unique challenges.
Comparison of point estimates, and plots that show successful convergence to the optimal policy during training, may obfuscate overfitting or dependence on the experimental setup.
We propose a set of evaluation methods that measure the robustness of RL algorithms under distribution shifts.
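As a hedged illustration of this kind of evaluation (the environment and shift parameter here are stand-ins, not the paper's benchmark), one can collect per-episode return distributions under perturbed dynamics instead of reporting a single point estimate:

```python
import numpy as np
import gymnasium as gym

def return_distribution(make_env, policy, n_episodes=20):
    """Per-episode returns, so robustness can be judged beyond the mean."""
    returns = []
    for _ in range(n_episodes):
        env = make_env()
        obs, _ = env.reset()
        done, total = False, 0.0
        while not done:
            obs, r, terminated, truncated, _ = env.step(policy(obs))
            total += r
            done = terminated or truncated
        env.close()
        returns.append(total)
    return np.asarray(returns)

def shifted_cartpole(length_scale):
    # Hypothetical shift: rescale the pole length at evaluation time.
    def make():
        env = gym.make("CartPole-v1")
        env.unwrapped.length *= length_scale
        return env
    return make

policy = lambda obs: np.random.randint(2)  # placeholder for a trained policy
for scale in (1.0, 1.5, 2.0):
    rets = return_distribution(shifted_cartpole(scale), policy)
    print(f"shift x{scale}: mean={rets.mean():.1f}, std={rets.std():.1f}")
```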
arXiv Detail & Related papers (2024-02-05T23:50:55Z)
- May the Noise be with you: Adversarial Training without Adversarial
Examples [3.4673556247932225]
We investigate the question: Can we obtain adversarially-trained models without training on adversarial examples?
Our proposed approach incorporates inherent stochasticity by embedding Gaussian noise within the layers of the NN model at training time.
Our work contributes adversarially trained networks using a completely different approach, with empirically similar robustness to adversarial training.
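A minimal sketch of the idea, assuming a PyTorch setup (the paper's exact noise placement and variance schedule may differ): inject zero-mean Gaussian noise into intermediate activations during training only.

```python
import torch
import torch.nn as nn

class GaussianNoise(nn.Module):
    """Adds zero-mean Gaussian noise to activations at training time only."""
    def __init__(self, sigma=0.1):
        super().__init__()
        self.sigma = sigma

    def forward(self, x):
        if self.training and self.sigma > 0:
            return x + self.sigma * torch.randn_like(x)
        return x

# Noise layers interleaved with ordinary layers; the model is trained with a
# standard clean-data loss, and no adversarial examples are ever generated.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 256), GaussianNoise(0.1), nn.ReLU(),
    nn.Linear(256, 10),
)
```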
arXiv Detail & Related papers (2023-12-12T08:22:28Z)
- Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level
Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z)
- Adversarial robustness of amortized Bayesian inference [3.308743964406687]
Amortized Bayesian inference invests computational cost up front by training an inference network on simulated data.
We show that almost unrecognizable, targeted perturbations of the observations can lead to drastic changes in the predicted posterior and highly unrealistic posterior predictive samples.
We propose a computationally efficient regularization scheme based on penalizing the Fisher information of the conditional density estimator.
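A sketch of such a regularizer, under the assumption that the trace of the observation-Fisher information is estimated by Monte Carlo from the squared gradient of the log conditional density (log_prob_fn and the sampling of theta are placeholders, not the paper's exact estimator):

```python
import torch

def fisher_penalty(log_prob_fn, theta, x):
    """Monte-Carlo estimate of E[||grad_x log q(theta | x)||^2], i.e. the
    trace of the observation-Fisher information when theta ~ q(. | x)."""
    x = x.clone().requires_grad_(True)
    log_q = log_prob_fn(theta, x).sum()
    (grad_x,) = torch.autograd.grad(log_q, x, create_graph=True)
    return (grad_x ** 2).sum(dim=-1).mean()

# Training-step sketch: loss = nll + lam * fisher_penalty(log_q, theta, x)
```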
arXiv Detail & Related papers (2023-05-24T10:18:45Z)
- Stochastic optimal well control in subsurface reservoirs using
reinforcement learning [0.0]
We present a case study of a model-free reinforcement learning framework to solve optimal control under a predefined parameter uncertainty distribution.
In principle, RL algorithms are capable of learning optimal action policies to maximize a numerical reward signal.
We present numerical results using two state-of-the-art RL algorithms, proximal policy optimization (PPO) and advantage actor-critic (A2C) on two subsurface flow test cases.
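For orientation only, this is how PPO and A2C are typically run via stable-baselines3 (assuming version >= 2.0 with Gymnasium support); Pendulum-v1 is a placeholder, since the subsurface-flow simulator is not publicly packaged:

```python
import gymnasium as gym
from stable_baselines3 import A2C, PPO

env = gym.make("Pendulum-v1")   # placeholder for the reservoir simulator

for Algo in (PPO, A2C):
    model = Algo("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=10_000)
```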
arXiv Detail & Related papers (2022-07-07T17:34:23Z)
- Optimal variance-reduced stochastic approximation in Banach spaces [114.8734960258221]
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space.
We establish non-asymptotic bounds for both the operator defect and the estimation error.
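For intuition, here is a generic stochastic-approximation sketch in R^d with Polyak-Ruppert averaging; it is plainly not the paper's variance-reduced scheme or Banach-space setting, just the baseline procedure such work improves on:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
A = 0.5 * np.eye(d)                              # contraction with factor 0.5
b = rng.normal(size=d)
theta_star = np.linalg.solve(np.eye(d) - A, b)   # true fixed point of F

theta = np.zeros(d)
avg = np.zeros(d)
for t in range(1, 10_001):
    noisy_F = A @ theta + b + 0.1 * rng.normal(size=d)  # noisy operator call
    theta += t ** -0.7 * (noisy_F - theta)              # SA update
    avg += (theta - avg) / t                            # Polyak-Ruppert average
print(np.linalg.norm(avg - theta_star))                 # estimation error
```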
arXiv Detail & Related papers (2022-01-21T02:46:57Z)
- Sample Efficient Deep Reinforcement Learning via Uncertainty Estimation [12.415463205960156]
In model-free deep reinforcement learning (RL) algorithms, using noisy value estimates to supervise policy evaluation and optimization is detrimental to the sample efficiency.
We provide a systematic analysis of the sources of uncertainty in the noisy supervision that occurs in RL.
We propose a method whereby two complementary uncertainty estimation methods account for both the Q-value and the environment stochasticity to better mitigate the negative impacts of noisy supervision.
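A hypothetical sketch of this idea (the paper's exact estimators differ): down-weight TD errors by inverse variance, using ensemble disagreement for Q-value uncertainty and target-sample variance for environment stochasticity.

```python
import torch

def uncertainty_weighted_td_loss(q_ensemble, target_samples):
    """q_ensemble: (n_heads, batch) Q-estimates; target_samples:
    (n_targets, batch) bootstrapped TD targets. TD errors are
    down-weighted wherever either uncertainty source is large."""
    epistemic = q_ensemble.var(dim=0)            # Q-value uncertainty
    aleatoric = target_samples.var(dim=0)        # environment stochasticity
    w = 1.0 / (1.0 + epistemic + aleatoric)      # inverse-variance weights
    td_err = q_ensemble.mean(dim=0) - target_samples.mean(dim=0).detach()
    return (w.detach() * td_err ** 2).mean()
```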
arXiv Detail & Related papers (2022-01-05T15:46:06Z)
- Regularization Guarantees Generalization in Bayesian Reinforcement
Learning through Algorithmic Stability [48.62272919754204]
We study generalization in Bayesian RL under the probably approximately correct (PAC) framework.
Our main contribution is showing that by adding regularization, the optimal policy becomes stable in an appropriate sense.
arXiv Detail & Related papers (2021-09-24T07:48:34Z)
- Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z)
- Evaluating probabilistic classifiers: Reliability diagrams and score
decompositions revisited [68.8204255655161]
We introduce the CORP approach, which generates provably statistically Consistent, Optimally binned, and Reproducible reliability diagrams in an automated way.
CORP is based on non-parametric isotonic regression and implemented via the pool-adjacent-violators (PAV) algorithm.
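A minimal sketch of the PAV-based recalibration that CORP builds on, using scikit-learn's IsotonicRegression (which implements PAV); the synthetic forecasts here are illustrative only:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
p = rng.uniform(size=1000)                              # forecast probabilities
y = (rng.uniform(size=1000) < p ** 1.3).astype(float)   # miscalibrated events

iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
cep = iso.fit_transform(p, y)   # PAV fit: conditional event probabilities
# Plotting (sorted p, cep) against the diagonal gives the reliability diagram.
```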
arXiv Detail & Related papers (2020-08-07T08:22:26Z)
- Distributional Reinforcement Learning via Moment Matching [54.16108052278444]
We formulate a method that learns a finite set of statistics from each return distribution via neural networks.
Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target.
Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines.
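A generic sample-based sketch of the moment-matching idea: a Gaussian-kernel MMD between predicted return samples and Bellman target samples implicitly matches all orders of moments (the paper's deterministic-sample formulation differs in detail).

```python
import torch

def mmd2(x, y, sigma=1.0):
    """Squared MMD with a Gaussian kernel between 1-D samples x and y;
    driving a smooth-kernel MMD to zero forces all moments to match."""
    def k(a, b):
        return torch.exp(-(a.unsqueeze(1) - b.unsqueeze(0)) ** 2
                         / (2.0 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

pred = torch.randn(32, requires_grad=True)   # predicted return samples
target = 0.5 + torch.randn(32)               # Bellman target samples
loss = mmd2(pred, target)
loss.backward()
```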
arXiv Detail & Related papers (2020-07-24T05:18:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.