Bayesian Risk-Averse Q-Learning with Streaming Observations
- URL: http://arxiv.org/abs/2305.11300v1
- Date: Thu, 18 May 2023 20:48:50 GMT
- Title: Bayesian Risk-Averse Q-Learning with Streaming Observations
- Authors: Yuhao Wang, Enlu Zhou
- Abstract summary: We consider a robust reinforcement learning problem, where a learning agent learns from a simulated training environment.
Observations from the real environment, which is outside the agent's control, arrive periodically.
We develop a multi-stage Bayesian risk-averse Q-learning algorithm to solve the Bayesian risk MDP (BRMDP) with streaming observations from the real environment.
- Score: 7.330349128557128
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a robust reinforcement learning problem, where a learning agent
learns from a simulated training environment. To account for the model
mis-specification between this training environment and the real environment
due to lack of data, we adopt a formulation of Bayesian risk MDP (BRMDP) with
infinite horizon, which uses the Bayesian posterior to estimate the transition
model and imposes a risk functional to account for model uncertainty.
Observations from the real environment, which is outside the agent's control,
arrive periodically and are utilized by the agent to update the Bayesian
posterior and thereby reduce model uncertainty. We theoretically demonstrate
that BRMDP balances the trade-off between robustness and conservativeness, and
we further develop a multi-stage Bayesian risk-averse Q-learning algorithm to
solve BRMDP with streaming observations from the real environment. The
proposed algorithm
learns a risk-averse yet optimal policy that depends on the availability of
real-world observations. We provide a theoretical guarantee of strong
convergence for the proposed algorithm.
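The abstract describes the algorithm only at a high level, so a minimal sketch of the core loop may help. The backup it suggests is roughly Q(s,a) <- r(s,a) + gamma * rho[ sum_{s'} P(s'|s,a) * max_{a'} Q(s',a') ], with the risk functional rho evaluated over posterior samples of the transition row P(.|s,a). Everything below is illustrative rather than the authors' implementation: the tabular setting, the Dirichlet posterior, the choice of lower-tail CVaR as rho, and all names and hyperparameters are assumptions.

```python
# Illustrative sketch only: tabular MDP, Dirichlet posterior over transition
# rows, lower-tail CVaR as the risk functional. Not the authors' implementation.
import numpy as np

rng = np.random.default_rng(0)

def cvar_lower(samples, alpha=0.2):
    """Mean of the worst alpha-fraction of samples (the risk-averse tail)."""
    k = max(1, int(np.ceil(alpha * len(samples))))
    return np.sort(samples)[:k].mean()

def risk_averse_sweep(Q, R, counts, gamma=0.95, lr=0.1, n_post=50, alpha=0.2):
    """One synchronous sweep of the risk-averse Bellman backup.
    counts[s, a] holds Dirichlet pseudo-counts over next states."""
    n_states, n_actions = Q.shape
    v = Q.max(axis=1)                            # current greedy values
    for s in range(n_states):
        for a in range(n_actions):
            # Bellman targets under transition rows drawn from the posterior.
            targets = np.array([R[s, a] + gamma * rng.dirichlet(counts[s, a]) @ v
                                for _ in range(n_post)])
            # Risk functional over posterior samples, then a Q-learning step.
            Q[s, a] += lr * (cvar_lower(targets, alpha) - Q[s, a])
    return Q

# Multi-stage loop: between training stages a batch of real observations
# arrives, updating the posterior and shrinking model uncertainty, so the
# learned policy becomes less conservative as real data accumulates.
n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_actions))
R = rng.uniform(size=(n_states, n_actions))        # toy reward table
counts = np.ones((n_states, n_actions, n_states))  # uniform Dirichlet prior
for stage in range(5):
    for _ in range(200):                           # one stage of simulated training
        Q = risk_averse_sweep(Q, R, counts)
    real_batch = [(rng.integers(n_states), rng.integers(n_actions),
                   rng.integers(n_states)) for _ in range(20)]  # stand-in data
    for s, a, s_next in real_batch:
        counts[s, a, s_next] += 1.0                # Bayesian posterior update
policy = Q.argmax(axis=1)
```

As real observations accumulate, the pseudo-counts grow, posterior samples of the transition model concentrate, and the CVaR target approaches the ordinary Bellman target, so the policy moves from conservative toward optimal, mirroring the robustness-conservativeness trade-off stated above.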
Related papers
- Fast Value Tracking for Deep Reinforcement Learning [7.648784748888187]
Reinforcement learning (RL) tackles sequential decision-making problems by creating agents that interact with their environment.
Existing algorithms often view these problems as static, focusing on point estimates for model parameters to maximize expected rewards.
Our research leverages the Kalman filtering paradigm to introduce a novel uncertainty quantification and sampling algorithm called Langevinized Kalman Temporal-Difference (LKTD).
arXiv Detail & Related papers (2024-03-19T22:18:19Z) - Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [63.32053223422317]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
In particular, we focus on characterizing the variance over values induced by a distribution over MDPs.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
arXiv Detail & Related papers (2023-12-07T15:55:58Z) - A Bayesian Approach to Robust Inverse Reinforcement Learning [54.24816623644148]
We consider a Bayesian approach to offline model-based inverse reinforcement learning (IRL).
The proposed framework differs from existing offline model-based IRL approaches by performing simultaneous estimation of the expert's reward function and subjective model of environment dynamics.
Our analysis reveals a novel insight that the estimated policy exhibits robust performance when the expert is believed to have a highly accurate model of the environment.
arXiv Detail & Related papers (2023-09-15T17:37:09Z) - Mind the Uncertainty: Risk-Aware and Actively Exploring Model-Based Reinforcement Learning [26.497229327357935]
We introduce a simple but effective method for managing risk in model-based reinforcement learning with trajectory sampling.
Experiments indicate that the separation of uncertainties is essential to performing well with data-driven approaches in uncertain and safety-critical control environments.
arXiv Detail & Related papers (2023-09-11T16:10:58Z) - Value-Distributional Model-Based Reinforcement Learning [63.32053223422317]
Quantifying uncertainty about a policy's long-term performance is important for solving sequential decision-making tasks.
We study the problem from a model-based Bayesian reinforcement learning perspective.
We propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function.
arXiv Detail & Related papers (2023-08-12T14:59:19Z) - Model-Assisted Probabilistic Safe Adaptive Control With Meta-Bayesian Learning [33.75998206184497]
We develop a novel adaptive safe control framework that integrates meta learning, Bayesian models, and control barrier function (CBF) method.
Specifically, with the help of the CBF method, we learn the inherent and external uncertainties using a unified adaptive Bayesian linear regression model.
For a new control task, we refine the meta-learned models using a few samples, and introduce pessimistic confidence bounds into CBF constraints to ensure safe control.
arXiv Detail & Related papers (2023-07-03T08:16:01Z) - Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z) - Policy Learning for Robust Markov Decision Process with a Mismatched Generative Model [42.28001762749647]
In high-stakes scenarios like medical treatment and auto-piloting, it is risky or even infeasible to collect online experimental data to train the agent.
We consider policy learning for Robust Markov Decision Processes (RMDP), where the agent tries to seek a robust policy with respect to unexpected perturbations of the environment.
Our goal is to identify a near-optimal robust policy for the perturbed testing environment, which introduces additional technical difficulties.
arXiv Detail & Related papers (2022-03-13T06:37:25Z) - Bayesian Bellman Operators [55.959376449737405]
We introduce a novel perspective on Bayesian reinforcement learning (RL).
Our framework is motivated by the insight that when bootstrapping is introduced, model-free approaches actually infer a posterior over Bellman operators, not value functions.
arXiv Detail & Related papers (2021-06-09T12:20:46Z) - Risk-Averse Bayes-Adaptive Reinforcement Learning [3.5289688061934963]
We pose the problem of optimising the conditional value at risk (CVaR) of the total return in Bayes-adaptive Markov decision processes (MDPs).
We show that a policy optimising CVaR in this setting is risk-averse to both the parametric uncertainty due to the prior distribution over MDPs, and the internal uncertainty due to the inherent stochasticity of MDPs (a minimal sketch of this CVaR-of-return computation appears after this list).
Our experiments demonstrate that our approach significantly outperforms baseline approaches for this problem.
arXiv Detail & Related papers (2021-02-10T22:34:33Z) - Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)
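For the Risk-Averse Bayes-Adaptive Reinforcement Learning entry above, here is a minimal sketch of the quantity it optimises: the CVaR of the total return, where each sampled return first draws an MDP from the posterior (parametric uncertainty) and then a stochastic trajectory within it (internal uncertainty). The posterior sampler `sample_mdp` and all parameters are hypothetical stand-ins, not the paper's method.

```python
# Illustrative sketch: CVaR of the total return under a posterior over MDPs.
# `sample_mdp` is a hypothetical sampler returning (P, R) with P of shape
# (S, A, S) and R of shape (S, A); it is not part of the cited paper.
import numpy as np

rng = np.random.default_rng(1)

def rollout(P, R, policy, s0=0, gamma=0.95, horizon=50):
    """One discounted return from a fixed MDP (P, R) under `policy`."""
    s, ret, disc = s0, 0.0, 1.0
    for _ in range(horizon):
        a = policy[s]
        ret += disc * R[s, a]
        s = rng.choice(P.shape[-1], p=P[s, a])    # internal stochasticity
        disc *= gamma
    return ret

def cvar_return(sample_mdp, policy, alpha=0.1, n_samples=1000):
    """CVaR_alpha of returns: each sample draws one MDP from the posterior
    (parametric uncertainty) and one trajectory in it (internal uncertainty)."""
    rets = np.sort([rollout(*sample_mdp(), policy) for _ in range(n_samples)])
    k = max(1, int(np.ceil(alpha * len(rets))))
    return rets[:k].mean()                        # mean of the worst alpha-tail
```

A risk-neutral Bayes-adaptive agent would average these returns instead; shrinking alpha weights the evaluation toward the worst plausible MDPs and trajectories.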