Risk-Averse No-Regret Learning in Online Convex Games
- URL: http://arxiv.org/abs/2203.08957v1
- Date: Wed, 16 Mar 2022 21:36:47 GMT
- Title: Risk-Averse No-Regret Learning in Online Convex Games
- Authors: Zifan Wang, Yi Shen, Michael M. Zavlanos
- Abstract summary: We consider an online game with risk-averse agents whose goal is to learn optimal decisions that minimize the risk of incurring significantly high costs.
Since the distributions of the cost functions depend on the actions of all agents that are generally unobservable, the Conditional Value at Risk (CVaR) values of the costs are difficult to compute.
We propose a new online risk-averse learning algorithm that relies on one-point zeroth-order estimation of the CVaR gradients computed using CVaR values.
- Score: 19.4481913405231
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider an online stochastic game with risk-averse agents whose goal is
to learn optimal decisions that minimize the risk of incurring significantly
high costs. Specifically, we use the Conditional Value at Risk (CVaR) as a risk
measure that the agents can estimate using bandit feedback in the form of the
cost values of only their selected actions. Since the distributions of the cost
functions depend on the actions of all agents that are generally unobservable,
they are themselves unknown and, therefore, the CVaR values of the costs are
difficult to compute. To address this challenge, we propose a new online
risk-averse learning algorithm that relies on one-point zeroth-order estimation
of the CVaR gradients computed using CVaR values that are estimated by
appropriately sampling the cost functions. We show that this algorithm achieves
sub-linear regret with high probability. We also propose two variants of this
algorithm that improve performance. The first variant relies on a new sampling
strategy that uses samples from the previous iteration to improve the
estimation accuracy of the CVaR values. The second variant employs residual
feedback that uses CVaR values from the previous iteration to reduce the
variance of the CVaR gradient estimates. We theoretically analyze the
convergence properties of these variants and illustrate their performance on an
online market problem that we model as a Cournot game.
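As a rough illustration of the mechanism described above, the sketch below combines an empirical CVaR estimate (using the standard form $\mathrm{CVaR}_\alpha = \mathrm{VaR}_\alpha + \frac{1}{1-\alpha}\mathbb{E}[(\mathrm{cost}-\mathrm{VaR}_\alpha)_+]$) with a one-point zeroth-order gradient estimate and an optional residual-feedback term. It is a minimal single-agent sketch under simplifying assumptions, not the authors' algorithm: the ball projection (the paper projects onto an appropriately shrunk feasible set), the toy quadratic cost, the step sizes, and all function names are illustrative.

```python
import numpy as np

def empirical_cvar(costs, alpha):
    """Empirical CVaR_alpha of a batch of cost samples:
    CVaR = VaR + E[(cost - VaR)_+] / (1 - alpha)."""
    var = np.quantile(costs, alpha)                      # empirical Value at Risk
    return var + np.mean(np.maximum(costs - var, 0.0)) / (1.0 - alpha)

def project_ball(x, radius=1.0):
    """Euclidean projection onto a ball, standing in for the agent's convex action set."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else radius * x / norm

def zeroth_order_cvar_step(x, sample_costs, alpha, delta, eta, rng, prev_cvar=None):
    """One projected-gradient iteration using a one-point zeroth-order CVaR gradient.

    sample_costs(action) returns an array of bandit-feedback cost samples for the
    played action. If prev_cvar is supplied, a residual-feedback signal
    (CVaR_t - CVaR_{t-1}) is used to reduce the variance of the gradient estimate.
    """
    dim = x.size
    u = rng.normal(size=dim)
    u /= np.linalg.norm(u)                               # uniform direction on the unit sphere
    x_played = x + delta * u                             # perturbed action actually played
    cvar_t = empirical_cvar(sample_costs(x_played), alpha)
    signal = cvar_t if prev_cvar is None else cvar_t - prev_cvar
    grad_est = (dim / delta) * signal * u                # one-point zeroth-order gradient estimate
    return project_ball(x - eta * grad_est), cvar_t

# Toy usage: a single agent with a noisy quadratic cost (illustrative only).
rng = np.random.default_rng(0)
x, prev = np.zeros(2), None
sample_costs = lambda a: np.dot(a - 0.5, a - 0.5) + 0.1 * rng.standard_normal(200)
for t in range(1, 501):
    x, prev = zeroth_order_cvar_step(x, sample_costs, alpha=0.95, delta=0.1,
                                     eta=0.01 / np.sqrt(t), rng=rng, prev_cvar=prev)
```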
Related papers
- Risk-averse Learning with Non-Stationary Distributions [18.15046585146849]
In this paper, we investigate risk-averse online optimization where the distribution of the random cost changes over time.
We minimize a risk-averse objective function using the Conditional Value at Risk (CVaR) as the risk measure.
We show that our designed learning algorithm achieves sub-linear dynamic regret with high probability for both convex and strongly convex functions.
arXiv Detail & Related papers (2024-04-03T18:16:47Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC), that can be applied to either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- Provably Efficient CVaR RL in Low-rank MDPs [58.58570425202862]
We study risk-sensitive Reinforcement Learning (RL).
We propose a novel Upper Confidence Bound (UCB) bonus-driven algorithm to balance the interplay among exploration, exploitation, and representation learning in CVaR RL.
We prove that our algorithm achieves a sample complexity bound for learning an $\epsilon$-optimal CVaR, where $H$ is the length of each episode, $A$ is the cardinality of the action space, and $d$ is the dimension of the representations.
arXiv Detail & Related papers (2023-11-20T17:44:40Z)
- Forward-PECVaR Algorithm: Exact Evaluation for CVaR SSPs [1.347733333991357]
Conditional Value at Risk (CVaR) is a criterion that allows modeling an arbitrary level of risk.
We propose a new algorithm, Forward-PECVaR, that exactly evaluates stationary policies of CVaR-SSPs with non-uniform costs.
arXiv Detail & Related papers (2023-03-01T17:10:22Z)
- A Zeroth-Order Momentum Method for Risk-Averse Online Convex Games [15.620924879200793]
We consider risk-averse learning in repeated unknown games where the goal of the agents is to minimize their individual risk of incurring significantly high cost.
A major challenge in using bandit feedback to estimate CVaR is that the agents can only access their own cost values.
We propose a new risk-averse learning algorithm with momentum that utilizes the full historical information on the cost values.
arXiv Detail & Related papers (2022-09-06T22:28:34Z)
- Efficient Risk-Averse Reinforcement Learning [79.61412643761034]
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
We prove that under certain conditions optimizing such a risk measure directly inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it.
We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks.
arXiv Detail & Related papers (2022-05-10T19:40:52Z)
- Bias-Corrected Peaks-Over-Threshold Estimation of the CVaR [2.552459629685159]
Conditional value-at-risk (CVaR) is a useful risk measure in fields such as machine learning, finance, insurance, and energy.
When measuring very extreme risk, the commonly used CVaR estimation method of sample averaging does not work well.
To mitigate this problem, the CVaR can be estimated by extrapolating above a lower threshold than the VaR (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2021-03-08T20:29:06Z)
- Variance Penalized On-Policy and Off-Policy Actor-Critic [60.06593931848165]
We propose on-policy and off-policy actor-critic algorithms that optimize a performance criterion involving both mean and variance in the return.
Our approach not only performs on par with actor-critic and prior variance-penalization baselines in terms of expected return, but also generates trajectories which have lower variance in the return.
arXiv Detail & Related papers (2021-02-03T10:06:16Z)
- Risk-Constrained Thompson Sampling for CVaR Bandits [82.47796318548306]
We consider a popular risk measure in quantitative finance known as the Conditional Value at Risk (CVaR).
We explore the performance of a Thompson Sampling-based algorithm CVaR-TS under this risk measure.
arXiv Detail & Related papers (2020-11-16T15:53:22Z)
- Large-Scale Methods for Distributionally Robust Optimization [53.98643772533416]
We prove that our algorithms require a number of gradient evaluations independent of the training set size and the number of parameters.
Experiments on MNIST and ImageNet confirm the theoretical scaling of our algorithms, which are 9--36 times more efficient than full-batch methods.
arXiv Detail & Related papers (2020-10-12T17:41:44Z)
- Statistical Learning with Conditional Value at Risk [35.4968603057034]
We propose a risk-averse statistical learning framework wherein the performance of a learning algorithm is evaluated by the conditional value-at-risk (CVaR) of losses rather than the expected loss.
arXiv Detail & Related papers (2020-02-14T00:58:34Z)
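The "Bias-Corrected Peaks-Over-Threshold Estimation of the CVaR" entry above notes that, for very extreme risk levels, CVaR can be estimated by extrapolating above a lower threshold than the VaR. The sketch below shows the plain (uncorrected) peaks-over-threshold idea, assuming a generalized Pareto tail fitted with SciPy; the threshold quantile, the toy Pareto losses, and the function name are illustrative assumptions, and this is not the bias-corrected estimator of the cited paper.

```python
import numpy as np
from scipy.stats import genpareto

def pot_cvar(samples, alpha, threshold_quantile=0.9):
    """CVaR_alpha via peaks-over-threshold: fit a generalized Pareto distribution
    (GPD) to exceedances over a threshold below the VaR, then extrapolate into the tail."""
    samples = np.asarray(samples, dtype=float)
    u = np.quantile(samples, threshold_quantile)          # lower threshold than the VaR
    exceedances = samples[samples > u] - u
    xi, _, sigma = genpareto.fit(exceedances, floc=0.0)   # GPD shape and scale
    zeta = exceedances.size / samples.size                # empirical exceedance probability
    var = u + (sigma / xi) * (((1.0 - alpha) / zeta) ** (-xi) - 1.0)
    # Standard GPD tail formulas for VaR and expected shortfall; they require 0 != xi < 1.
    return var / (1.0 - xi) + (sigma - xi * u) / (1.0 - xi)

# Toy usage on heavy-tailed losses, where sample averaging of the far tail is noisy.
rng = np.random.default_rng(1)
losses = rng.pareto(3.0, size=5000)
print(pot_cvar(losses, alpha=0.99))
```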