A Zeroth-Order Momentum Method for Risk-Averse Online Convex Games
- URL: http://arxiv.org/abs/2209.02838v1
- Date: Tue, 6 Sep 2022 22:28:34 GMT
- Title: A Zeroth-Order Momentum Method for Risk-Averse Online Convex Games
- Authors: Zifan Wang, Yi Shen, Zachary I. Bell, Scott Nivison, Michael M.
Zavlanos, Karl H. Johansson
- Abstract summary: We consider risk-averse learning in repeated unknown games where the goal of the agents is to minimize their individual risk of incurring significantly high cost.
A major challenge in using bandit feedback to estimate CVaR is that the agents can only access their own cost values.
We propose a new risk-averse learning algorithm with momentum that utilizes the full historical information on the cost values.
- Score: 15.620924879200793
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider risk-averse learning in repeated unknown games where the goal of
the agents is to minimize their individual risk of incurring significantly high
cost. Specifically, the agents use the conditional value at risk (CVaR) as a
risk measure and rely on bandit feedback in the form of the cost values of the
selected actions at every episode to estimate their CVaR values and update
their actions. A major challenge in using bandit feedback to estimate CVaR is
that the agents can only access their own cost values, which, however, depend
on the actions of all agents. To address this challenge, we propose a new
risk-averse learning algorithm with momentum that utilizes the full historical
information on the cost values. We show that this algorithm achieves sub-linear
regret and matches the best known algorithms in the literature. We provide
numerical experiments for a Cournot game that show that our method outperforms
existing methods.
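To make the setting concrete, below is a minimal sketch of one episode of risk-averse zeroth-order learning with momentum. It is illustrative rather than the paper's exact algorithm: the bandit oracle `play_and_observe`, the smoothing radius `delta`, the per-episode resampling, and the exponential moving average standing in for the paper's full-history momentum are all assumptions.

```python
import numpy as np

def empirical_cvar(costs, alpha):
    """Empirical CVaR_alpha: mean cost over the worst (1 - alpha) tail."""
    var = np.quantile(costs, alpha)          # empirical value at risk (VaR)
    tail = costs[costs >= var]               # always non-empty (contains the max)
    return tail.mean()

def zeroth_order_cvar_step(x, play_and_observe, alpha=0.95, delta=0.1,
                           eta=0.01, beta=0.9, momentum=None, n_samples=50):
    """One episode: perturb the action, estimate CVaR from bandit feedback,
    form a one-point zeroth-order gradient estimate, and apply momentum."""
    d = x.shape[0]
    u = np.random.randn(d)
    u /= np.linalg.norm(u)                   # uniform direction on the unit sphere
    # Bandit feedback: the agent only sees cost values of the played action.
    costs = np.array([play_and_observe(x + delta * u) for _ in range(n_samples)])
    cvar = empirical_cvar(costs, alpha)
    grad_est = (d / delta) * cvar * u        # one-point zeroth-order gradient estimate
    momentum = grad_est if momentum is None else beta * momentum + (1 - beta) * grad_est
    return x - eta * momentum, momentum
```

An agent would call `zeroth_order_cvar_step` once per episode, carrying `momentum` across episodes; the paper's momentum instead reuses the full history of observed cost values to reduce the variance of the CVaR estimates.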
Related papers
- SafeAR: Safe Algorithmic Recourse by Risk-Aware Policies [2.291948092032746]
We present a method to compute recourse policies that consider variability in cost.
We show how existing recourse desiderata can fail to capture the risk of higher costs.
arXiv Detail & Related papers (2023-08-23T18:12:11Z)
- Safe Deployment for Counterfactual Learning to Rank with Exposure-Based Risk Minimization [63.93275508300137]
We introduce a novel risk-aware Counterfactual Learning To Rank method with theoretical guarantees for safe deployment.
Our experimental results demonstrate the efficacy of our proposed method, which is effective at avoiding initial periods of bad performance when little data is available.
arXiv Detail & Related papers (2023-04-26T15:54:23Z)
- Risk-aware linear bandits with convex loss [0.0]
We propose an optimistic UCB algorithm to learn optimal risk-aware actions, with regret guarantees similar to those of generalized linear bandits.
This approach requires solving a convex problem at each round of the algorithm, which we can relax by allowing an approximate solution obtained by online gradient descent.
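As a rough illustration of that relaxation, the sketch below performs a single projected online-gradient-descent step in place of an exact convex solve; the step-size schedule and the l2-ball feasible set are assumptions, not the paper's setup.

```python
import numpy as np

def ogd_step(theta, grad, t, radius=1.0, eta0=0.5):
    """One projected OGD step with a 1/sqrt(t) step size.

    theta: current iterate; grad: (sub)gradient of the round-t convex loss.
    """
    theta = theta - (eta0 / np.sqrt(t)) * grad
    norm = np.linalg.norm(theta)
    if norm > radius:                        # project back onto the l2 ball
        theta *= radius / norm
    return theta
```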
arXiv Detail & Related papers (2022-09-15T09:09:53Z)
- Efficient Risk-Averse Reinforcement Learning [79.61412643761034]
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
We prove that under certain conditions this inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it.
We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks.
arXiv Detail & Related papers (2022-05-10T19:40:52Z)
- Risk-Averse No-Regret Learning in Online Convex Games [19.4481913405231]
We consider an online game with risk-averse agents whose goal is to learn optimal decisions that minimize the risk of incurring significantly high costs.
Since the distributions of the cost functions depend on the actions of all agents, which are generally unobservable, the Conditional Value at Risk (CVaR) values of the costs are difficult to compute.
We propose a new online risk-averse learning algorithm that relies on one-point zeroth-order estimation of the CVaR gradients computed using CVaR values.
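For reference, the standard smoothing identity that underlies one-point zeroth-order estimators of this kind (with $C$ the cost, $u$ uniform on the unit sphere, $v$ uniform on the unit ball, and $\delta$ the smoothing radius) is:

```latex
\nabla C_\delta(x) \;=\; \frac{d}{\delta}\,
\mathbb{E}_{u \sim \mathcal{U}(\mathbb{S}^{d-1})}\bigl[\, C(x + \delta u)\, u \,\bigr],
\qquad
C_\delta(x) \;=\; \mathbb{E}_{v \sim \mathcal{U}(\mathbb{B}^{d})}\bigl[\, C(x + \delta v) \,\bigr].
```

Scaling the observed value $C(x + \delta u)$ by $d/\delta$ along the sampled direction $u$ thus gives an unbiased estimate of $\nabla C_\delta(x)$ from a single function evaluation; such estimators are applied here with estimated CVaR values in place of raw costs.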
arXiv Detail & Related papers (2022-03-16T21:36:47Z)
- Anti-Concentrated Confidence Bonuses for Scalable Exploration [57.91943847134011]
Intrinsic rewards play a central role in handling the exploration-exploitation trade-off.
We introduce anti-concentrated confidence bounds for efficiently approximating the elliptical bonus.
We develop a practical variant for deep reinforcement learning that is competitive with contemporary intrinsic rewards on Atari benchmarks.
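For context, the exact elliptical bonus that such methods approximate is the Mahalanobis norm of the feature vector under the inverse regularized feature covariance; a minimal sketch, with illustrative names:

```python
import numpy as np

def elliptical_bonus(phi, A_inv, scale=1.0):
    """Exact elliptical bonus: scale * sqrt(phi^T A^{-1} phi).

    phi: feature vector of a state-action pair; A_inv: inverse of the
    regularized covariance sum_i phi_i phi_i^T + lam * I. Maintaining
    A_inv exactly at scale is what the approximation aims to avoid.
    """
    return scale * float(np.sqrt(phi @ A_inv @ phi))
```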
arXiv Detail & Related papers (2021-10-21T15:25:15Z)
- Deep Reinforcement Learning for Equal Risk Pricing and Hedging under Dynamic Expectile Risk Measures [1.2891210250935146]
We show that a new off-policy deterministic actor-critic deep reinforcement learning algorithm can identify high-quality, time-consistent hedging policies for options.
Our numerical experiments, which involve both a simple vanilla option and a more exotic basket option, confirm that the new algorithm can produce 1) in simple environments, nearly optimal hedging policies and highly accurate prices, simultaneously for a range of maturities, and 2) hedging strategies that outperform those produced using static risk measures when the risk is evaluated at later points in time.
arXiv Detail & Related papers (2021-09-09T02:52:06Z)
- Online Markov Decision Processes with Aggregate Bandit Feedback [74.85532145498742]
We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics.
In each episode, the learner suffers the loss accumulated along the trajectory realized by the policy chosen for the episode, and observes aggregate bandit feedback.
Our main result is a computationally efficient algorithm with $O(\sqrt{K})$ regret for this setting, where $K$ is the number of episodes.
arXiv Detail & Related papers (2021-01-31T16:49:07Z)
- Risk-Constrained Thompson Sampling for CVaR Bandits [82.47796318548306]
We consider a popular risk measure in quantitative finance known as the Conditional Value at Risk (CVaR).
We explore the performance of a Thompson Sampling-based algorithm CVaR-TS under this risk measure.
arXiv Detail & Related papers (2020-11-16T15:53:22Z)
- Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret [115.85354306623368]
We study risk-sensitive reinforcement learning in episodic Markov decision processes with unknown transition kernels.
We propose two provably efficient model-free algorithms, Risk-Sensitive Value Iteration (RSVI) and Risk-Sensitive Q-learning (RSQ).
We prove that RSVI attains an $\tilde{O}\big(\lambda(|\beta| H^2) \cdot \sqrt{H^3 S^2 A T}\big)$ regret, while RSQ attains an $\tilde{O}\big(\lambda(|\beta| H^2) \cdot \sqrt{H^4 S A T}\big)$ regret.
arXiv Detail & Related papers (2020-06-22T19:28:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.