Risk Preferences of Learning Algorithms
- URL: http://arxiv.org/abs/2205.04619v3
- Date: Tue, 12 Dec 2023 16:43:44 GMT
- Title: Risk Preferences of Learning Algorithms
- Authors: Andreas Haupt and Aroon Narayanan
- Abstract summary: We show that a widely used learning algorithm, $\varepsilon$-Greedy, exhibits emergent risk aversion.
We discuss two methods to correct this bias.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Agents' learning from feedback shapes economic outcomes, and many economic
decision-makers today employ learning algorithms to make consequential choices.
This note shows that a widely used learning algorithm, $\varepsilon$-Greedy,
exhibits emergent risk aversion: it prefers actions with lower variance. When
presented with actions of the same expectation, under a wide range of
conditions, $\varepsilon$-Greedy chooses the lower-variance action with
probability approaching one. This emergent preference can have wide-ranging
consequences, ranging from concerns about fairness to homogenization, and holds
transiently even when the riskier action has a strictly higher expected payoff.
We discuss two methods to correct this bias. The first method requires the
algorithm to reweight data as a function of how likely the actions were to be
chosen. The second requires the algorithm to have optimistic estimates of
actions for which it has not collected much data. We show that risk-neutrality
is restored with these corrections.
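As a concrete illustration of the abstract, here is a minimal Python sketch (not the paper's exact setup): two arms with equal means but different variances, plain $\varepsilon$-Greedy with sample-mean estimates, and hedged versions of the two corrections, where "reweighting" is implemented as standard inverse-propensity weighting and "optimism" as a simple bonus for under-sampled arms. The reward distributions, horizon, $\varepsilon$, and bonus scale are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two arms with equal means but different variances (illustrative choice,
# not the paper's exact environment): arm 0 is "safe", arm 1 is "risky".
MEANS = np.array([0.5, 0.5])
STDS = np.array([0.05, 0.50])
EPS = 0.1          # exploration probability
HORIZON = 1000
RUNS = 300


def run_eps_greedy(correction=None, optimism=1.0):
    """One run of epsilon-Greedy; returns the arm preferred at the horizon.

    correction=None    -> plain sample means
    correction="ipw"   -> inverse-propensity-weighted estimates
                          (a sketch of the 'reweighting' correction)
    correction="bonus" -> optimistic bonus ~ optimism / sqrt(n_a)
                          (a sketch of the 'optimism' correction)
    """
    n = np.zeros(2)          # pull counts
    mean_est = np.zeros(2)   # running sample means
    ips_sum = np.zeros(2)    # running sums of 1{A_t = a} * r_t / p_t(a)
    for t in range(1, HORIZON + 1):
        if correction == "ipw":
            index = ips_sum / t
        elif correction == "bonus":
            index = mean_est + optimism / np.sqrt(np.maximum(n, 1.0))
        else:
            index = mean_est
        greedy = int(np.argmax(index))
        arm = int(rng.integers(2)) if rng.random() < EPS else greedy
        # Propensity of the chosen arm under epsilon-Greedy.
        p = EPS / 2 + (1 - EPS) * (arm == greedy)
        r = rng.normal(MEANS[arm], STDS[arm])
        n[arm] += 1
        mean_est[arm] += (r - mean_est[arm]) / n[arm]
        ips_sum[arm] += r / p
    final = ips_sum / HORIZON if correction == "ipw" else mean_est
    return int(np.argmax(final))


for label, corr in [("plain eps-greedy", None),
                    ("reweighting (IPW)", "ipw"),
                    ("optimistic bonus", "bonus")]:
    frac_safe = np.mean([run_eps_greedy(corr) == 0 for _ in range(RUNS)])
    print(f"{label:18s} fraction of runs preferring the low-variance arm: {frac_safe:.2f}")
```

Intuitively, the bias of plain $\varepsilon$-Greedy here comes from adaptive sampling: after an unlucky draw the risky arm is selected only with probability $\varepsilon/2$, so its low estimate persists, and it is exactly this asymmetry that the two corrections target.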
Related papers
- Mind the Gap: A Causal Perspective on Bias Amplification in Prediction & Decision-Making [58.06306331390586]
We introduce the notion of a margin complement, which measures how much a prediction score $S$ changes due to a thresholding operation.
We show that under suitable causal assumptions, the influences of $X$ on the prediction score $S$ are equal to the influences of $X$ on the true outcome $Y$.
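A tiny worked example of one plausible reading of the summary above (an assumption on my part, not the paper's formal definition): taking the margin complement to be the change $\hat{Y} - S$ that thresholding induces on the score. The threshold and score values below are made up for illustration.

```python
# Hypothetical illustration: margin complement read as (thresholded decision - score).
threshold = 0.5
scores = [0.30, 0.55, 0.90]                        # made-up prediction scores S
decisions = [int(s > threshold) for s in scores]   # thresholded decisions Y_hat
margin_complements = [d - s for d, s in zip(decisions, scores)]
print(margin_complements)                          # approximately [-0.30, 0.45, 0.10]
```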
arXiv Detail & Related papers (2024-05-24T11:22:19Z) - Constrained Online Two-stage Stochastic Optimization: Algorithm with (and without) Predictions [19.537289123577022]
We consider an online two-stage optimization with long-term constraints over a finite horizon of $T$ periods.
We develop online algorithms for the online two-stage problem from adversarial learning algorithms.
arXiv Detail & Related papers (2024-01-02T07:46:33Z) - Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM).
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound of $\tilde{O}\big(d\sqrt{\sum_{t=1}^{T}\sigma_t^2} + d\big)$.
Our regret bound naturally aligns with the intuitive expectation: in scenarios where the comparison is deterministic, the algorithm only suffers an $\tilde{O}(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z) - Fundamental Bounds on Online Strategic Classification [13.442155854812528]
We show that no deterministic algorithm can achieve a mistake bound $o(\Delta)$ in the strategic setting.
We also extend this to the agnostic setting and obtain an algorithm with a $\Delta$ multiplicative regret.
We design randomized algorithms that achieve sublinear regret bounds against both oblivious and adaptive adversaries.
arXiv Detail & Related papers (2023-02-23T22:39:43Z) - Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise.
Our algorithm is adaptive to the unknown variance of noise and achieves an $\tilde{O}\big(d\sqrt{\sum_{k=1}^{K}\sigma_k^2} + d\big)$ regret.
We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
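A minimal sketch of the standard ingredient behind variance-adaptive linear bandits: a ridge estimator that weights each observation by (an estimate of) its inverse variance, so low-noise rounds count more. This illustrates the general technique only; whether it matches this paper's exact estimator and confidence bonus is an assumption, and the synthetic data are made up.

```python
import numpy as np

def weighted_ridge(X, r, sigma2, lam=1.0):
    """Inverse-variance-weighted ridge regression.

    X      : (K, d) feature vectors of the played actions
    r      : (K,)   observed rewards
    sigma2 : (K,)   per-round noise variance (estimates)
    Returns theta_hat and the weighted design matrix used for confidence widths.
    """
    w = 1.0 / np.asarray(sigma2)
    A = lam * np.eye(X.shape[1]) + (X * w[:, None]).T @ X   # sum_k x_k x_k^T / sigma_k^2
    b = X.T @ (w * r)                                        # sum_k x_k r_k / sigma_k^2
    theta_hat = np.linalg.solve(A, b)
    return theta_hat, A

# Tiny synthetic check with made-up heteroscedastic data.
rng = np.random.default_rng(1)
d, K = 3, 200
theta = rng.normal(size=d)
X = rng.normal(size=(K, d))
sigma2 = rng.uniform(0.01, 1.0, size=K)
r = X @ theta + rng.normal(size=K) * np.sqrt(sigma2)
theta_hat, A = weighted_ridge(X, r, sigma2)
print(np.round(theta, 2), np.round(theta_hat, 2))
```

An optimistic rule would then play the action $x$ maximizing $x^\top \hat{\theta} + \beta\,\|x\|_{A^{-1}}$ for an appropriate confidence width $\beta$; the exact width used in the paper is not reproduced here.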
arXiv Detail & Related papers (2023-02-21T00:17:24Z) - Autoregressive Bandits [58.46584210388307]
We propose a novel online learning setting, Autoregressive Bandits, in which the observed reward is governed by an autoregressive process of order $k$.
We show that, under mild assumptions on the reward process, the optimal policy can be conveniently computed.
We then devise a new optimistic regret minimization algorithm, namely, AutoRegressive Upper Confidence Bound (AR-UCB), that suffers sublinear regret of order $\widetilde{\mathcal{O}}\big((k+1)^{3/2}\sqrt{nT}\,/\,(1-\Gamma)\dots\big)$.
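A minimal sketch of the reward model named above, an AR($k$) process. How the chosen action enters the process here (shifting its drift term) is an illustrative assumption, not necessarily the paper's exact parameterization, and the coefficients are made up.

```python
import numpy as np

rng = np.random.default_rng(2)

k = 3
ar_coeffs = np.array([0.4, 0.2, 0.1])   # illustrative AR(k) coefficients (sum < 1 for stability)
action_drift = {0: 0.0, 1: 0.5}         # hypothetical effect of each action on the process

history = list(rng.normal(size=k))      # k initial rewards
actions = rng.integers(2, size=50)      # an arbitrary action sequence
for a in actions:
    # Next observed reward depends on the last k rewards plus an action-dependent drift.
    recent = np.array(history[-k:][::-1])   # most recent reward first
    x = action_drift[int(a)] + ar_coeffs @ recent + 0.1 * rng.normal()
    history.append(x)

print(np.round(history[-5:], 3))
```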
arXiv Detail & Related papers (2022-12-12T21:37:36Z) - Risk-aware linear bandits with convex loss [0.0]
We propose an optimistic UCB algorithm to learn optimal risk-aware actions, with regret guarantees similar to those of generalized linear bandits.
This approach requires solving a convex problem at each round of the algorithm, which can be relaxed by allowing only an approximate solution obtained by online gradient descent.
arXiv Detail & Related papers (2022-09-15T09:09:53Z) - Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z) - Continuous Mean-Covariance Bandits [39.820490484375156]
We propose a novel Continuous Mean-Covariance Bandit (CMCB) model to take into account option correlation.
In CMCB, there is a learner who sequentially chooses weight vectors on given options and observes random feedback according to the decisions.
We propose novel algorithms with optimal regrets (within logarithmic factors) and provide matching lower bounds to validate their optimality.
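A small sketch of why option correlation matters in the model just summarized: the feedback from a chosen weight vector $w$ over options with mean $\mu$ and covariance $\Sigma$ has mean $w^\top\mu$ and variance $w^\top\Sigma w$, so the covariance, not just the per-option variances, drives the risk of a decision. The numbers and the mean-variance comparison below are illustrative assumptions, not the paper's exact criterion.

```python
import numpy as np

mu = np.array([0.10, 0.10])              # made-up option means
Sigma = np.array([[0.04, 0.03],          # made-up covariance: positively correlated options
                  [0.03, 0.04]])

def mean_and_variance(w):
    # Mean and variance of the feedback induced by the weight vector w.
    return w @ mu, w @ Sigma @ w

for w in (np.array([1.0, 0.0]), np.array([0.5, 0.5])):
    m, v = mean_and_variance(w)
    print(f"w={w}, mean={m:.3f}, variance={v:.4f}")
# Diversifying (w = [0.5, 0.5]) keeps the mean but lowers the variance only as much as
# the correlation allows: here 0.25 * (0.04 + 0.04 + 2 * 0.03) = 0.035 instead of 0.04.
```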
arXiv Detail & Related papers (2021-02-24T06:37:05Z) - Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret [115.85354306623368]
We study risk-sensitive reinforcement learning in episodic Markov decision processes with unknown transition kernels.
We propose two provably efficient model-free algorithms, Risk-Sensitive Value Iteration (RSVI) and Risk-Sensitive Q-learning (RSQ).
We prove that RSVI attains an $\tilde{O}\big(\lambda(|\beta| H^2) \cdot \sqrt{H^3 S^2 A T}\big)$ regret, while RSQ attains an $\tilde{O}\big(\lambda \dots\big)$ regret.
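For context on the risk parameter $\beta$ in the bounds above: risk-sensitive RL in this line of work typically uses the exponential-utility (entropic risk) objective $\frac{1}{\beta}\log \mathbb{E}\,[e^{\beta G}]$ of the return $G$. Treating that as the criterion here is an assumption on my part, and the sampled returns below are made up.

```python
import numpy as np

def entropic_risk(returns, beta):
    """Exponential-utility / entropic-risk value of sampled returns G."""
    returns = np.asarray(returns, dtype=float)
    return np.log(np.mean(np.exp(beta * returns))) / beta

returns = np.array([0.0, 1.0, 2.0])      # made-up sampled returns
for beta in (-2.0, -0.5, 0.5, 2.0):
    print(f"beta={beta:+.1f}: {entropic_risk(returns, beta):.3f}")
# beta < 0 penalizes variability (risk-averse), beta > 0 rewards it (risk-seeking);
# as beta -> 0 the value approaches the ordinary mean (here 1.0).
```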
arXiv Detail & Related papers (2020-06-22T19:28:26Z) - Debiased Off-Policy Evaluation for Recommendation Systems [8.63711086812655]
A/B tests are reliable but costly in time and money, and they entail a risk of failure.
We develop an alternative method, which predicts the performance of algorithms given historical data.
Our method produces smaller mean squared errors than state-of-the-art methods.
arXiv Detail & Related papers (2020-02-20T02:30:02Z)
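For context, a minimal sketch of the vanilla inverse-propensity-score (IPS) estimator that much off-policy evaluation work starts from; whether and how this particular paper builds on it is not stated in the summary, so treat this only as the baseline, with made-up logged data.

```python
import numpy as np

def ips_estimate(rewards, logged_propensities, target_propensities):
    """Vanilla IPS estimate of a target policy's value from logged bandit data.

    rewards             : r_i observed under the logging policy
    logged_propensities : probability the logging policy gave to the logged action
    target_propensities : probability the target policy would give to that action
    """
    w = np.asarray(target_propensities) / np.asarray(logged_propensities)
    return float(np.mean(w * np.asarray(rewards)))

# Made-up logged data: three interactions.
rewards = [1.0, 0.0, 1.0]
logged_p = [0.5, 0.8, 0.2]
target_p = [0.9, 0.1, 0.6]
print(ips_estimate(rewards, logged_p, target_p))   # (1*1.8 + 0 + 1*3.0) / 3 = 1.6
```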