Predict to Minimize Swap Regret for All Payoff-Bounded Tasks
- URL: http://arxiv.org/abs/2404.13503v2
- Date: Wed, 24 Apr 2024 20:29:20 GMT
- Title: Predict to Minimize Swap Regret for All Payoff-Bounded Tasks
- Authors: Lunjia Hu, Yifan Wu,
- Abstract summary: We study the Maximum Swap Regret (MSR) of predictions for binary events.
We give an efficient randomized prediction algorithm that guarantees $O(sqrtTlogT)$ expected MSR.
- Score: 15.793486463552144
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A sequence of predictions is calibrated if and only if it induces no swap regret to all down-stream decision tasks. We study the Maximum Swap Regret (MSR) of predictions for binary events: the swap regret maximized over all downstream tasks with bounded payoffs. Previously, the best online prediction algorithm for minimizing MSR is obtained by minimizing the K1 calibration error, which upper bounds MSR up to a constant factor. However, recent work (Qiao and Valiant, 2021) gives an ${\Omega}(T^{0.528})$ lower bound for the worst-case expected $K_1$ calibration error incurred by any randomized algorithm in T rounds, presenting a barrier to achieving better rates for MSR. Several relaxations of MSR have been considered to overcome this barrier, via external regret (Kleinberg et al., 2023) and regret bounds depending polynomially on the number of actions in downstream tasks (Noarov et al., 2023; Roth and Shi, 2024). We show that the barrier can be surpassed without any relaxations: we give an efficient randomized prediction algorithm that guarantees $O(\sqrt{T}logT)$ expected MSR. We also discuss the economic utility of calibration by viewing MSR as a decision-theoretic calibration error metric and study its relationship to existing metrics.
Related papers
- Improved bounds for calibration via stronger sign preservation games [26.563792462828726]
A set of probabilistic forecasts is calibrated if each prediction of the forecaster closely approximates the empirical distribution of outcomes on the subset of timesteps where that prediction was made.
Foster & Vohra (1998) derived an algorithm with $O(T2/3)$ calibration error after $T$ time steps, and showed a lower bound of $Omega(T1/2)$.
Qiao & Valiant (2021) improved the lower bound to $Omega(T0.528)$ by introducing a game called sign preservation.
We prove that the relationship between sign preservation and calibrated forecasting is
arXiv Detail & Related papers (2024-06-19T16:19:39Z) - Orthogonal Causal Calibration [55.28164682911196]
We prove generic upper bounds on the calibration error of any causal parameter estimate $theta$ with respect to any loss $ell$.
We use our bound to analyze the convergence of two sample splitting algorithms for causal calibration.
arXiv Detail & Related papers (2024-06-04T03:35:25Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with a environment and seeks to reach certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - $(\epsilon, u)$-Adaptive Regret Minimization in Heavy-Tailed Bandits [29.966828248335972]
We study the regret minimization problem when $epsilon$ and $u$ are unknown to the learner.
AdaR-UCB is the first algorithm that enjoys regret guarantees nearly matching those of the non-adaptive heavy-tailed case.
arXiv Detail & Related papers (2023-10-04T17:11:15Z) - Minimum-Risk Recalibration of Classifiers [9.31067660373791]
We introduce the concept of minimum-risk recalibration within the framework of mean-squared-error decomposition.
We show that transferring a calibrated classifier requires significantly fewer target samples compared to recalibrating from scratch.
arXiv Detail & Related papers (2023-05-18T11:27:02Z) - Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement
Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise.
Our algorithm is adaptive to the unknown variance of noise and achieves an $tildeO(d sqrtsum_k = 1K sigma_k2 + d)$ regret.
We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
arXiv Detail & Related papers (2023-02-21T00:17:24Z) - Autoregressive Bandits [58.46584210388307]
We propose a novel online learning setting, Autoregressive Bandits, in which the observed reward is governed by an autoregressive process of order $k$.
We show that, under mild assumptions on the reward process, the optimal policy can be conveniently computed.
We then devise a new optimistic regret minimization algorithm, namely, AutoRegressive Upper Confidence Bound (AR-UCB), that suffers sublinear regret of order $widetildemathcalO left( frac(k+1)3/2sqrtnT (1-G
arXiv Detail & Related papers (2022-12-12T21:37:36Z) - Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear
Contextual Bandits and Markov Decision Processes [59.61248760134937]
We propose an efficient algorithm to achieve a regret of $tildeO(sqrtT+zeta)$.
The proposed algorithm relies on the recently developed uncertainty-weighted least-squares regression from linear contextual bandit.
We generalize our algorithm to the episodic MDP setting and first achieve an additive dependence on the corruption level $zeta$.
arXiv Detail & Related papers (2022-12-12T15:04:56Z) - Can Q-Learning be Improved with Advice? [27.24260290748049]
This paper addresses the question of whether worst-case lower bounds for regret can be circumvented in online learning of Markov decision processes (MDPs)
We show that when predictions about the optimal $Q$-value function satisfy a reasonably weak condition we call distillation, then we can improve regret bounds by replacing the set of state-action pairs with the set of state-action pairs on which the predictions are grossly inaccurate.
Our work extends a recent line of work on algorithms with predictions, which has typically focused on simple online problems such as caching and scheduling, to the more complex and general problem of reinforcement learning.
arXiv Detail & Related papers (2021-10-25T15:44:20Z) - Online nonparametric regression with Sobolev kernels [99.12817345416846]
We derive the regret upper bounds on the classes of Sobolev spaces $W_pbeta(mathcalX)$, $pgeq 2, beta>fracdp$.
The upper bounds are supported by the minimax regret analysis, which reveals that in the cases $beta> fracd2$ or $p=infty$ these rates are (essentially) optimal.
arXiv Detail & Related papers (2021-02-06T15:05:14Z) - Optimal Algorithms for Stochastic Multi-Armed Bandits with Heavy Tailed
Rewards [24.983866845065926]
We consider multi-armed bandits with heavy-tailed rewards, whose $p$-th moment is bounded by a constant $nu_p$ for $1pleq2$.
We propose a novel robust estimator which does not require $nu_p$ as prior information.
We show that an error probability of the proposed estimator decays exponentially fast.
arXiv Detail & Related papers (2020-10-24T10:44:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.