Regret Bounds for Robust Online Decision Making
- URL: http://arxiv.org/abs/2504.06820v1
- Date: Wed, 09 Apr 2025 12:25:00 GMT
- Title: Regret Bounds for Robust Online Decision Making
- Authors: Alexander Appel, Vanessa Kosoy
- Abstract summary: We propose a framework which generalizes "decision making with structured observations" by allowing robust (i.e. multivalued) models. In this framework, each model associates each decision with a convex set of probability distributions over outcomes. We then derive a theory of regret bounds for this framework.
- Score: 49.1574468325115
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a framework which generalizes "decision making with structured observations" by allowing robust (i.e. multivalued) models. In this framework, each model associates each decision with a convex set of probability distributions over outcomes. Nature can choose distributions out of this set in an arbitrary (adversarial) manner that can be nonoblivious and depend on past history. The resulting framework offers much greater generality than classical bandits and reinforcement learning, since the realizability assumption becomes much weaker and more realistic. We then derive a theory of regret bounds for this framework. Although our lower and upper bounds are not tight, they are sufficient to fully characterize power-law learnability. We demonstrate this theory in two special cases: robust linear bandits and tabular robust online reinforcement learning. In both cases, we derive regret bounds that improve on the state of the art (although we do not address computational efficiency).
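To make the setup concrete, here is a minimal toy sketch in Python, not the paper's algorithm: each candidate model assigns every arm an interval of Bernoulli means (a simple convex set), the learner eliminates models whose intervals cannot explain the observed averages, and otherwise plays the maximin arm among survivors. The class `IntervalModel`, the elimination test, and the round-robin schedule are all illustrative assumptions.

```python
import math
import random

# Toy sketch of the robust-model setup, NOT the paper's algorithm: each
# candidate model assigns every arm an interval of Bernoulli means, and the
# mean chosen by nature must stay inside the true model's interval. For
# simplicity, nature here is stochastic with one fixed mean per arm; the
# framework also allows adaptive, history-dependent choices from the set.

class IntervalModel:
    """A robust model: arm i's mean is only promised to lie in intervals[i]."""
    def __init__(self, intervals):
        self.intervals = intervals  # list of (lo, hi) pairs, one per arm

def consistent(model, counts, sums, t):
    """Crude test: can the observed averages be explained by means inside
    the model's intervals, up to a Hoeffding-style confidence slack?"""
    for i, (lo, hi) in enumerate(model.intervals):
        if counts[i] == 0:
            continue
        avg = sums[i] / counts[i]
        slack = math.sqrt(2.0 * math.log(max(t, 2)) / counts[i])
        if avg < lo - slack or avg > hi + slack:
            return False
    return True

def eliminate_and_play(models, true_means, horizon, seed=0):
    """Round-robin exploration, elimination of inconsistent robust models,
    then maximin play: pick the arm whose worst-case mean over surviving
    models is largest. Purely illustrative."""
    rng = random.Random(seed)
    k = len(true_means)
    counts, sums = [0] * k, [0.0] * k
    alive = list(models)
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= 20 * k:  # explore each arm a few times first
            arm = t % k
        else:            # maximin decision over surviving models
            worst = [min(m.intervals[i][0] for m in alive) for i in range(k)]
            arm = max(range(k), key=lambda i: worst[i])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
        alive = [m for m in alive if consistent(m, counts, sums, t)] or alive
    return total

# Two candidate robust models; the first one contains the true means below.
models = [IntervalModel([(0.1, 0.3), (0.6, 0.8)]),
          IntervalModel([(0.7, 0.9), (0.2, 0.4)])]
print(eliminate_and_play(models, true_means=[0.2, 0.7], horizon=2000))
```

The maximin rule is the natural decision criterion here: since nature may pick any distribution from the surviving models' sets, the learner evaluates each arm by its worst-case mean rather than a point estimate.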
Related papers
- Gentle Local Robustness implies Generalization [1.2630732866686982]
We present a class of novel bounds, which are model-dependent and provably tighter than existing robustness-based ones. Unlike prior bounds, ours are guaranteed to converge to the true error of the best classifier as the number of samples increases. We further provide extensive experiments and find that two of our bounds are often non-vacuous for a large class of deep neural networks pretrained on ImageNet.
arXiv Detail & Related papers (2024-12-09T10:59:39Z)
- Causal Bandits: The Pareto Optimal Frontier of Adaptivity, a Reduction to Linear Bandits, and Limitations around Unknown Marginals [28.94461817548213]
We prove upper and matching lower bounds on the possible trade-offs in the performance of learning in conditionally benign and arbitrary environments.
We are the first to obtain instance-dependent bounds for causal bandits, by reducing the problem to the linear bandit setting.
arXiv Detail & Related papers (2024-07-01T04:12:15Z)
- Linear Bandits with Memory: from Rotting to Rising [5.5969337839476765]
Nonstationary phenomena, such as satiation effects in recommendations, have mostly been modeled using bandits with finitely many arms.
We introduce a novel nonstationary linear bandit model, where current rewards are influenced by the learner's past actions in a fixed-size window.
arXiv Detail & Related papers (2023-02-16T15:02:07Z)
- Tight Guarantees for Interactive Decision Making with the Decision-Estimation Coefficient [51.37720227675476]
We introduce a new variant of the Decision-Estimation Coefficient, and use it to derive new lower bounds that improve upon prior work on three fronts.
We provide upper bounds on regret that scale with the same quantity, thereby closing all but one of the gaps between upper and lower bounds in Foster et al.
Our results apply to both the regret framework and PAC framework, and make use of several new analysis and algorithm design techniques that we anticipate will find broader use.
arXiv Detail & Related papers (2023-01-19T18:24:08Z)
- Robustness Implies Generalization via Data-Dependent Generalization Bounds [24.413499775513145]
This paper proves that robustness implies generalization via data-dependent generalization bounds.
We present several examples, including ones for lasso and deep learning, in which our bounds are provably preferable.
arXiv Detail & Related papers (2022-06-27T17:58:06Z)
- On Kernelized Multi-Armed Bandits with Constraints [16.102401271318012]
We study a bandit problem with a general unknown reward function and a general unknown constraint function.
We propose a general framework for both algorithm design and performance analysis.
We demonstrate the superior performance of our proposed algorithms via numerical experiments.
arXiv Detail & Related papers (2022-03-29T14:02:03Z)
- Optimal Online Generalized Linear Regression with Stochastic Noise and Its Application to Heteroscedastic Bandits [88.6139446295537]
We study the problem of online generalized linear regression in the setting of a generalized linear model with possibly unbounded additive noise.
We provide a sharp analysis of the classical follow-the-regularized-leader (FTRL) algorithm to cope with the label noise.
We propose an algorithm based on FTRL to achieve the first variance-aware regret bound.
arXiv Detail & Related papers (2022-02-28T08:25:26Z)
- Adversarial Robustness with Semi-Infinite Constrained Learning [177.42714838799924]
The vulnerability of deep learning to input perturbations has raised serious questions about its use in safety-critical domains.
We propose a hybrid Langevin Monte Carlo training approach to mitigate this issue.
We show that our approach can mitigate the trade-off between state-of-the-art performance and robustness.
arXiv Detail & Related papers (2021-10-29T13:30:42Z)
- Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature [61.22680308681648]
We show that global convergence is statistically intractable even for a one-layer neural net bandit with a deterministic reward.
For both nonlinear bandits and RL, the paper presents a model-based algorithm, Virtual Ascent with Online Model Learner (ViOL).
arXiv Detail & Related papers (2021-02-08T12:41:56Z)
- Online and Distribution-Free Robustness: Regression and Contextual Bandits with Huber Contamination [29.85468294601847]
We revisit two classic high-dimensional online learning problems, namely linear regression and contextual bandits.
We show that our algorithms succeed where conventional methods fail.
arXiv Detail & Related papers (2020-10-08T17:59:05Z)
- Latent Bandits Revisited [55.88616813182679]
A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state.
We propose general algorithms for this setting, based on both upper confidence bounds (UCBs) and Thompson sampling; a toy sketch of the Thompson-sampling approach appears after this list.
We provide a unified theoretical analysis of our algorithms, which achieve lower regret than classic bandit policies when the number of latent states is smaller than the number of actions.
arXiv Detail & Related papers (2020-06-15T19:24:02Z)
- A Novel Confidence-Based Algorithm for Structured Bandits [129.30402124516507]
We study finite-armed bandits where the rewards of each arm might be correlated with those of other arms.
We introduce a novel phased algorithm that exploits the given structure to build confidence sets over the parameters of the true bandit problem.
arXiv Detail & Related papers (2020-05-23T19:52:44Z)
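As a concrete companion to the Latent Bandits Revisited entry above (referenced there), here is a hypothetical Thompson-sampling sketch in Python: the agent knows the arm means under each latent state but not which state is active, so it keeps a posterior over states, samples one, and plays that state's best arm. The means, prior, and Bernoulli likelihood update are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

# Hypothetical latent-bandit sketch: arm reward means are known per latent
# state; the active state is unknown. Thompson sampling over latent states.
rng = np.random.default_rng(0)
means = np.array([[0.9, 0.2, 0.1],   # arm means under latent state 0
                  [0.1, 0.3, 0.8]])  # arm means under latent state 1
true_state = 1
posterior = np.array([0.5, 0.5])     # uniform prior over latent states

total = 0.0
for t in range(2000):
    s = rng.choice(2, p=posterior)             # sample a latent state
    arm = int(np.argmax(means[s]))             # play that state's best arm
    r = float(rng.random() < means[true_state, arm])  # Bernoulli reward
    total += r
    # Bayes update: likelihood of the observed reward under each state.
    like = means[:, arm] if r == 1.0 else 1.0 - means[:, arm]
    posterior = posterior * like
    posterior = posterior / posterior.sum()
print(total / 2000)  # approaches the best mean of the true state (0.8)
```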
This list is automatically generated from the titles and abstracts of the papers on this site.