Adaptive Discretization against an Adversary: Lipschitz bandits, Dynamic Pricing, and Auction Tuning
- URL: http://arxiv.org/abs/2006.12367v4
- Date: Thu, 12 Jun 2025 17:48:17 GMT
- Title: Adaptive Discretization against an Adversary: Lipschitz bandits, Dynamic Pricing, and Auction Tuning
- Authors: Chara Podimata, Aleksandrs Slivkins
- Abstract summary: Lipschitz bandits is a prominent version of multi-armed bandits that studies large, structured action spaces. A central theme here is the adaptive discretization of the action space, which gradually ``zooms in'' on the more promising regions. We provide the first algorithm for adaptive discretization in the adversarial version, and derive instance-dependent regret bounds.
- Score: 56.23358327635815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Lipschitz bandits is a prominent version of multi-armed bandits that studies large, structured action spaces such as the $[0,1]$ interval, where similar actions are guaranteed to have similar rewards. A central theme here is the adaptive discretization of the action space, which gradually ``zooms in'' on the more promising regions thereof. The goal is to take advantage of ``nicer'' problem instances, while retaining near-optimal worst-case performance. While the stochastic version of the problem is well-understood, the general version with adversarial rewards is not. We provide the first algorithm (\emph{Adversarial Zooming}) for adaptive discretization in the adversarial version, and derive instance-dependent regret bounds. In particular, we recover the worst-case optimal regret bound for the adversarial version, and the instance-dependent regret bound for the stochastic version. We apply our algorithm to several fundamental applications -- including dynamic pricing and auction reserve tuning -- all under adversarial reward models. While these domains often violate Lipschitzness, our analysis only requires a weaker version thereof, allowing for meaningful regret bounds without additional smoothness assumptions. Notably, we extend our results to multi-product dynamic pricing with non-smooth reward structures, a setting which does not even satisfy one-sided Lipschitzness.
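Since adaptive discretization is the central idea here, a minimal sketch may help fix it. The code below implements the classical *stochastic* zooming rule on the $[0,1]$ interval (optimistic index = empirical mean + confidence radius + interval width; an interval is split once its confidence radius falls below its width). It is illustrative only: the paper's \emph{Adversarial Zooming} algorithm handles adversarial rewards and is substantially more involved, and all names below are hypothetical.

```python
import math
import random


class Zooming1D:
    """Minimal adaptive-discretization ("zooming") sketch on [0, 1].

    Follows the stochastic zooming rule for intuition only; it is NOT
    the paper's Adversarial Zooming algorithm.
    """

    def __init__(self):
        # Active intervals: (left endpoint, width) -> [pull count, reward sum]
        self.active = {(0.0, 1.0): [0, 0.0]}
        self.t = 0

    def _index(self, stats, width):
        n, s = stats
        if n == 0:
            return float("inf")  # force at least one pull per interval
        conf = math.sqrt(2.0 * math.log(self.t + 1) / n)
        # Optimistic index: empirical mean + confidence radius + Lipschitz slack.
        return s / n + conf + width

    def select(self):
        self.t += 1
        return max(self.active, key=lambda k: self._index(self.active[k], k[1]))

    def update(self, interval, reward):
        stats = self.active[interval]
        stats[0] += 1
        stats[1] += reward
        left, width = interval
        conf = math.sqrt(2.0 * math.log(self.t + 1) / stats[0])
        # Zoom in: once the confidence radius drops below the interval's
        # width, split it so the discretization refines near promising regions.
        if conf <= width and width > 1e-3:
            del self.active[interval]
            half = width / 2.0
            self.active[(left, half)] = [0, 0.0]
            self.active[(left + half, half)] = [0, 0.0]


# Toy run against a fixed Lipschitz reward function (stochastic setting).
f = lambda x: 1.0 - abs(x - 0.7)
alg = Zooming1D()
for _ in range(5000):
    left, width = alg.select()
    x = left + width / 2.0  # play the interval's midpoint
    r = max(0.0, min(1.0, f(x) + random.gauss(0.0, 0.1)))
    alg.update((left, width), r)
```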
Related papers
- Catoni-Style Change Point Detection for Regret Minimization in Non-Stationary Heavy-Tailed Bandits [31.212504858546232]
We tackle the heavy-tailed piecewise-stationary bandit problem.
We provide a novel Catoni-style change-point detection strategy tailored for heavy-tailed distributions.
We introduce Robust-CPD-UCB, which combines this change-point detection strategy with optimistic algorithms for bandits.
arXiv Detail & Related papers (2025-05-26T14:40:47Z) - Catoni Contextual Bandits are Robust to Heavy-tailed Rewards [31.381627608971414]
We develop an algorithmic approach building on Catoni's estimator from robust statistics.
We establish a regret bound that depends only on the cumulative reward variance and logarithmically on the reward range $R$.
Our algorithm also enjoys a variance-based bound with logarithmic reward-range dependence.
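For context, here is a minimal sketch of Catoni's robust mean estimator, the building block named in the two Catoni entries above: the estimate is the root of $\sum_i \psi(\alpha(X_i - \theta)) = 0$ for a logarithmically growing influence function $\psi$. The $\psi$ and the bisection solver below are the textbook versions, with $\alpha$ a confidence/variance-dependent tuning constant; the papers above embed variants of this estimator inside bandit algorithms, which this sketch does not attempt.

```python
import math


def catoni_psi(x):
    # Catoni's influence function: grows only logarithmically, so a few
    # huge samples cannot drag the estimate far (unlike the empirical mean).
    if x >= 0:
        return math.log1p(x + x * x / 2.0)
    return -math.log1p(-x + x * x / 2.0)


def catoni_mean(samples, alpha):
    """Root of sum_i psi(alpha * (x_i - theta)) = 0, found by bisection.

    g(theta) is decreasing in theta, nonnegative at min(samples) and
    nonpositive at max(samples), so bisection converges to the root.
    """
    lo, hi = min(samples), max(samples)

    def g(theta):
        return sum(catoni_psi(alpha * (x - theta)) for x in samples)

    for _ in range(60):
        mid = (lo + hi) / 2.0
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```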
arXiv Detail & Related papers (2025-02-04T17:03:32Z) - An Adaptive Approach for Infinitely Many-armed Bandits under Generalized Rotting Constraints [29.596684377841182]
In this study, we consider the infinitely many-armed bandit problem in a rested rotting setting, where the mean reward of an arm may decrease with each pull, while otherwise it remains unchanged.
We introduce an algorithm that utilizes UCB with an adaptive sliding window, designed to manage the bias and variance trade-off arising due to rotting rewards.
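To make the bias/variance trade-off concrete, here is a minimal sliding-window UCB sketch with a *fixed* window; the paper's algorithm instead adapts the window size to the rotting structure, which this sketch does not attempt. Class and parameter names are hypothetical.

```python
import math
from collections import deque


class SlidingWindowUCB:
    """UCB computed over only the last `window` observations, so stale
    (possibly rotted) rewards eventually stop biasing the estimates."""

    def __init__(self, n_arms, window):
        self.window = window
        self.recent = deque()        # (arm, reward) pairs, most recent last
        self.counts = [0] * n_arms   # pulls of each arm inside the window
        self.sums = [0.0] * n_arms   # reward sums inside the window
        self.t = 0

    def select(self):
        self.t += 1
        best, best_idx = 0, -float("inf")
        for a in range(len(self.counts)):
            if self.counts[a] == 0:
                return a             # pull every arm at least once
            bonus = math.sqrt(2.0 * math.log(min(self.t, self.window)) / self.counts[a])
            idx = self.sums[a] / self.counts[a] + bonus
            if idx > best_idx:
                best, best_idx = a, idx
        return best

    def update(self, arm, reward):
        self.recent.append((arm, reward))
        self.counts[arm] += 1
        self.sums[arm] += reward
        if len(self.recent) > self.window:  # evict the stalest observation
            old_arm, old_r = self.recent.popleft()
            self.counts[old_arm] -= 1
            self.sums[old_arm] -= old_r
```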
arXiv Detail & Related papers (2024-04-22T14:11:54Z) - $(\epsilon, u)$-Adaptive Regret Minimization in Heavy-Tailed Bandits [29.966828248335972]
We study the regret minimization problem when $\epsilon$ and $u$ are unknown to the learner.
AdaR-UCB is the first algorithm that enjoys regret guarantees nearly matching those of the non-adaptive heavy-tailed case.
arXiv Detail & Related papers (2023-10-04T17:11:15Z) - Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM).
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $\tilde O\big(d\sqrt{\sum_{t=1}^T \sigma_t^2} + d\big)$.
Our regret bound naturally aligns with the intuitive expectation: in scenarios where the comparison is deterministic, the algorithm only suffers an $\tilde O(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z) - Invariant Lipschitz Bandits: A Side Observation Approach [18.688474183114085]
We study the invariant Lipschitz bandit setting, where the reward function and the set of arms are preserved under a group of transformations.
We introduce an algorithm named \texttt{UniformMesh-N}, which naturally integrates side observations using group orbits.
We prove an improved regret upper bound, which depends on the cardinality of the group, given that the group is finite.
arXiv Detail & Related papers (2022-12-14T22:12:32Z) - ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits [20.128001589147512]
We study the problem of non-stationary dueling bandits and provide the first adaptive dynamic regret algorithm for this problem.
We show a near-optimal $\tilde O(\sqrt{S^{\texttt{CW}} T})$ dynamic regret bound, where $S^{\texttt{CW}}$ is the number of times the Condorcet winner changes in $T$ rounds.
arXiv Detail & Related papers (2022-10-25T20:26:02Z) - Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces [14.366265951396587]
We design efficient general-purpose contextual bandit algorithms for large -- or even continuous -- action spaces.
We propose a smooth regret notion for contextual bandits, which dominates previously proposed alternatives.
Our algorithms can be used to recover the previous minimax/Pareto optimal guarantees under the standard regret.
arXiv Detail & Related papers (2022-07-12T21:27:09Z) - Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback [98.7678704343537]
We focus on a class of nonsubmodular functions with special structure, and prove regret guarantees for several variants of the online and approximate online bandit gradient descent algorithms.
We derive bounds for the agent's regret in the full information and bandit feedback setting, even if the delay between choosing a decision and receiving the incurred cost is unbounded.
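The paper's objectives are nonsubmodular and its feedback approximate; the sketch below only illustrates the delayed-feedback mechanics for plain one-dimensional online gradient descent, where each gradient is applied once its (possibly late) cost observation arrives. Names and the toy costs are hypothetical.

```python
import random


def delayed_ogd(grad_fns, delays, x0, eta, project):
    """Online gradient descent where round t's gradient only becomes
    available delays[t] rounds later. grad_fns[t](x) is the (sub)gradient
    of the round-t cost at the point x that was played in round t."""
    T = len(grad_fns)
    x = x0
    decisions = [0.0] * T
    pending = {}  # arrival round -> rounds whose feedback lands then
    for t in range(T):
        decisions[t] = x                          # play the current iterate
        pending.setdefault(t + delays[t], []).append(t)
        for s in pending.pop(t, []):              # apply gradients arriving now
            x = project(x - eta * grad_fns[s](decisions[s]))
    return decisions


# Toy run: costs f_t(x) = (x - c_t)^2 with random delays; feedback whose
# delay pushes it past round T is simply never applied (unbounded delays).
random.seed(0)
grads = [(lambda c: (lambda x: 2.0 * (x - c)))(random.random()) for _ in range(100)]
delays = [random.randint(1, 5) for _ in range(100)]
plays = delayed_ogd(grads, delays, x0=0.5, eta=0.05,
                    project=lambda x: min(1.0, max(0.0, x)))
```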
arXiv Detail & Related papers (2022-05-15T08:27:12Z) - Minimax Optimal Quantile and Semi-Adversarial Regret via Root-Logarithmic Regularizers [31.102181563065844]
Quantile (and, more generally, KL) regret bounds relax the goal of competing against the best individual expert to only competing against a majority of experts on adversarial data.
More recently, the semi-adversarial paradigm (Bilodeau, Negrea, and Roy 2020) provides an alternative relaxation of adversarial online learning by considering data that may be neither fully adversarial nor stochastic.
We achieve the minimax optimal regret in both paradigms using FTRL with separate, novel, root-logarithmic regularizers, both of which can be interpreted as yielding variants of NormalHedge.
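For orientation, here is the classical FTRL-over-experts baseline with the entropic regularizer (i.e., exponential weights / Hedge); the paper's contribution is to swap in novel root-logarithmic regularizers, which this sketch does not attempt.

```python
import math


def ftrl_entropic(loss_rows, eta):
    """Exponential weights over K experts: FTRL with the entropic
    regularizer. loss_rows is a T x K table of per-round expert losses."""
    K = len(loss_rows[0])
    cum = [0.0] * K                  # cumulative loss per expert
    learner_loss = 0.0
    for losses in loss_rows:
        m = min(cum)                 # shift for numerical stability
        w = [math.exp(-eta * (c - m)) for c in cum]
        Z = sum(w)
        p = [wi / Z for wi in w]     # play distribution over experts
        learner_loss += sum(pi * li for pi, li in zip(p, losses))
        cum = [c + l for c, l in zip(cum, losses)]
    return learner_loss
```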
arXiv Detail & Related papers (2021-10-27T22:38:52Z) - Anti-Concentrated Confidence Bonuses for Scalable Exploration [57.91943847134011]
Intrinsic rewards play a central role in handling the exploration-exploitation trade-off.
We introduce \emph{anti-concentrated confidence bounds} for efficiently approximating the elliptical bonus.
We develop a practical variant for deep reinforcement learning that is competitive with contemporary intrinsic rewards on Atari benchmarks.
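For reference, the elliptical bonus being approximated is the standard quantity $\beta \sqrt{x^\top A^{-1} x}$, with $A$ the regularized Gram matrix of the observed features. A minimal exact-computation sketch (not the paper's approximation, which avoids the matrix inverse) follows; names are hypothetical.

```python
import numpy as np


def elliptical_bonus(X, x_new, lam=1.0, beta=1.0):
    """Exact elliptical exploration bonus beta * sqrt(x^T A^{-1} x),
    where A = lam * I + sum_i x_i x_i^T for observed feature rows X.
    Costs O(d^2) memory and O(d^3) time, which is what motivates the
    scalable approximation in the paper above."""
    d = X.shape[1]
    A = lam * np.eye(d) + X.T @ X
    return beta * np.sqrt(x_new @ np.linalg.solve(A, x_new))
```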
arXiv Detail & Related papers (2021-10-21T15:25:15Z) - Bias-Robust Bayesian Optimization via Dueling Bandit [57.82422045437126]
We consider Bayesian optimization in settings where observations can be adversarially biased.
We propose a novel approach for dueling bandits based on information-directed sampling (IDS).
Thereby, we obtain the first efficient kernelized algorithm for dueling bandits that comes with cumulative regret guarantees.
arXiv Detail & Related papers (2021-05-25T10:08:41Z) - Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimating the parameter using the best subset selection method.
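A brute-force sketch of the best-subset-selection subroutine mentioned above, assuming a small ambient dimension so exhaustive search is feasible; the doubly growing epoch schedule (re-estimating the parameter at epochs of doubling length) and the bandit loop are omitted, and the function name is hypothetical.

```python
import itertools

import numpy as np


def best_subset_ls(X, y, s):
    """Best subset selection by exhaustive search: the feature subset of
    size at most s minimizing the least-squares residual. Exponential in
    the dimension d, so this is a conceptual sketch only."""
    n, d = X.shape
    best_err, best_coef = np.inf, np.zeros(d)
    for k in range(1, s + 1):
        for subset in itertools.combinations(range(d), k):
            cols = list(subset)
            coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            err = np.sum((y - X[:, cols] @ coef) ** 2)
            if err < best_err:
                best_err = err
                best_coef = np.zeros(d)
                best_coef[cols] = coef
    return best_coef
```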
arXiv Detail & Related papers (2020-09-04T04:10:39Z) - Lenient Regret for Multi-Armed Bandits [72.56064196252498]
We consider the Multi-Armed Bandit (MAB) problem, where an agent sequentially chooses actions and observes rewards for the actions it took.
While the majority of algorithms try to minimize the regret, i.e., the cumulative difference between the reward of the best action and the agent's action, this criterion might lead to undesirable results.
We suggest a new, more lenient, regret criterion that ignores suboptimality gaps smaller than some $\epsilon$.
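Under the simplest (assumed hard-threshold) reading of that criterion, the lenient regret is just the sum of per-round gaps that exceed $\epsilon$; the paper studies a more general family of gap functions.

```python
def lenient_regret(gaps, epsilon):
    """Cumulative regret that ignores small gaps: a round whose
    suboptimality gap is at most epsilon contributes nothing."""
    return sum(g for g in gaps if g > epsilon)
```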
arXiv Detail & Related papers (2020-08-10T08:30:52Z) - Stochastic Linear Bandits Robust to Adversarial Attacks [117.665995707568]
We provide two variants of a Robust Phased Elimination algorithm, one that knows the corruption level $C$ and one that does not.
We show that both variants attain near-optimal regret in the non-corrupted case $C = 0$, while each incurs an additional additive, corruption-dependent term.
In a contextual setting, we show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
arXiv Detail & Related papers (2020-07-07T09:00:57Z)