Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary
Dueling Bandits
- URL: http://arxiv.org/abs/2111.03917v1
- Date: Sat, 6 Nov 2021 16:46:55 GMT
- Title: Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary
Dueling Bandits
- Authors: Shubham Gupta, Aadirupa Saha
- Abstract summary: We study the problem of \emph{dynamic regret minimization} in $K$-armed Dueling Bandits under non-stationary or time-varying preferences.
This is an online learning setup where the agent chooses a pair of items at each round and observes only a relative binary `win-loss' feedback for this pair.
- Score: 27.279654173896372
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of \emph{dynamic regret minimization} in $K$-armed
Dueling Bandits under non-stationary or time-varying preferences. This is an
online learning setup where the agent chooses a pair of items at each round and
observes only a relative binary `win-loss' feedback for this pair, sampled from
an underlying preference matrix at that round. We first study the problem of
static-regret minimization for adversarial preference sequences and design an
efficient algorithm with $O(\sqrt{KT})$ high probability regret. We next use
similar algorithmic ideas to propose an efficient and provably optimal
algorithm for dynamic-regret minimization under two notions of
non-stationarities. In particular, we establish $\tilde{O}(\sqrt{SKT})$ and
$\tilde{O}(V_T^{1/3}K^{1/3}T^{2/3})$ dynamic-regret guarantees, $S$ being the total
number of `effective-switches' in the underlying preference relations and $V_T$
being a measure of `continuous-variation' non-stationarity. The complexity of
these problems has not been studied prior to this work, despite the
practical relevance of non-stationary environments in real-world systems. We justify
the optimality of our algorithms by proving matching lower bound guarantees
under both the above-mentioned notions of non-stationarities. Finally, we
corroborate our results with extensive simulations and compare the efficacy of
our algorithms against state-of-the-art baselines.
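The protocol above is described only in words, so the following minimal sketch simulates it under one of the two notions of non-stationarity (abrupt switches). This is an illustrative, assumption-laden simulation, not the authors' algorithm: the preference matrices, the number of switches, and the uniform pair-sampling learner are all placeholders.

```python
# Minimal sketch of the non-stationary dueling-bandit feedback model (switching
# non-stationarity). The uniform pair-sampler below is a placeholder learner;
# the paper's algorithms would replace it.
import numpy as np

rng = np.random.default_rng(0)
K, T, S = 5, 10_000, 3        # arms, horizon, number of preference switches (illustrative)

def random_preference_matrix(k, rng):
    """P[i, j] = Pr(arm i beats arm j), with P[i, j] + P[j, i] = 1 and P[i, i] = 0.5."""
    P = np.full((k, k), 0.5)
    for i in range(k):
        for j in range(i + 1, k):
            p = rng.uniform(0.1, 0.9)
            P[i, j], P[j, i] = p, 1.0 - p
    return P

switch_times = set(int(s) for s in rng.choice(np.arange(1, T), size=S, replace=False))
P = random_preference_matrix(K, rng)

for t in range(T):
    if t in switch_times:                          # an `effective switch' in the preferences
        P = random_preference_matrix(K, rng)
    i, j = rng.choice(K, size=2, replace=False)    # placeholder learner: uniform random pair
    win = rng.random() < P[i, j]                   # relative binary `win-loss' feedback
    # a real learner would update its pairwise statistics for (i, j) using `win' here;
    # dynamic regret is measured against a per-round benchmark arm of the current P,
    # which may change after every switch.
```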
Related papers
- Perturb-and-Project: Differentially Private Similarities and Marginals [73.98880839337873]
We revisit the input perturbations framework for differential privacy where noise is added to the input $A \in \mathcal{S}$.
We first design novel efficient algorithms to privately release pair-wise cosine similarities.
We derive a novel algorithm to compute $k$-way marginal queries over $n$ features.
arXiv Detail & Related papers (2024-06-07T12:07:16Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel stochastic-greedy bandit (SGB) algorithm for combinatorial multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z) - Strictly Low Rank Constraint Optimization -- An Asymptotically
$\mathcal{O}(\frac{1}{t^2})$ Method [5.770309971945476]
We propose a class of non-convex and non-smooth problems with \textit{rank} regularization to promote sparsity in the optimal solution.
We show that our algorithms are able to achieve a singular convergence rate of $O(\frac{1}{t^2})$, which is exactly the same as Nesterov's optimal convergence for first-order methods on smooth convex problems.
arXiv Detail & Related papers (2023-07-04T16:55:41Z) - Revisiting Weighted Strategy for Non-stationary Parametric Bandits [82.1942459195896]
This paper revisits the weighted strategy for non-stationary parametric bandits.
We propose a refined analysis framework, which produces a simpler weight-based algorithm.
Our new framework can be used to improve regret bounds of other parametric bandits.
arXiv Detail & Related papers (2023-03-05T15:11:14Z) - Best-of-Three-Worlds Linear Bandit Algorithm with Variance-Adaptive
Regret Bounds [27.92755687977196]
The paper proposes a linear bandit algorithm that is adaptive to environments at two different levels of hierarchy.
At the higher level, the proposed algorithm adapts to a variety of types of environments.
The proposed algorithm has data-dependent regret bounds that depend on all of the cumulative loss for the optimal action.
arXiv Detail & Related papers (2023-02-24T00:02:24Z) - ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive
Non-Stationary Dueling Bandits [20.128001589147512]
We study the problem of non-stationary dueling bandits and provide the first adaptive dynamic regret algorithm for this problem.
We show a near-optimal $\tilde{O}(\sqrt{S^{\texttt{CW}} T})$ dynamic regret bound, where $S^{\texttt{CW}}$ is the number of times the Condorcet winner changes in $T$ rounds.
arXiv Detail & Related papers (2022-10-25T20:26:02Z) - Minimax Optimization with Smooth Algorithmic Adversaries [59.47122537182611]
We propose a new algorithm for the min-player against smooth algorithms deployed by an adversary.
Our algorithm is guaranteed to make monotonic progress, avoiding limit cycles, and to converge within an appropriate number of gradient-ascent steps.
arXiv Detail & Related papers (2021-06-02T22:03:36Z) - Non-stationary Reinforcement Learning without Prior Knowledge: An
Optimal Black-box Approach [42.021871809877595]
We present a black-box reduction that turns a certain reinforcement learning algorithm with optimal regret in a near-stationary environment into another algorithm with optimal dynamic regret in a non-stationary environment.
We show that our approach significantly improves the state of the art for linear bandits, episodic MDPs, and infinite-horizon MDPs.
arXiv Detail & Related papers (2021-02-10T12:43:31Z) - Ranking a set of objects: a graph based least-square approach [70.7866286425868]
We consider the problem of ranking $N$ objects starting from a set of noisy pairwise comparisons provided by a crowd of equal workers.
We propose a class of non-adaptive ranking algorithms that rely on a least-squares intrinsic optimization criterion for the estimation of qualities.
arXiv Detail & Related papers (2020-02-26T16:19:09Z)