Related papers: Multi-Armed Bandits and Quantum Channel Oracles

Multi-Armed Bandits and Quantum Channel Oracles

URL: http://arxiv.org/abs/2301.08544v4
Date: Wed, 19 Mar 2025 16:02:17 GMT
Title: Multi-Armed Bandits and Quantum Channel Oracles
Authors: Simon Buchholz, Jonas M. Kübler, Bernhard Schölkopf,
Abstract summary: We introduce further bandit models where we only have limited access to the randomness of the rewards, but we can still query the arms in superposition.<n>This generalizes the prior result that no speed-up is possible for unstructured search when the oracle has positive failure probability.
Score: 56.72223170618133
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multi-armed bandits are one of the theoretical pillars of reinforcement learning. Recently, the investigation of quantum algorithms for multi-armed bandit problems was started, and it was found that a quadratic speed-up (in query complexity) is possible when the arms and the randomness of the rewards of the arms can be queried in superposition. Here we introduce further bandit models where we only have limited access to the randomness of the rewards, but we can still query the arms in superposition. We show that then the query complexity is the same as for classical algorithms. This generalizes the prior result that no speed-up is possible for unstructured search when the oracle has positive failure probability.

Related papers

A Quantum "Lifting Theorem" for Constructions of Pseudorandom Generators from Random Oracles [7.454028086083526]
We study the (quantum) security of pseudorandom generators (PRGs) constructed from random oracles. We prove a "lifting theorem" showing, roughly, that if such a PRG is unconditionally secure against classical adversaries making unboundedly many queries to the random oracle, then it is also (unconditionally) secure against quantum adversaries in the same sense.
arXiv Detail & Related papers (2024-01-25T17:13:51Z)
Bandit Social Learning: Exploration under Myopic Behavior [58.75758600464338]
We study social learning dynamics motivated by reviews on online platforms. Agents collectively follow a simple multi-armed bandit protocol, but each agent acts myopically, without regards to exploration. We derive stark learning failures for any such behavior, and provide matching positive results.
arXiv Detail & Related papers (2023-02-15T01:57:57Z)
Combinatorial Blocking Bandits with Stochastic Delays [33.65025386998747]
Recent work has considered natural variations of the multi-armed bandit problem, where the reward of each arm is a special function of the time passed since its last pulling. In this work, we extend the above model in two directions: (i) We consider the general setting where more than one arms can be played at each round, subject to feasibility constraints. We provide a tight analysis of the approximation of a natural greedy subset that always plays the maximum expected reward feasible among the available (non-blocked) arms. When the arms' expected rewards are unknown, we adapt the above algorithm into a bandit, based on
arXiv Detail & Related papers (2021-05-22T02:46:04Z)
Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback [32.62857394584907]
We study the multi-armed bandit (MAB) problem with composite and anonymous feedback. We propose adaptive algorithms for both the adversarial and non- adversarial cases.
arXiv Detail & Related papers (2020-12-13T12:25:41Z)
Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective [104.67295710363679]
In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm. We introduce a family of complexity measures that are both sufficient and necessary to obtain instance-dependent regret bounds. We then introduce new oracle-efficient algorithms which adapt to the gap whenever possible, while also attaining the minimax rate in the worst case.
arXiv Detail & Related papers (2020-10-07T01:33:06Z)
Quantum exploration algorithms for multi-armed bandits [15.666205208594567]
We show that we can find the best arm with fixed confidence using $tildeObigl(sqrtsum_i=2nDeltasmash-2_ibigr)$ quantum queries. This algorithm, based on variable-time amplitude amplification and estimation, gives a quadratic speedup compared to the best possible classical result.
arXiv Detail & Related papers (2020-07-14T14:15:20Z)
A Novel Confidence-Based Algorithm for Structured Bandits [129.30402124516507]
We study finite-armed bandits where the rewards of each arm might be correlated to those of other arms. We introduce a novel phased algorithm that exploits the given structure to build confidence sets over the parameters of the true bandit problem.
arXiv Detail & Related papers (2020-05-23T19:52:44Z)
Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean. We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z)
The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity [83.81297078039836]
We consider incentivized exploration: a version of multi-armed bandits where the choice of arms is controlled by self-interested agents. We focus on the price of incentives: the loss in performance, broadly construed, incurred for the sake of incentive-compatibility.
arXiv Detail & Related papers (2020-02-03T04:58:51Z)
Best Arm Identification for Cascading Bandits in the Fixed Confidence Setting [81.70513857417106]
We design and analyze CascadeBAI, an algorithm for finding the best set of $K$ items. An upper bound on the time complexity of CascadeBAI is derived by overcoming a crucial analytical challenge. We show, through the derivation of a lower bound on the time complexity, that the performance of CascadeBAI is optimal in some practical regimes.
arXiv Detail & Related papers (2020-01-23T16:47:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.