Quantum Heavy-tailed Bandits
- URL: http://arxiv.org/abs/2301.09680v1
- Date: Mon, 23 Jan 2023 19:23:10 GMT
- Title: Quantum Heavy-tailed Bandits
- Authors: Yulian Wu, Chaowen Guan, Vaneet Aggarwal and Di Wang
- Abstract summary: We study multi-armed bandits (MAB) and stochastic linear bandits (SLB) with heavy-tailed rewards and a quantum reward oracle.
We first propose a new quantum mean estimator for heavy-tailed distributions, which is based on the Quantum Monte Carlo Mean Estimator.
Based on our quantum mean estimator, we focus on quantum heavy-tailed MAB and SLB and propose quantum algorithms based on the Upper Confidence Bound (UCB) framework.
- Score: 36.458771174473924
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we study multi-armed bandits (MAB) and stochastic linear
bandits (SLB) with heavy-tailed rewards and quantum reward oracle. Unlike the
previous work on quantum bandits that assumes bounded/sub-Gaussian
distributions for rewards, here we investigate the quantum bandits problem
under a weaker assumption that the distributions of rewards only have bounded
$(1+v)$-th moment for some $v\in (0,1]$. In order to achieve regret
improvements for heavy-tailed bandits, we first propose a new quantum mean
estimator for heavy-tailed distributions, which is based on the Quantum Monte
Carlo Mean Estimator and achieves a quadratic improvement of estimation error
compared to the classical one. Based on our quantum mean estimator, we focus on
quantum heavy-tailed MAB and SLB and propose quantum algorithms based on the
Upper Confidence Bound (UCB) framework for both problems with
$\tilde{O}(T^{\frac{1-v}{1+v}})$ regrets, polynomially improving the dependence
on $T$ compared to the classical (near-)optimal regrets of
$\tilde{O}(T^{\frac{1}{1+v}})$, where $T$ is the number of rounds. Finally,
experiments also support our theoretical results and show the effectiveness of
our proposed methods.
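To make the UCB template concrete, here is a minimal classical sketch in the spirit of the truncated-mean Robust UCB of Bubeck et al. (2013), the standard classical baseline for heavy-tailed bandits; the paper's quantum algorithms are not reproduced here. The `quantum=True` branch of `confidence_radius`, which doubles the error-decay exponent from $v/(1+v)$ to $2v/(1+v)$, is only our illustrative reading of the abstract's "quadratic improvement", not the paper's exact confidence bonus; the constants, the truncation rule, and the example arms are simplifications for illustration.

```python
import math
import random

def truncated_mean(samples, u, v, delta):
    """Truncated empirical mean for rewards whose (1+v)-th raw moment is at most u.

    Samples above a growing truncation level are discarded, which tames
    heavy tails at the price of a small, controllable bias.
    """
    n = len(samples)
    total = 0.0
    for i, x in enumerate(samples, start=1):
        level = (u * i / math.log(1.0 / delta)) ** (1.0 / (1.0 + v))
        if abs(x) <= level:
            total += x
    return total / n

def confidence_radius(n, u, v, delta, quantum=False):
    """UCB exploration bonus after n pulls.

    Classically the estimation error decays like n^{-v/(1+v)}; a quantum mean
    estimator with quadratically better error would roughly double that
    exponent (illustrative assumption, not the paper's exact bound).
    """
    rate = 2.0 * v / (1.0 + v) if quantum else v / (1.0 + v)
    return 4.0 * u ** (1.0 / (1.0 + v)) * (math.log(1.0 / delta) / n) ** rate

def heavy_tailed_ucb(arms, T, u, v, quantum=False):
    """Run UCB for T rounds; `arms` is a list of callables returning one reward."""
    history = [[arm()] for arm in arms]  # pull every arm once to initialize
    for t in range(len(arms), T):
        delta = t ** -2  # shrinking per-round failure probability
        scores = [
            truncated_mean(h, u, v, delta)
            + confidence_radius(len(h), u, v, delta, quantum)
            for h in history
        ]
        best = max(range(len(arms)), key=lambda i: scores[i])
        history[best].append(arms[best]())
    return history

# Hypothetical example: Pareto(1.6) rewards have a finite 1.5-th moment
# (so v = 0.5) but infinite variance.
arms = [lambda: random.paretovariate(1.6) - 2.0,
        lambda: random.paretovariate(1.6) - 2.5]
history = heavy_tailed_ucb(arms, T=2000, u=10.0, v=0.5)
print([len(h) for h in history])  # pull counts; the first arm should dominate
```

As a sanity check on the exponents: at $v = 1$ (bounded variance) the quantum regret exponent $\frac{1-v}{1+v}$ vanishes, consistent with the polylogarithmic regrets of the bounded-case quantum bandits paper listed below, while the classical exponent $\frac{1}{1+v}$ recovers the familiar $\sqrt{T}$ rate.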
Related papers
- Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as the non-stationarity budget $B_K$, and show that they can achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted Weighted$\text{OFUL}^+$ and Restarted $\text{SAVE}^+$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z)
- Towards large-scale quantum optimization solvers with few qubits [59.63282173947468]
We introduce a variational quantum solver for optimizations over $m = \mathcal{O}(n^k)$ binary variables using only $n$ qubits, with tunable $k > 1$.
We analytically prove that the specific qubit-efficient encoding brings in a super-polynomial mitigation of barren plateaus as a built-in feature.
arXiv Detail & Related papers (2024-01-17T18:59:38Z)
- A Quadratic Speedup in Finding Nash Equilibria of Quantum Zero-Sum Games [102.46640028830441]
We introduce the Optimistic Matrix Multiplicative Weights Update (OMMWU) algorithm and establish its average-iterate convergence complexity as $\mathcal{O}(d/\epsilon)$ to $\epsilon$-Nash equilibria.
This quadratic speed-up sets a new benchmark for computing $\epsilon$-Nash equilibria in quantum zero-sum games.
arXiv Detail & Related papers (2023-11-17T20:38:38Z)
- Quantum Bayesian Optimization [64.58749619145908]
We introduce the quantum-Gaussian process-upper confidence bound (Q-GP-UCB) algorithm.
It is the first BO algorithm able to achieve a regret upper bound of $O(\mathrm{polylog}\, T)$, which is significantly smaller than its regret lower bound of $\Omega(\sqrt{T})$ in the classical setting.
Thanks to our novel analysis of the confidence ellipsoid, our Q-GP-UCB with the linear kernel achieves a smaller regret than the quantum linear UCB algorithm.
arXiv Detail & Related papers (2023-10-09T03:10:42Z)
- Quantum Lower Bounds by Sample-to-Query Lifting [7.319050391449301]
We propose an arguably new method for proving quantum query lower bounds by a quantum sample-to-query lifting theorem.
We provide unified proofs of several known lower bounds, including those for phase/amplitude estimation and Hamiltonian simulation.
arXiv Detail & Related papers (2023-08-03T14:41:49Z)
- Alleviating the quantum Big-$M$ problem [0.237499051649312]
Classically known as the "Big-$M$" problem, the choice of the penalty coefficient $M$ affects the physical energy scale.
We take a systematic, encompassing look at the quantum big-$M$ problem, revealing NP-hardness in finding the optimal $M$.
We propose a practical translation algorithm, based on SDP relaxation, that outperforms previous methods in numerical benchmarks.
arXiv Detail & Related papers (2023-07-19T18:00:05Z)
- Quantum advantages for transportation tasks: projectiles, rockets and quantum backflow [0.0]
We find that there exist "ultrafast" quantum states, whose probability of arrival is greater than that of any classical particle prepared in the same region with the same momentum distribution.
For both projectiles and rockets, we prove that the quantum advantage, quantified by the difference between the quantum and optimal classical arrival probabilities, is limited by the Bracken-Melloy constant $c_{\mathrm{bm}}$.
arXiv Detail & Related papers (2022-09-01T20:56:00Z)
- Quantum Multi-Armed Bandits and Stochastic Linear Bandits Enjoy Logarithmic Regrets [20.426493569846855]
Multi-armed bandit (MAB) and stochastic linear bandit (SLB) are important models in reinforcement learning.
We propose quantum algorithms for both models with $O(\mathrm{poly}(\log T))$ regrets.
This is the first provable quantum speedup for regrets of bandit problems and in general exploitation in reinforcement learning.
arXiv Detail & Related papers (2022-05-30T10:54:53Z)
- Quantum exploration algorithms for multi-armed bandits [15.666205208594567]
We show that we can find the best arm with fixed confidence using $\tilde{O}\bigl(\sqrt{\sum_{i=2}^{n} \Delta_i^{-2}}\bigr)$ quantum queries.
This algorithm, based on variable-time amplitude amplification and estimation, gives a quadratic speedup compared to the best possible classical result.
arXiv Detail & Related papers (2020-07-14T14:15:20Z)
- Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic-pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)