Bandits roaming Hilbert space
- URL: http://arxiv.org/abs/2509.24569v1
- Date: Mon, 29 Sep 2025 10:26:29 GMT
- Title: Bandits roaming Hilbert space
- Authors: Josep Lumbreras,
- Abstract summary: We study the exploration and exploitation trade-off in online learning of properties of quantum states using multi-armed bandits.<n>We derive information-theoretic lower bounds and optimal strategies with matching upper bounds, showing regret scales as the square root of rounds.
- Score: 0.7614628596146601
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This thesis studies the exploration and exploitation trade-off in online learning of properties of quantum states using multi-armed bandits. Given streaming access to an unknown quantum state, in each round we select an observable from a set of actions to maximize its expectation value. Using past information, we refine actions to minimize regret; the cumulative gap between current reward and the maximum possible. We derive information-theoretic lower bounds and optimal strategies with matching upper bounds, showing regret typically scales as the square root of rounds. As an application, we reframe quantum state tomography to both learn the state efficiently and minimize measurement disturbance. For pure states and continuous actions, we achieve polylogarithmic regret using a sample-optimal algorithm based on a weighted online least squares estimator. The algorithm relies on the optimistic principle and controls the eigenvalues of the design matrix. We also apply our framework to quantum recommender systems and thermodynamic work extraction from unknown states. In this last setting, our results demonstrate an exponential advantage in work dissipation over tomography-based protocols.
Related papers
- Quartic quantum speedups for community detection [84.14713515477784]
We develop a quantum algorithm for hypergraph community detection that achieves a quartic quantum speedup.<n>Our algorithm is based on the Kikuchi method, which we extend beyond previously considered problems such as PCA and $p$XORSAT.
arXiv Detail & Related papers (2025-10-09T17:35:17Z) - Calibration of Quantum Devices via Robust Statistical Methods [45.464983015777314]
We numerically analyze advanced statistical methods for Bayesian inference against the state-of-the-art in quantum parameter learning.<n>We show advantages of these approaches over existing ones, namely under multi-modality and high dimensionality.<n>Our findings have applications in challenging quantumcharacterization tasks namely learning the dynamics of open quantum systems.
arXiv Detail & Related papers (2025-07-09T15:22:17Z) - Smarter Usage of Measurement Statistics Can Greatly Improve Continuous Variable Quantum Reservoir Computing [1.03590082373586]
Quantum reservoir computing is a machine learning scheme in which a quantum system is used to perform information processing.<n>We consider storing past measurement results in classical memory, and show that it improves the memory capacity and can be used to mitigate the effects of statistical noise.
arXiv Detail & Related papers (2025-07-04T13:09:44Z) - Scalable Policy Maximization Under Network Interference [46.16641537379657]
We study optimal-policy learning under interference on a dynamic network.<n>Under common assumptions on the structure of interference, rewards become linear.<n>We develop a scalable Thompson sampling algorithm that maximizes policy impact when a new $n$-node network is observed each round.
arXiv Detail & Related papers (2025-05-23T17:19:12Z) - Correlating noise floor with magic and entanglement in Pauli product states [37.69303106863453]
We show the ability to recover resources specific to quantum computing from noisy states generated by Pauli product formulas.<n>The fidelity of purified states represents the noise floor of a given computation.<n>We experimentally validate these findings by collecting classical shadow data for a range of small circuits.
arXiv Detail & Related papers (2025-05-07T19:24:00Z) - Quantum decision trees with information entropy [0.0]
We present a classification algorithm for quantum states inspired by decision-tree methods.<n>For each measurement shot on an unknown quantum state, the algorithm selects the observable with the highest expected information gain, continuing until convergence.<n>Despite not relying on circuit-based quantum neural networks, the algorithm still encounters challenges akin to the barren plateau problem.
arXiv Detail & Related papers (2025-02-17T03:51:40Z) - Learning pure quantum states (almost) without regret [7.988085110283119]
We study the study of sample-optimal quantum state tomography with minimal disturbance to the samples.<n>Can we efficiently learn a precise description of a quantum state through sequential measurements of samples while at the same time making sure that the post-measurement state of the samples is only minimally perturbed?
arXiv Detail & Related papers (2024-06-26T14:13:50Z) - Multi-Armed Bandits with Abstention [62.749500564313834]
We introduce a novel extension of the canonical multi-armed bandit problem that incorporates an additional strategic element: abstention.
In this enhanced framework, the agent is not only tasked with selecting an arm at each time step, but also has the option to abstain from accepting the instantaneous reward before observing it.
arXiv Detail & Related papers (2024-02-23T06:27:12Z) - Quantum Bayesian Optimization [64.58749619145908]
We introduce the quantum-Gaussian process-upper confidence bound (Q-GP-UCB) algorithm.
It is the first BO algorithm able to achieve a regret upper bound of O(polylog T), which is significantly smaller than its regret lower bound of Omega(sqrt(T)) in the classical setting.
Thanks to our novel analysis of the confidence ellipsoid, our Q-GP-UCB with the linear kernel achieves a smaller regret than the quantum linear UCB algorithm.
arXiv Detail & Related papers (2023-10-09T03:10:42Z) - A Score-Based Model for Learning Neural Wavefunctions [41.82403146569561]
We provide a new framework for obtaining properties of quantum many-body ground states using score-based neural networks.
Our new framework does not require explicit probability distribution and performs the sampling via Langevin dynamics.
arXiv Detail & Related papers (2023-05-25T23:44:27Z) - PopArt: Efficient Sparse Regression and Experimental Design for Optimal
Sparse Linear Bandits [29.097522376094624]
We propose a simple and computationally efficient sparse linear estimation method called PopArt.
We derive sparse linear bandit algorithms that enjoy improved regret upper bounds upon the state of the art.
arXiv Detail & Related papers (2022-10-25T19:13:20Z) - Large-Scale Sequential Learning for Recommender and Engineering Systems [91.3755431537592]
In this thesis, we focus on the design of an automatic algorithms that provide personalized ranking by adapting to the current conditions.
For the former, we propose novel algorithm called SAROS that take into account both kinds of feedback for learning over the sequence of interactions.
The proposed idea of taking into account the neighbour lines shows statistically significant results in comparison with the initial approach for faults detection in power grid.
arXiv Detail & Related papers (2022-05-13T21:09:41Z) - Multi-armed quantum bandits: Exploration versus exploitation when
learning properties of quantum states [13.213490507208528]
We study tradeoffs between exploration and exploitation in online learning of properties of quantum states.
We provide various information-theoretic lower bounds on the cumulative regret that an optimal learner must incur.
We also investigate the dependence of the cumulative regret on the number of available actions and the dimension of the underlying space.
arXiv Detail & Related papers (2021-08-30T08:15:04Z) - Experimental Design for Regret Minimization in Linear Bandits [19.8309784360219]
We propose a novel design-based algorithm to minimize regret in online linear and bandits.
We provide state-of-the-art finite time regret guarantees and show that our algorithm can be applied in both the bandit and semi-bandit feedback regime.
arXiv Detail & Related papers (2020-11-01T17:59:19Z) - Latent Bandits Revisited [55.88616813182679]
A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state.
We propose general algorithms for this setting, based on both upper confidence bounds (UCBs) and Thompson sampling.
We provide a unified theoretical analysis of our algorithms, which have lower regret than classic bandit policies when the number of latent states is smaller than actions.
arXiv Detail & Related papers (2020-06-15T19:24:02Z) - More Practical and Adaptive Algorithms for Online Quantum State Learning [11.836183463815653]
In this paper, we develop algorithms to advance the online learning of quantum states.
First, we show that Regularized Follow-the-Leader (RFTL) method with Tallis-2 entropy can achieve an $O(sqrtMT)$ total loss with perfect hindsight.
Second, we propose a parameter-free algorithm based on a classical adjusting learning rate schedule.
arXiv Detail & Related papers (2020-06-01T15:17:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.