Related papers: Multi-armed quantum bandits: Exploration versus exploitation when learning properties of quantum states

URL: http://arxiv.org/abs/2108.13050v3
Date: Mon, 20 Jun 2022 03:44:19 GMT
Title: Multi-armed quantum bandits: Exploration versus exploitation when learning properties of quantum states
Authors: Josep Lumbreras and Erkka Haapasalo and Marco Tomamichel
Abstract summary: We study tradeoffs between exploration and exploitation in online learning of properties of quantum states. We provide various information-theoretic lower bounds on the cumulative regret that an optimal learner must incur. We also investigate the dependence of the cumulative regret on the number of available actions and the dimension of the underlying space.
Score: 13.213490507208528
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We initiate the study of tradeoffs between exploration and exploitation in online learning of properties of quantum states. Given sequential oracle access to an unknown quantum state, in each round, we are tasked to choose an observable from a set of actions aiming to maximize its expectation value on the state (the reward). Information gained about the unknown state from previous rounds can be used to gradually improve the choice of action, thus reducing the gap between the reward and the maximal reward attainable with the given action set (the regret). We provide various information-theoretic lower bounds on the cumulative regret that an optimal learner must incur, and show that it scales at least as the square root of the number of rounds played. We also investigate the dependence of the cumulative regret on the number of available actions and the dimension of the underlying space. Moreover, we exhibit strategies that are optimal for bandits with a finite number of arms and general mixed states.
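The setting described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the strategy from the paper: it assumes a single-qubit state, an action set made of the three Pauli observables, and a generic classical UCB-style rule as the learner. All function names (`expected_reward`, `sample_reward`, `run_ucb`) and the confidence-bonus constant are hypothetical choices made for illustration.

```python
import numpy as np

# Assumed action set for illustration: the three Pauli observables.
PAULI_X = np.array([[0, 1], [1, 0]], dtype=complex)
PAULI_Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
PAULI_Z = np.array([[1, 0], [0, -1]], dtype=complex)


def expected_reward(rho, observable):
    """Expectation value Tr(rho O) of an observable on the state rho."""
    return float(np.real(np.trace(rho @ observable)))


def sample_reward(rho, observable, rng):
    """Simulate one projective measurement and return the observed eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(observable)
    # Born rule: p_i = <v_i| rho |v_i> for each eigenvector v_i.
    probs = np.real(np.einsum("ij,jk,ki->i", eigvecs.conj().T, rho, eigvecs))
    probs = np.clip(probs, 0.0, None)
    probs /= probs.sum()
    return float(rng.choice(eigvals, p=probs))


def run_ucb(rho, actions, rounds, seed=0):
    """Play a UCB-style learner and return the cumulative regret per round."""
    rng = np.random.default_rng(seed)
    means = np.array([expected_reward(rho, a) for a in actions])
    best = means.max()  # maximal reward attainable with this action set
    counts = np.zeros(len(actions))
    sums = np.zeros(len(actions))
    regret = np.zeros(rounds)
    total = 0.0
    for t in range(rounds):
        if t < len(actions):
            k = t  # play every action once to initialise the estimates
        else:
            # Pauli outcomes lie in [-1, 1], so the bonus is scaled by the range 2.
            bonus = 2.0 * np.sqrt(2.0 * np.log(t + 1) / counts)
            k = int(np.argmax(sums / counts + bonus))
        reward = sample_reward(rho, actions[k], rng)
        counts[k] += 1
        sums[k] += reward
        total += best - means[k]  # gap between best and chosen expected reward
        regret[t] = total
    return regret
```

Calling `run_ucb(rho, [PAULI_X, PAULI_Y, PAULI_Z], rounds=1000)` with a density matrix `rho` returns the cumulative regret after each round. In this sketch the regret is computed from the true expectation values, which the simulation knows but the learner does not.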

Related papers

Bandits roaming Hilbert space [0.7614628596146601]
We study the exploration and exploitation trade-off in online learning of properties of quantum states using multi-armed bandits. We derive information-theoretic lower bounds and optimal strategies with matching upper bounds, showing regret scales as the square root of rounds.
arXiv Detail & Related papers (2025-09-29T10:26:29Z)
Quantum decision trees with information entropy [0.0]
We present a classification algorithm for quantum states inspired by decision-tree methods. For each measurement shot on an unknown quantum state, the algorithm selects the observable with the highest expected information gain, continuing until convergence. Despite not relying on circuit-based quantum neural networks, the algorithm still encounters challenges akin to the barren plateau problem.
arXiv Detail & Related papers (2025-02-17T03:51:40Z)
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization [91.80034860399677]
Reinforcement learning algorithms aim to balance exploiting the current best strategy with exploring new options that could lead to higher rewards. We introduce a framework, MaxInfoRL, for balancing intrinsic and extrinsic exploration. We show that our approach achieves sublinear regret in the simplified setting of multi-armed bandits.
arXiv Detail & Related papers (2024-12-16T18:59:53Z)
Learning pure quantum states (almost) without regret [7.988085110283119]
A learner has sequential oracle access to an unknown pure quantum state. The learner's goal is to minimise the expected cumulative regret over $T$ rounds. We show that the cumulative regret scales as $\Theta(\operatorname{polylog} T)$ using a new tomography algorithm.
arXiv Detail & Related papers (2024-06-26T14:13:50Z)
An Effective Way to Determine the Separability of Quantum State [0.0]
We propose a practical approach to address the longstanding and challenging problem of quantum separability. General separability conditions are obtained by dint of constructing the measurement-induced Bloch space. It is found that criteria obtained in our approach can be directly transformed into entanglement witness operators.
arXiv Detail & Related papers (2024-03-12T06:17:19Z)
Multi-Armed Bandits with Abstention [62.749500564313834]
We introduce a novel extension of the canonical multi-armed bandit problem that incorporates an additional strategic element: abstention. In this enhanced framework, the agent is not only tasked with selecting an arm at each time step, but also has the option to abstain from accepting the instantaneous reward before observing it.
arXiv Detail & Related papers (2024-02-23T06:27:12Z)
Quantum steering from phase measurements with limited resources [0.20616237122336117]
Quantum steering captures the ability of one party, Alice, to control through quantum correlations the state at a distant location. Our results provide guidelines to apply such a metrological approach to the validation of quantum channels.
arXiv Detail & Related papers (2024-01-30T20:37:00Z)
Postselection-free learning of measurement-induced quantum dynamics [0.0]
We introduce a general-purpose scheme that can be used to infer any property of the post-measurement ensemble of states. As an immediate application, we show that our method can be used to verify the emergence of quantum state designs in experiments.
arXiv Detail & Related papers (2023-10-06T11:06:06Z)
Stronger Quantum Speed Limit For Mixed Quantum States [0.0]
We derive a quantum speed limit for mixed quantum states using the stronger uncertainty relation for mixed quantum states and unitary evolution. We show that this bound can be optimized over different choices of operators for obtaining a better bound.
arXiv Detail & Related papers (2023-07-05T11:44:57Z)
Quantum contextual bandits and recommender systems for quantum data [13.213490507208528]
We study a recommender system for quantum data using the linear contextual bandit framework. We formulate the low energy quantum state recommendation problem where the context is a Hamiltonian. We observe that if we interpret the actions as different phases of the models then the recommendation is done by classifying the correct phase of the given Hamiltonian.
arXiv Detail & Related papers (2023-01-31T10:17:53Z)
The power of noisy quantum states and the advantage of resource dilution [62.997667081978825]
Entanglement distillation allows one to convert noisy quantum states into singlets. We show that entanglement dilution can increase the resilience of shared quantum states to local noise.
arXiv Detail & Related papers (2022-10-25T17:39:29Z)
Anticipative measurements in hybrid quantum-classical computation [68.8204255655161]
We present an approach where the quantum computation is supplemented by a classical result. Taking advantage of its anticipation also leads to a new type of quantum measurement, which we call anticipative. In an anticipative quantum measurement, the combination of the results from classical and quantum computations happens only at the end.
arXiv Detail & Related papers (2022-09-12T15:47:44Z)
Robust quantum metrology with random Majorana constellations [0.0]
A number of physical systems can be described by their Majorana constellations of points on the surface of a sphere. If these points are chosen randomly, how quantum will the resultant state be, on average? We explore this simple conceptual question in detail, investigating the quantum properties of the resulting random states.
arXiv Detail & Related papers (2021-12-02T21:41:03Z)
Latent Bandits Revisited [55.88616813182679]
A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state. We propose general algorithms for this setting, based on both upper confidence bounds (UCBs) and Thompson sampling. We provide a unified theoretical analysis of our algorithms, which have lower regret than classic bandit policies when the number of latent states is smaller than actions.
arXiv Detail & Related papers (2020-06-15T19:24:02Z)
Boundaries of quantum supremacy via random circuit sampling [69.16452769334367]
Google's recent quantum supremacy experiment heralded a transition point where quantum computing performed a computational task, random circuit sampling. We examine the constraints of the observed quantum runtime advantage in a larger number of qubits and gates.
arXiv Detail & Related papers (2020-05-05T20:11:53Z)
Predictive Bandits [68.8204255655161]
We introduce and study a new class of bandit problems, referred to as predictive bandits. In each round, the decision maker first decides whether to gather information about the rewards of particular arms. The decision maker then selects an arm to be actually played in the round.
arXiv Detail & Related papers (2020-04-02T17:12:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.