Adaptive Policies for Resource Generation in a Quantum Network
- URL: http://arxiv.org/abs/2509.17576v1
- Date: Mon, 22 Sep 2025 11:04:12 GMT
- Title: Adaptive Policies for Resource Generation in a Quantum Network
- Authors: Aksel Tacettin, Tianchen Qu, Bethany Davies, Boris Goranov, Ioana-Lisandra Draganescu, Gayane Vardoyan
- Abstract summary: Protocols for distributed quantum systems commonly require simultaneous availability of $n$ entangled states. We derive optimal policies that minimise the expected time until $n$ entangled states are available with fidelity greater than $F_{\mathrm{app}}$.
- Score: 0.5332865877117923
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Protocols for distributed quantum systems commonly require the simultaneous availability of $n$ entangled states, each with a fidelity above some fixed minimum $F_{\mathrm{app}}$ relative to the target maximally-entangled state. However, the fidelity of entangled states degrades over time while in memory. Entangled states are therefore rendered useless when their fidelity falls below $F_{\mathrm{app}}$. This is problematic when entanglement generation is probabilistic and attempted in a sequential manner, because the expected completion time until $n$ entangled states are available can be large. Motivated by existing entanglement generation schemes, we consider a system where the entanglement generation parameters (the success probability $p$ and fidelity $F$ of the generated entangled state) may be adjusted at each time step. We model the system as a Markov decision process, where the policy dictates which generation parameters $(p,F)$ to use for each attempt. We use dynamic programming to derive optimal policies that minimise the expected time until $n$ entangled states are available with fidelity greater than $F_{\mathrm{app}}$. We observe that the advantage of our optimal policies over the selected baselines increases significantly with $n$. In the parameter regimes explored, which are based closely on current experiments, we find that the optimal policy can provide a speed-up of as much as a factor of twenty over a constant-action policy. In addition, we propose a computationally inexpensive heuristic method to compute policies that perform either optimally or near-optimally in the parameter regimes explored. Our heuristic method can be used to find high-performing policies in parameter regimes where finding an optimal policy is intractable.
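The dynamic-programming idea in the abstract can be illustrated with a toy model. The sketch below is not the authors' formulation: it assumes each action is a pair `(p, lifetime)`, where `lifetime` is a hypothetical stand-in for the number of time steps a link generated with fidelity $F$ stays above $F_{\mathrm{app}}$, and it uses plain value iteration on the resulting stochastic shortest-path problem. The function name `solve_toy_mdp` and all parameter values are illustrative assumptions.

```python
from itertools import combinations_with_replacement


def solve_toy_mdp(n, actions, tol=1e-9, max_iter=200_000):
    """Value iteration for a simplified link-generation MDP (illustrative only).

    n       -- number of links that must be in memory simultaneously
    actions -- list of (p, lifetime): success probability and the number of
               time steps a fresh link remains usable (assumed proxy for F)
    Returns (V, policy): expected remaining time and best action per state.
    A state is a sorted tuple of the remaining lifetimes of stored links.
    """
    t_max = max(t for _, t in actions)
    # Non-terminal states hold fewer than n links, each with lifetime 1..t_max.
    states = [s for k in range(n)
              for s in combinations_with_replacement(range(1, t_max + 1), k)]
    V = {s: 0.0 for s in states}

    def step(s, new=None):
        # One time step passes: stored links age, expired links are discarded,
        # and a successful attempt adds a fresh link.
        aged = [r - 1 for r in s if r - 1 >= 1]
        if new is not None:
            aged.append(new)
        return tuple(sorted(aged))

    def q(s, action, values):
        p, t = action
        succ, fail = step(s, t), step(s)
        # Reaching n links ends the process, so the remaining time is zero.
        v_succ = 0.0 if len(succ) >= n else values[succ]
        return 1.0 + p * v_succ + (1.0 - p) * values[fail]

    # If no policy can ever reach n links, the values grow without bound and
    # the loop simply stops at max_iter.
    for _ in range(max_iter):
        new_V = {s: min(q(s, a, V) for a in actions) for s in states}
        delta = max(abs(new_V[s] - V[s]) for s in states)
        V = new_V
        if delta < tol:
            break

    policy = {s: min(actions, key=lambda a: q(s, a, V)) for s in states}
    return V, policy


if __name__ == "__main__":
    # Hypothetical trade-off: higher p gives a shorter-lived link.
    acts = [(0.5, 2), (0.2, 4)]
    V, policy = solve_toy_mdp(2, acts)
    print("expected time from empty memory:", V[()])
    print("action chosen with empty memory:", policy[()])
```

Comparing `V[()]` for the full action set against the value obtained with a single action gives a rough feel for the gap between an adaptive policy and a constant-action baseline, although the numbers from this toy model say nothing about the speed-ups reported in the paper.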
Related papers
- Dense and Diverse Goal Coverage in Multi Goal Reinforcement Learning [13.831084892489754]
In natural situations, it is desirable to learn a policy that induces a dispersed marginal state distribution over rewarding states. We propose a novel algorithm that learns a high-return policy mixture with marginal state distribution dispersed over the set of goal states.
arXiv Detail & Related papers (2025-10-29T09:23:21Z) - Best-Effort Policies for Robust Markov Decision Processes [69.60742680559788]
We study the common generalization of Markov decision processes (MDPs) with sets of transition probabilities, known as robust MDPs (RMDPs). We call such a policy an optimal robust best-effort (ORBE) policy. We prove that ORBE policies always exist, characterize their structure, and present an algorithm to compute them with a small overhead compared to standard robust value iteration.
arXiv Detail & Related papers (2025-08-11T09:18:34Z) - Confident Natural Policy Gradient for Local Planning in $q_π$-realizable Constrained MDPs [44.69257217086967]
The constrained Markov decision process (CMDP) framework emerges as an important reinforcement learning approach for imposing safety or other critical objectives. In this paper, we address the learning problem given linear function approximation with $q_\pi$-realizability.
arXiv Detail & Related papers (2024-06-26T17:57:13Z) - Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space [0.0]
We study the problem of optimal control of a family of discrete-time countable state-space Markov Decision Processes.
We propose an algorithm based on Thompson sampling with dynamically-sized episodes.
We show that our algorithm can be applied to develop approximately optimal control algorithms.
arXiv Detail & Related papers (2023-06-05T03:57:16Z) - Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees [56.848265937921354]
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy.
Many algorithms for IRL have an inherently nested structure.
We develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy.
arXiv Detail & Related papers (2022-10-04T17:13:45Z) - Efficient Policy Iteration for Robust Markov Decision Processes via Regularization [49.05403412954533]
Robust Markov decision processes (MDPs) provide a framework to model decision problems where the system dynamics are changing or only partially known.
Recent work established the equivalence between $s$-rectangular $L_p$ robust MDPs and regularized MDPs, and derived a regularized policy iteration scheme that enjoys the same level of efficiency as standard MDPs.
In this work, we focus on the policy improvement step and derive concrete forms for the greedy policy and the optimal robust Bellman operators.
arXiv Detail & Related papers (2022-05-28T04:05:20Z) - Near-optimality for infinite-horizon restless bandits with many arms [19.12759228067286]
Restless bandits are problems with applications in recommender systems, active learning, revenue management and other areas.
We derive a class of policies, called fluid-balance policies, that have an $O(\sqrt{N})$ optimality gap.
We also demonstrate empirically that fluid-balance policies provide state-of-the-art performance on specific problems.
arXiv Detail & Related papers (2022-03-29T18:49:21Z) - Understanding the Effect of Stochasticity in Policy Optimization [86.7574122154668]
We show that the preferability of optimization methods depends critically on whether exact gradients are used.
Second, to explain these findings we introduce the concept of committal rate for policy optimization.
Third, we show that in the absence of external oracle information, there is an inherent trade-off between exploiting geometry to accelerate convergence versus achieving optimality almost surely.
arXiv Detail & Related papers (2021-10-29T06:35:44Z) - Softmax Policy Gradient Methods Can Take Exponential Time to Converge [60.98700344526674]
The softmax policy gradient (PG) method is arguably one of the de facto implementations of policy optimization in modern reinforcement learning.
We demonstrate that softmax PG methods can take exponential time -- in terms of $|\mathcal{S}|$ and $\frac{1}{1-\gamma}$ -- to converge.
arXiv Detail & Related papers (2021-02-22T18:56:26Z) - Provably Efficient Safe Exploration via Primal-Dual Policy Optimization [105.7510838453122]
We study the Safe Reinforcement Learning (SRL) problem using the Constrained Markov Decision Process (CMDP) formulation.
We present a provably efficient online policy optimization algorithm for CMDP with safe exploration in the function approximation setting.
arXiv Detail & Related papers (2020-03-01T17:47:03Z)