Optimal entanglement distribution policies in homogeneous repeater
chains with cutoffs
- URL: http://arxiv.org/abs/2207.06533v3
- Date: Fri, 21 Apr 2023 17:31:56 GMT
- Title: Optimal entanglement distribution policies in homogeneous repeater
chains with cutoffs
- Authors: Álvaro G. Iñesta, Gayane Vardoyan, Lara Scavuzzo, Stephanie Wehner
- Abstract summary: We study the limits of bipartite entanglement distribution using a chain of quantum repeaters with quantum memories.
We find global-knowledge policies that minimize the expected time to produce end-to-end entanglement.
- Score: 1.9021200954913475
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the limits of bipartite entanglement distribution using a chain of
quantum repeaters that have quantum memories. To generate end-to-end
entanglement, each node can attempt the generation of an entangled link with a
neighbor, or perform an entanglement swapping measurement. A maximum storage
time, known as cutoff, is enforced on the memories to ensure high-quality
entanglement. Nodes follow a policy that determines when to perform each
operation. Global-knowledge policies take into account all the information
about the entanglement already produced. Here, we find global-knowledge
policies that minimize the expected time to produce end-to-end entanglement.
Our methods are based on Markov decision processes and value and policy
iteration. We compare optimal policies to a policy in which nodes only use
local information. We find that the advantage in expected delivery time
provided by an optimal global-knowledge policy increases with increasing number
of nodes and decreasing probability of successful swapping. Our work sheds
light on how to distribute entangled pairs in large quantum networks using a
chain of intermediate repeaters with cutoffs.
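
As an illustration of the approach described in the abstract (a Markov decision process solved with value iteration), the sketch below computes the minimum expected delivery time for a toy three-node chain: one repeater, two elementary links, a memory cutoff, and the simplifying assumption that a failed swap destroys both links. The parameters and state encoding are illustrative assumptions, not the authors' implementation.

```python
import itertools

# Illustrative toy parameters (assumptions, not values from the paper).
p_gen, p_swap, t_cut = 0.5, 0.5, 2
NO_LINK = -1
ages = range(-1, t_cut + 1)              # -1 = no link, otherwise age in time steps
states = list(itertools.product(ages, ages))
GOAL = "done"                            # absorbing state: end-to-end entanglement delivered

def age_link(a):
    """Advance one stored link by one time step, discarding it past the cutoff."""
    if a == NO_LINK:
        return NO_LINK
    return a + 1 if a + 1 <= t_cut else NO_LINK

def transitions(state, action):
    """Return a list of (probability, next_state) pairs for one time step."""
    m1, m2 = state
    if action == "swap":                 # only offered when both links exist
        return [(p_swap, GOAL), (1 - p_swap, (NO_LINK, NO_LINK))]
    # "wait": attempt generation on each missing link, age the stored ones.
    outcomes = []
    for ok1 in ([True, False] if m1 == NO_LINK else [None]):
        for ok2 in ([True, False] if m2 == NO_LINK else [None]):
            prob = 1.0
            n1 = age_link(m1) if ok1 is None else (0 if ok1 else NO_LINK)
            n2 = age_link(m2) if ok2 is None else (0 if ok2 else NO_LINK)
            if ok1 is not None:
                prob *= p_gen if ok1 else 1 - p_gen
            if ok2 is not None:
                prob *= p_gen if ok2 else 1 - p_gen
            outcomes.append((prob, (n1, n2)))
    return outcomes

# Value iteration for the minimum expected delivery time (one unit of cost per step).
V = {s: 0.0 for s in states}
for _ in range(10_000):
    new_V = {}
    for s in states:
        actions = ["wait"] + (["swap"] if NO_LINK not in s else [])
        new_V[s] = min(
            1 + sum(p * (0.0 if ns == GOAL else V[ns]) for p, ns in transitions(s, a))
            for a in actions
        )
    converged = max(abs(new_V[s] - V[s]) for s in states) < 1e-10
    V = new_V
    if converged:
        break

print("Expected delivery time from an empty chain:", round(V[(NO_LINK, NO_LINK)], 3))
```

Policy iteration, also mentioned in the abstract, would alternate policy evaluation and greedy improvement on the same model. With this state encoding the state space grows exponentially in the number of elementary links, which is what makes finding optimal global-knowledge policies for longer chains demanding.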
Related papers
- Multi-Objective Recommendation via Multivariate Policy Learning [10.494676556696213]
Real-world recommender systems often need to balance multiple objectives when deciding which recommendations to present to users.
These include behavioural signals (e.g. clicks, shares, dwell time), as well as broader objectives (e.g. diversity, fairness).
arXiv Detail & Related papers (2024-05-03T14:44:04Z)
- Information Capacity Regret Bounds for Bandits with Mediator Feedback [55.269551124587224]
We introduce the policy set capacity as an information-theoretic measure for the complexity of the policy set.
Adopting the classical EXP4 algorithm, we provide new regret bounds depending on the policy set capacity.
For a selection of policy set families, we prove nearly-matching lower bounds, scaling similarly with the capacity.
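
Since this entry adopts the classical EXP4 algorithm, a minimal reminder of that baseline may help (a generic sketch on a toy bandit, not the paper's capacity-based analysis; the arm means, policy set, and learning rate below are illustrative assumptions).

```python
import math
import random

rng = random.Random(0)

# Toy bandit with Bernoulli rewards (illustrative assumptions).
arm_means = [0.2, 0.5, 0.8]
K = len(arm_means)

# A small "policy set": each policy is a fixed distribution over arms.
policies = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1 / 3, 1 / 3, 1 / 3],
]
N = len(policies)

T = 20_000
eta = math.sqrt(2 * math.log(N) / (T * K))   # classical EXP4 learning-rate choice
log_w = [0.0] * N                            # log-weights over the policy set

for _ in range(T):
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    z = sum(w)
    q = [wi / z for wi in w]                 # normalized weights over policies

    # Mixture over arms induced by the weighted policies.
    p = [sum(q[i] * policies[i][a] for i in range(N)) for a in range(K)]

    a_t = rng.choices(range(K), weights=p)[0]
    r_t = 1.0 if rng.random() < arm_means[a_t] else 0.0

    # Importance-weighted reward estimate, credited to every policy.
    r_hat = r_t / p[a_t]
    for i in range(N):
        log_w[i] += eta * policies[i][a_t] * r_hat

print("Relative policy weights:", [round(math.exp(lw - max(log_w)), 3) for lw in log_w])
```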
arXiv Detail & Related papers (2024-02-15T19:18:47Z)
- Reducing classical communication costs in multiplexed quantum repeaters using hardware-aware quasi-local policies [5.405186125924916]
We introduce quasi-local policies for multiplexed quantum repeater chains.
In quasi-local policies, nodes have increased knowledge of the state of the repeater chain, but not necessarily full, global knowledge.
Our policies also outperform the well-known and widely studied nested purification and doubling swapping policy.
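
Both the main paper and this entry contrast policies that use only local (or quasi-local) knowledge with better-informed ones. As a point of reference, the sketch below gives a Monte Carlo estimate for the fully local swap-asap rule with a cutoff on the same toy three-node chain used in the value-iteration sketch above; in that single-repeater toy the local rule should already be optimal, so the estimate should agree with the value-iteration result, and the global-knowledge advantage reported in the main paper only shows up for longer chains. All parameters are illustrative assumptions.

```python
import random

# Same illustrative toy parameters as the value-iteration sketch (assumptions).
p_gen, p_swap, t_cut = 0.5, 0.5, 2
NO_LINK = -1

def swap_asap_delivery_time(rng):
    """One episode of a fully local swap-asap-with-cutoff policy on a 3-node chain."""
    links = [NO_LINK, NO_LINK]                # ages of the two elementary links
    t = 0
    while True:
        t += 1
        if NO_LINK not in links:
            # Both links exist: swap immediately (the local rule).
            if rng.random() < p_swap:
                return t
            links = [NO_LINK, NO_LINK]        # a failed swap destroys both links
            continue
        # Otherwise attempt generation on missing links and age the stored ones.
        for i in range(2):
            if links[i] == NO_LINK:
                links[i] = 0 if rng.random() < p_gen else NO_LINK
            else:
                links[i] += 1
                if links[i] > t_cut:
                    links[i] = NO_LINK        # memory cutoff reached, discard

rng = random.Random(0)
samples = [swap_asap_delivery_time(rng) for _ in range(100_000)]
print("Monte Carlo mean delivery time:", round(sum(samples) / len(samples), 3))
```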
arXiv Detail & Related papers (2024-01-24T01:13:55Z)
- Federated Natural Policy Gradient and Actor Critic Methods for Multi-task Reinforcement Learning [46.28771270378047]
Federated reinforcement learning (RL) enables collaborative decision making of multiple distributed agents without sharing local data trajectories.
In this work, we consider a multi-task setting, in which each agent has its own private reward function corresponding to different tasks, while sharing the same transition kernel of the environment.
We learn a globally optimal policy that maximizes the sum of the discounted total rewards of all the agents in a decentralized manner.
arXiv Detail & Related papers (2023-11-01T00:15:18Z)
- Provable Offline Preference-Based Reinforcement Learning [95.00042541409901]
We investigate the problem of offline Preference-based Reinforcement Learning (PbRL) with human feedback.
We consider the general reward setting where the reward can be defined over the whole trajectory.
We introduce a new single-policy concentrability coefficient, which can be upper bounded by the per-trajectory concentrability.
arXiv Detail & Related papers (2023-05-24T07:11:26Z)
- Fast and reliable entanglement distribution with quantum repeaters: principles for improving protocols using reinforcement learning [0.6249768559720122]
Future quantum technologies will rely on networks of shared entanglement between spatially separated nodes.
We provide improved protocols/policies for entanglement distribution along a linear chain of nodes.
arXiv Detail & Related papers (2023-03-01T19:05:32Z)
- Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games [63.60117916422867]
This paper focuses on the most basic setting of competitive multi-agent RL, namely two-player zero-sum Markov games.
We propose a single-loop policy optimization method with symmetric updates from both agents, where the policy is updated via the entropy-regularized optimistic multiplicative weights update (OMWU) method.
Our convergence results improve upon the best known complexities, and lead to a better understanding of policy optimization in competitive Markov games.
arXiv Detail & Related papers (2022-10-03T16:05:43Z)
- Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning [156.5667417159582]
This paper proposes a novel model of sequential information design, namely Markov persuasion processes (MPPs).
Planning in MPPs faces the unique challenge in finding a signaling policy that is simultaneously persuasive to the myopic receivers and inducing the optimal long-term cumulative utilities of the sender.
We design a provably efficient no-regret learning algorithm, the Optimism-Pessimism Principle for Persuasion Process (OP4), which features a novel combination of both optimism and pessimism principles.
arXiv Detail & Related papers (2022-02-22T05:41:43Z)
- Universal Off-Policy Evaluation [64.02853483874334]
We take the first steps towards a universal off-policy estimator (UnO).
We use UnO for estimating and simultaneously bounding the mean, variance, quantiles/median, inter-quantile range, CVaR, and the entire cumulative distribution of returns.
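
The point of UnO is that the full distribution of returns, not just its mean, can be estimated from off-policy data. The snippet below illustrates only the underlying importance-weighted CDF estimate in a deliberately simplified one-step setting; it omits UnO's high-confidence bounds and trajectory-level weights, and the reward model and both policies are assumptions made for the example.

```python
import random

rng = random.Random(1)

# One-step toy setting (illustrative assumptions): two actions, different reward noise.
def reward(a, rng):
    return rng.gauss(0.0, 0.5) if a == 0 else rng.gauss(1.0, 1.5)

pi_b = [0.7, 0.3]   # behaviour policy that collected the data
pi_e = [0.2, 0.8]   # evaluation policy we want to assess offline

# Logged data gathered under the behaviour policy.
n = 100_000
data = []
for _ in range(n):
    a = 0 if rng.random() < pi_b[0] else 1
    data.append((a, reward(a, rng)))

def cdf_estimate(y):
    """Importance-weighted estimate of P_{pi_e}(return <= y) from behaviour data."""
    return sum((pi_e[a] / pi_b[a]) * (1.0 if g <= y else 0.0) for a, g in data) / n

# Mean, quantiles, CVaR, etc. of the evaluation policy can all be read off this CDF.
for y in (-0.5, 0.0, 0.5, 1.0, 2.0):
    print(f"F_hat({y:+.1f}) = {cdf_estimate(y):.3f}")
```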
arXiv Detail & Related papers (2021-04-26T18:54:31Z)
- Cooperative Multi-Agent Reinforcement Learning with Partial Observations [16.895704973433382]
We propose a distributed zeroth-order policy optimization method for Multi-Agent Reinforcement Learning (MARL).
It allows the agents to compute the local policy gradients needed to update their local policy functions using local estimates of the global accumulated rewards.
We show that the proposed distributed zeroth-order policy optimization method with constant stepsize converges to the neighborhood of a policy that is a stationary point of the global objective function.
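
For context on the zeroth-order idea used here, the sketch below runs a single-agent, two-point zeroth-order policy update on a toy two-armed bandit, where only noisy evaluations of the return are available; it does not reproduce the paper's distributed multi-agent scheme or its consensus step, and all parameters are illustrative assumptions.

```python
import math
import random

rng = random.Random(2)

# Toy problem (illustrative assumptions): two-armed bandit; the policy plays
# arm 1 with probability sigmoid(theta) and we only see noisy return estimates.
arm_means = [0.3, 0.7]

def rollout_return(theta, episodes=500):
    """Monte Carlo estimate of the expected reward of the policy sigmoid(theta)."""
    p1 = 1.0 / (1.0 + math.exp(-theta))
    total = 0.0
    for _ in range(episodes):
        a = 1 if rng.random() < p1 else 0
        total += 1.0 if rng.random() < arm_means[a] else 0.0
    return total / episodes

theta, mu, step = 0.0, 0.5, 1.0
for _ in range(300):
    u = rng.gauss(0.0, 1.0)                      # random perturbation direction
    j_plus = rollout_return(theta + mu * u)      # evaluate the perturbed policies
    j_minus = rollout_return(theta - mu * u)
    g_hat = (j_plus - j_minus) / (2.0 * mu) * u  # two-point zeroth-order gradient estimate
    theta += step * g_hat                        # gradient ascent on the return

print("Learned probability of the better arm:", round(1.0 / (1.0 + math.exp(-theta)), 3))
```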
arXiv Detail & Related papers (2020-06-18T19:36:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.