Reinforcement learning entangling operations on spin qubits
- URL: http://arxiv.org/abs/2508.14761v1
- Date: Wed, 20 Aug 2025 15:05:38 GMT
- Title: Reinforcement learning entangling operations on spin qubits
- Authors: Mohammad Abedi, Markus Schmitt
- Abstract summary: We present a reinforcement learning approach to find entangling protocols for semiconductor-based singlet-triplet qubits in a double quantum dot. We demonstrate that an RL agent can yield performant protocols while avoiding the model biases of traditional gradient-based methods.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High-fidelity control of one- and two-qubit gates past the error correction threshold is an essential ingredient for scalable quantum computing. We present a reinforcement learning (RL) approach to find entangling protocols for semiconductor-based singlet-triplet qubits in a double quantum dot. Despite the presence of realistically modelled experimental constraints, such as various noise contributions and finite rise-time effects, we demonstrate that an RL agent can yield performant protocols while avoiding the model biases of traditional gradient-based methods. We optimise our RL approach for different regimes and tasks, including training from simulated process tomography reconstruction of unitary gates, and investigate the nuances of RL agent design.
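To make the setting concrete, here is a minimal sketch of the kind of environment such an RL training loop interacts with: the agent chooses piecewise-constant exchange couplings and is rewarded by fidelity to a target entangling gate. The effective Hamiltonian, parameter values, and names below are illustrative assumptions; the paper's noise contributions and finite rise-time filtering are not reproduced.

```python
# Hypothetical gate-synthesis environment for two coupled S-T0 qubits.
import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def hamiltonian(j1, j2, db1=1.0, db2=1.0, alpha=0.1):
    """Toy effective Hamiltonian for two coupled S-T0 qubits (hbar = 1)."""
    return (0.5 * db1 * np.kron(X, I2) + 0.5 * db2 * np.kron(I2, X)
            + 0.5 * j1 * np.kron(Z, I2) + 0.5 * j2 * np.kron(I2, Z)
            + 0.5 * alpha * j1 * j2 * np.kron(Z, Z))  # capacitive ZZ term

class GateEnv:
    """One episode = one piecewise-constant pulse; reward = gate fidelity."""
    def __init__(self, target, n_steps=20, dt=0.1):
        self.target, self.n_steps, self.dt = target, n_steps, dt

    def reset(self):
        self.U = np.eye(4, dtype=complex)
        self.t = 0
        return self.t

    def step(self, action):
        j1, j2 = action  # agent sets the two exchange couplings
        self.U = expm(-1j * self.dt * hamiltonian(j1, j2)) @ self.U
        self.t += 1
        done = self.t == self.n_steps
        # Sparse terminal reward: overlap fidelity with the target unitary.
        reward = abs(np.trace(self.target.conj().T @ self.U))**2 / 16 if done else 0.0
        return self.t, reward, done

env = GateEnv(target=np.diag([1, 1, 1, -1]).astype(complex))  # CZ target
obs = env.reset()
```

A CZ target is used here only because it is a standard maximally entangling two-qubit gate; any locally equivalent target would serve the same illustrative purpose.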
Related papers
- Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning [88.42566960813438]
CalibRL is a hybrid-policy RLVR framework that supports controllable exploration with expert guidance. CalibRL increases policy entropy in a guided manner and clarifies the target distribution. Experiments across eight benchmarks, including both in-domain and out-of-domain settings, demonstrate consistent improvements.
arXiv Detail & Related papers (2026-02-22T07:23:36Z)
- Sample-Efficient Neurosymbolic Deep Reinforcement Learning [49.60927398960061]
We propose a neuro-symbolic Deep RL approach that integrates background symbolic knowledge to improve sample efficiency. Online reasoning is performed to guide the training process through two mechanisms. We show improved performance over a state-of-the-art reward machine baseline.
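For context on the baseline mentioned above: a reward machine is a small automaton over symbolic propositions that dispenses reward as sub-goals are reached. The sketch below is a generic illustration (all names hypothetical), not the paper's construction.

```python
# Minimal reward machine: an automaton over propositions that shapes reward.
class RewardMachine:
    def __init__(self, transitions, rewards, start=0):
        self.transitions = transitions  # {(state, proposition): next_state}
        self.rewards = rewards          # {(state, next_state): reward}
        self.state = start

    def step(self, true_props):
        """Advance on the propositions true in the current env state."""
        for p in true_props:
            nxt = self.transitions.get((self.state, p))
            if nxt is not None:
                r = self.rewards.get((self.state, nxt), 0.0)
                self.state = nxt
                return r
        return 0.0

# e.g. "pick up key, then open door": 0 --key--> 1 --door--> 2 (terminal reward)
rm = RewardMachine({(0, "key"): 1, (1, "door"): 2}, {(1, 2): 1.0})
```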
arXiv Detail & Related papers (2026-01-06T09:28:53Z)
- Achieving fast and robust perfect entangling gates via reinforcement learning [0.08030359871216612]
We leverage reinforcement learning techniques to discover near-optimal pulse shapes that yield perfect entangling (PE) gates. A collection of RL agents is trained within robust simulation environments, enabling the identification of effective control strategies. The RL approach is hardware agnostic with the potential for broad applicability across various quantum computing platforms.
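One standard, model-free way to score "entangling character" independently of single-qubit rotations is via Makhlin's local invariants: a gate in the CNOT equivalence class has (G1, G2) = (0, 1). Whether this paper's reward uses these invariants is an assumption; the sketch below only illustrates the textbook computation.

```python
# Makhlin local invariants (G1, G2) of a two-qubit gate, computed in the
# magic (Bell) basis; they identify the gate's local-equivalence class.
import numpy as np

Q = (1 / np.sqrt(2)) * np.array([[1, 0, 0, 1j],
                                 [0, 1j, 1, 0],
                                 [0, 1j, -1, 0],
                                 [1, 0, 0, -1j]])

def makhlin_invariants(U):
    M = Q.conj().T @ U @ Q  # transform to the magic basis
    m = M.T @ M
    G1 = np.trace(m)**2 / (16 * np.linalg.det(U))
    G2 = (np.trace(m)**2 - np.trace(m @ m)) / (4 * np.linalg.det(U))
    return G1, G2.real

CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)
print(makhlin_invariants(CNOT))  # ~ (0, 1): the CNOT equivalence class
```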
arXiv Detail & Related papers (2025-11-10T13:07:19Z)
- Training with Fewer Bits: Unlocking Edge LLMs Training with Stochastic Rounding [37.30928503608494]
Quantized training improves computational and memory efficiency but introduces quantization noise. We show that increased batch sizes can compensate for reduced precision during back-propagation. We also show that quantizing weights and activations impacts gradient variance in distinct ways.
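The core trick named in the title is easy to state: round up with probability equal to the fractional part, so the quantizer is unbiased in expectation, E[sr(x)] = x. A minimal NumPy sketch (helper name hypothetical; in low-precision training this is applied on the quantization grid, i.e. to x/scale):

```python
# Stochastic rounding: unbiased in expectation, unlike round-to-nearest.
import numpy as np

def stochastic_round(x, rng=np.random.default_rng()):
    lo = np.floor(x)
    return lo + (rng.random(x.shape) < (x - lo))  # round up w.p. frac(x)

x = np.full(100_000, 0.3)
print(stochastic_round(x).mean())  # ~0.3, vs. 0.0 for round-to-nearest
```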
arXiv Detail & Related papers (2025-11-02T09:49:34Z)
- ConfClip: Confidence-Weighted and Clipped Reward for Reinforcement Learning in LLMs [32.13266235550995]
Reinforcement learning (RL) has become a standard paradigm for refining large language models (LLMs). Inspired by observations from human learning, we introduce an RL technique that integrates verifiable outcomes with the model's own confidence estimates.
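As a loose reading only (not ConfClip's published formula), one can picture the combination as a verifiable 0/1 outcome scaled by the model's confidence in its answer and then clipped so no single sample dominates the policy update; all names below are hypothetical.

```python
# Hypothetical confidence-weighted, clipped reward for a verifiable task.
def conf_clipped_reward(correct: bool, confidence: float,
                        lo: float = -1.0, hi: float = 1.0) -> float:
    raw = confidence if correct else -confidence  # confidence-weighted outcome
    return max(lo, min(hi, raw))                  # clip to bound the update
```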
arXiv Detail & Related papers (2025-09-22T13:00:35Z)
- Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs [51.21041884010009]
Ring-lite is a Mixture-of-Experts (MoE)-based large language model optimized via reinforcement learning (RL). Our approach matches the performance of state-of-the-art (SOTA) small-scale reasoning models on challenging benchmarks.
arXiv Detail & Related papers (2025-06-17T17:12:34Z)
- Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining [74.83412846804977]
Reinforcement learning (RL)-based fine-tuning has become a crucial step in post-training language models. We present a systematic end-to-end study of RL fine-tuning for mathematical reasoning by training models entirely from scratch.
arXiv Detail & Related papers (2025-04-10T17:15:53Z)
- Symmetric Reinforcement Learning Loss for Robust Learning on Diverse Tasks and Model Scales [13.818149654692863]
Reinforcement learning (RL) training is inherently unstable due to factors such as moving targets and high gradient variance. In this work, we improve the stability of RL training by adapting the reverse cross entropy (RCE) from supervised learning for noisy data to define a symmetric RL loss.
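The supervised-learning construction being adapted, symmetric cross entropy, adds a reverse cross-entropy term (with log 0 truncated to a constant) to the usual forward term. A minimal sketch of that loss follows; the paper's exact RL adaptation, i.e. what plays the role of the label distribution, is not reproduced here.

```python
# Symmetric cross entropy: forward CE plus reverse CE with truncated log 0.
import numpy as np

def symmetric_ce(probs, target_onehot, alpha=1.0, beta=0.1, log_zero=-4.0):
    ce = -np.sum(target_onehot * np.log(probs + 1e-12))  # CE(target || probs)
    log_t = np.where(target_onehot > 0,
                     np.log(target_onehot + 1e-12), log_zero)
    rce = -np.sum(probs * log_t)                         # reverse CE
    return alpha * ce + beta * rce
```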
arXiv Detail & Related papers (2024-05-27T19:28:33Z)
- Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pairwise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework that acquires exploratory trajectories enabling accurate learning of the hidden reward function.
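The standard ingredient behind such PbRL setups is a Bradley-Terry preference likelihood, P(τ₁ ≻ τ₂) = σ(R(τ₁) − R(τ₂)), fit to pairwise feedback. A sketch under a linear-reward assumption (names hypothetical; this is the generic ingredient, not the paper's exploration scheme):

```python
# Bradley-Terry negative log-likelihood for reward learning from preferences.
import numpy as np

def bt_nll(theta, features_1, features_2, prefs):
    """theta: (d,) reward weights; features_*: (n, d) summed trajectory
    features; prefs: (n,) 1 if trajectory 1 was preferred, else 0."""
    r1 = features_1 @ theta
    r2 = features_2 @ theta
    p = 1.0 / (1.0 + np.exp(-(r1 - r2)))  # P(traj 1 preferred)
    return -np.mean(prefs * np.log(p + 1e-12)
                    + (1 - prefs) * np.log(1 - p + 1e-12))
```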
arXiv Detail & Related papers (2023-05-29T15:00:09Z)
- Combining Reinforcement Learning and Tensor Networks, with an Application to Dynamical Large Deviations [0.0]
We present a framework to integrate tensor network (TN) methods with reinforcement learning (RL).
We consider the RL actor-critic method, a model-free approach for solving RL problems, and introduce TNs as the approximators for its policy and value functions.
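The basic building block is easy to illustrate: a matrix product state (MPS) assigns a scalar to each configuration by contracting one site tensor per local state, and that scalar can serve as an unnormalized policy weight or value estimate. Shapes and usage below are illustrative assumptions, not the paper's architecture.

```python
# Evaluate an MPS on a configuration by contracting along the bond dimension.
import numpy as np

def mps_amplitude(tensors, config):
    """tensors[i]: (chi_left, d, chi_right); config picks one d-index per site."""
    v = tensors[0][:, config[0], :]
    for A, s in zip(tensors[1:], config[1:]):
        v = v @ A[:, s, :]  # contract the shared bond index
    return v[0, 0]          # open boundaries: 1x1 result

# Random 6-site MPS over binary local states, bond dimension 3.
rng = np.random.default_rng(0)
chis = [1, 3, 3, 3, 3, 3, 1]
mps = [rng.normal(size=(chis[i], 2, chis[i + 1])) for i in range(6)]
print(mps_amplitude(mps, [0, 1, 1, 0, 1, 0]))
```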
arXiv Detail & Related papers (2022-09-28T13:33:31Z)
- Learning to Reweight Imaginary Transitions for Model-Based Reinforcement Learning [58.66067369294337]
When the model is inaccurate or biased, imaginary trajectories may be deleterious for training the action-value and policy functions.
We adaptively reweight imaginary transitions to reduce the negative effects of poorly generated trajectories.
Our method outperforms state-of-the-art model-based and model-free RL algorithms on multiple tasks.
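A hedged sketch of the core idea: weight each model-generated ("imaginary") transition before it updates the critic, so less-trusted rollouts contribute less. The uncertainty proxy and functional form below are stand-ins, not the paper's learned reweighting scheme.

```python
# Downweight imagined transitions by a model-uncertainty proxy.
import numpy as np

def weighted_td_loss(q, q_target, rewards, gamma, disagreement, temp=1.0):
    """q, q_target: Q(s, a) and bootstrapped Q(s', a') for a batch of
    imagined transitions; disagreement: e.g. ensemble spread per sample."""
    w = np.exp(-disagreement / temp)        # less trust in uncertain rollouts
    td_error = rewards + gamma * q_target - q
    return np.mean(w * td_error**2)
```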
arXiv Detail & Related papers (2021-04-09T03:13:35Z)