LATTE: A Decoding Architecture for Quantum Computing with Temporal and Spatial Scalability
- URL: http://arxiv.org/abs/2509.03954v1
- Date: Thu, 04 Sep 2025 07:29:21 GMT
- Title: LATTE: A Decoding Architecture for Quantum Computing with Temporal and Spatial Scalability
- Authors: Kai Zhang, Jubo Xu, Fang Zhang, Linghang Kong, Zhengfeng Ji, Jianxin Chen
- Abstract summary: We introduce an FPGA-CPU hybrid decoding architecture, LATTE, to address the key requirements of scaling up lattice surgery quantum computation.
LATTE delivers accuracy on par with the base decoder while achieving real-time decoding throughput and significantly reducing both bandwidth requirements and computational resources.
- Score: 7.184133388805955
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Quantum error correction allows inherently noisy quantum devices to emulate an ideal quantum computer with reasonable resource overhead. As a crucial component, decoding architectures have received significant attention recently. In this paper, we introduce LATTE, an FPGA-CPU hybrid decoding architecture aiming to address the key requirements of scaling up in lattice surgery quantum computation -- Latency, Accuracy, Throughput and Transmission Bandwidth, in an Eclectic manner. LATTE follows a hierarchical design: (1) A fully streaming and asynchronous block decoding system on CPU to enable parallelization both temporally and spatially. (2) A super-light yet accurate neural local decoding unit integrated with quantum control hardware on FPGA, which remains \emph{transparent} to the block decoding system, effectively reducing transmission bandwidth and accelerating the decoding process. LATTE delivers accuracy on par with the base decoder while achieving real-time decoding throughput and significantly reducing both bandwidth requirements and computational resources, enabling a level of scalability far beyond previous approaches. Under circuit-level noise $p=0.001$, LATTE achieves over $\mathbf{90\%}$ reduction in transmission bandwidth and a $\mathbf{6.4\times}$ speedup on average in single-block decoding. In the \emph{streaming decoding} scenario: (1) LATTE achieves constant and low latency ($\mathbf{16\times}$-$\mathbf{20\times}$ speedup over existing streaming decoding implementations) in arbitrarily long quantum memory experiments, with near-optimal resources -- merely $\mathbf{2}$ threads are sufficient for decoding the surface code with distance up to $17$. (2) LATTE minimizes latency in multi-patch measurement experiments through highly parallelized decoding operations. These combined efforts ensure sufficient scalability for large-scale fault-tolerant quantum computing.
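As an illustrative aside, the bandwidth-reduction idea behind a local pre-decoding unit can be sketched in a few lines. The greedy pairing rule, the 1-D defect positions, and the `max_gap` parameter below are assumptions made for illustration only; they are a toy stand-in for LATTE's neural local decoding unit, not its actual algorithm.

```python
# Toy local pre-decoding pass: defects (flipped syndrome bits) that form
# isolated adjacent pairs are resolved locally on the FPGA side, and only
# the unresolved defects are transmitted to the global block decoder.
# This illustrates why local pre-decoding shrinks transmission bandwidth.

def local_predecode(defects, max_gap=1):
    """Greedily pair sorted defects whose positions differ by at most
    `max_gap`; paired defects are corrected locally and dropped from
    the stream, everything else is forwarded upstream."""
    defects = sorted(defects)
    remaining = []
    i = 0
    while i < len(defects):
        if i + 1 < len(defects) and defects[i + 1] - defects[i] <= max_gap:
            i += 2  # resolved locally, nothing transmitted
        else:
            remaining.append(defects[i])
            i += 1
    return remaining

defects = [3, 4, 10, 20, 21, 30]
forwarded = local_predecode(defects)
# forwarded == [10, 30]: only defects the local unit could not pair
# are sent to the global decoder.
```

In this toy run, 6 defects shrink to 2 transmitted ones; a trained local unit plays the same filtering role for the error patterns it can resolve confidently.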
Related papers
- A distillation-teleportation protocol for fault-tolerant QRAM [95.99192129224721]
We present a protocol for fault-tolerantly implementing the logical quantum random access memory (QRAM) operation.
For coherently accessing classical memories of size $2^n$, our protocol consumes only $\mathrm{poly}(n)$ fault-tolerant quantum resources.
arXiv Detail & Related papers (2025-05-26T17:42:56Z) - Fast correlated decoding of transversal logical algorithms [67.01652927671279]
Quantum error correction (QEC) is required for large-scale computation, but incurs a significant resource overhead.
Recent advances have shown that by jointly decoding logical qubits in algorithms composed of logical gates, the number of syndrome extraction rounds can be reduced.
Here, we reframe the problem of decoding circuits by directly decoding relevant logical operator products as they propagate through the circuit.
arXiv Detail & Related papers (2025-05-19T18:00:00Z) - Local Clustering Decoder: a fast and adaptive hardware decoder for the surface code [0.0]
We introduce the Local Clustering Decoder as a solution that simultaneously achieves the accuracy and speed requirements of a real-time decoding system.
Our decoder is implemented on FPGAs and exploits hardware parallelism to keep pace with the fastest qubit types.
It enables one million error-free quantum operations with 4x fewer physical qubits when compared to standard non-adaptive decoding.
arXiv Detail & Related papers (2024-11-15T16:43:59Z) - Accelerating Error Correction Code Transformers [56.75773430667148]
We introduce a novel acceleration method for transformer-based decoders.
We achieve a 90% compression ratio and reduce arithmetic operation energy consumption by at least 224 times on modern hardware.
arXiv Detail & Related papers (2024-10-08T11:07:55Z) - Demonstrating real-time and low-latency quantum error correction with superconducting qubits [52.08698178354922]
We demonstrate low-latency feedback with a scalable FPGA decoder integrated into a superconducting quantum processor.
We observe logical error suppression as the number of decoding rounds is increased.
The decoder throughput and latency developed in this work, combined with continued device improvements, unlock the next generation of experiments.
arXiv Detail & Related papers (2024-10-07T17:07:18Z) - Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference [19.167604927651073]
Auto-regressive decoding of Large Language Models (LLMs) results in significant overheads in their hardware performance.
We propose a novel parallel prompt decoding that requires only $0.0002$% trainable parameters, enabling efficient training on a single A100-40GB GPU in just 16 hours.
Our approach demonstrates up to 2.49$\times$ speedup and maintains a minimal memory overhead of just $0.0004$%.
arXiv Detail & Related papers (2024-05-28T22:19:30Z) - Efficient and Scalable Architectures for Multi-Level Superconducting Qubit Readout [0.8999666725996978]
Many processor modalities are inherently multi-level systems, which leads to occasional leakage into energy levels outside the computational subspace.
We propose a scalable, high-fidelity three-level readout that reduces FPGA resource usage by $60\times$ compared to the baseline.
Our design supports efficient, real-time implementation on off-the-shelf FPGAs, delivering a 6.6% improvement in readout accuracy over the baseline.
arXiv Detail & Related papers (2024-05-14T22:32:51Z) - Spatially parallel decoding for multi-qubit lattice surgery [0.10713888959520208]
Running quantum algorithms protected by quantum error correction requires a real-time classical decoder.
Most prior work on real-time decoding has focused on an isolated logical qubit encoded in the surface code.
For the surface code, quantum programs of utility will require multi-qubit interactions performed via lattice surgery.
A large merged patch can arise during lattice surgery -- possibly as large as the entire device.
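The spatial-partitioning idea above can be sketched with a toy decoder. The tile width, the 1-D defect coordinates, the greedy `decode_tile` stand-in, and the sequential boundary pass are all illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch of spatially parallel decoding for a large merged
# patch: defects are bucketed into spatial tiles, tiles are decoded
# concurrently, and defects a tile could not match are deferred to a
# sequential boundary pass that stitches the tiles together.
from concurrent.futures import ThreadPoolExecutor

TILE = 5  # tile width in lattice columns (assumed, illustrative)

def decode_tile(defects):
    """Stand-in inner decoder: greedily pairs defects within one tile;
    any unmatched defect is handed back for the boundary pass."""
    pairs = list(zip(defects[::2], defects[1::2]))
    leftover = [defects[-1]] if len(defects) % 2 else []
    return pairs, leftover

def spatially_parallel_decode(defect_columns):
    tiles = {}
    for col in sorted(defect_columns):
        tiles.setdefault(col // TILE, []).append(col)
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(decode_tile, tiles.values()))
    pairs = [p for tile_pairs, _ in results for p in tile_pairs]
    # Sequential boundary pass: pair whatever the tiles left unmatched.
    boundary = sorted(c for _, leftover in results for c in leftover)
    pairs += list(zip(boundary[::2], boundary[1::2]))
    return pairs
```

The point of the sketch is the structure, not the matching rule: tile decoding is embarrassingly parallel, and only the (typically small) boundary residue needs cross-tile coordination.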
arXiv Detail & Related papers (2024-03-03T00:17:13Z) - A real-time, scalable, fast and highly resource efficient decoder for a quantum computer [1.9014261239550778]
We introduce the Collision Clustering decoder and implement it on FPGA and ASIC hardware.
We simulate logical memory experiments using the leading quantum error correction scheme, the surface code.
We demonstrate MHz decoding speed - matching the requirements of fast-operating modalities such as superconducting qubits.
arXiv Detail & Related papers (2023-09-11T15:46:27Z) - A Scalable, Fast and Programmable Neural Decoder for Fault-Tolerant Quantum Computation Using Surface Codes [12.687083899824314]
Quantum error-correcting codes (QECCs) can eliminate the negative effects of quantum noise, the major obstacle to the execution of quantum algorithms.
We propose a scalable, fast, and programmable neural decoding system to meet the requirements of FTQEC for rotated surface codes (RSC).
Our system achieves an extremely low decoding latency of 197 ns, and its accuracy is close to that of minimum weight perfect matching (MWPM).
arXiv Detail & Related papers (2023-05-25T06:23:32Z) - Parallel window decoding enables scalable fault tolerant quantum computation [2.624902795082451]
We present a methodology that parallelizes the decoding problem and achieves almost arbitrary syndrome processing speed.
Our parallelization requires some classical feedback decisions to be delayed, leading to a slow-down of the logical clock speed.
Using known auto-teleportation gadgets the slow-down can be eliminated altogether in exchange for increased qubit overhead.
arXiv Detail & Related papers (2022-09-18T12:37:57Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete actions (SAC-d), which generates the \emph{exit point} and \emph{compressing bits} by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a framework can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often occupy a large number of parameters and require heavy computation costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.