Related papers: FPGA-based Distributed Union-Find Decoder for Surface Codes

FPGA-based Distributed Union-Find Decoder for Surface Codes

URL: http://arxiv.org/abs/2406.08491v2
Date: Wed, 02 Oct 2024 01:27:54 GMT
Title: FPGA-based Distributed Union-Find Decoder for Surface Codes
Authors: Namitha Liyanage, Yue Wu, Siona Tagare, Lin Zhong
Abstract summary: A fault-tolerant quantum computer must decode and correct errors faster than they appear to prevent exponential slowdown due to error correction. We report a distributed version of the Union-Find decoder that exploits parallel computing resources for further speedup.
Score: 3.780617572622938
License: http://creativecommons.org/licenses/by/4.0/
Abstract: A fault-tolerant quantum computer must decode and correct errors faster than they appear to prevent exponential slowdown due to error correction. The Union-Find (UF) decoder is promising with an average time complexity slightly higher than $O(d^3)$. We report a distributed version of the UF decoder that exploits parallel computing resources for further speedup. Using an FPGA-based implementation, we empirically show that this distributed UF decoder has a sublinear average time complexity with regard to $d$, given $O(d^3)$ parallel computing resources. The decoding time per measurement round decreases as $d$ increases, a first for any quantum error decoder. The implementation employs a scalable architecture called Helios that organizes parallel computing resources into a hybrid tree-grid structure. Using a Xilinx VCU129 FPGA, we successfully implement $d$ up to 21 with an average decoding time of 11.5 ns per measurement round under 0.1\% phenomenological noise, and 23.7 ns for $d=17$ under equivalent circuit-level noise. This performance is significantly faster than any existing decoder implementation. Furthermore, we show that Helios can optimize for resource efficiency by decoding $d=51$ on a Xilinx VCU129 FPGA with an average latency of 544 ns per measurement round.
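The core data structure behind a Union-Find decoder is a disjoint-set forest that merges growing clusters of syndrome defects and records whether each cluster holds an odd number of defects (odd clusters keep growing; even clusters stop). The Python below is a minimal sequential sketch of that bookkeeping only, under stated assumptions: the class name `ClusterForest`, its methods, and the 5-node chain in the usage example are hypothetical; boundary handling, half-edge growth, and the peeling step are omitted; and it is not the Helios architecture, which distributes this work across parallel FPGA resources in a hybrid tree-grid structure.

```python
from dataclasses import dataclass


@dataclass
class ClusterForest:
    """Hypothetical sketch: disjoint-set forest for UF-decoder cluster bookkeeping.

    Each index is a syndrome node of the decoding graph. A root stores the size
    of its cluster and the parity (number of defects mod 2) of that cluster.
    """
    parent: list[int]
    size: list[int]
    parity: list[int]

    @classmethod
    def from_defects(cls, n_vertices: int, defects: set[int]) -> "ClusterForest":
        # Every node starts as its own singleton cluster.
        return cls(
            parent=list(range(n_vertices)),
            size=[1] * n_vertices,
            parity=[1 if v in defects else 0 for v in range(n_vertices)],
        )

    def find(self, v: int) -> int:
        # Path halving: point every other node on the path at its grandparent.
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]
            v = self.parent[v]
        return v

    def union(self, a: int, b: int) -> int:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        # Union by size: attach the smaller tree under the larger root.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        # Defect counts add, so parities combine with XOR.
        self.parity[ra] ^= self.parity[rb]
        return ra

    def is_odd(self, v: int) -> bool:
        # In the UF decoder, only odd clusters continue to grow.
        return self.parity[self.find(v)] == 1


if __name__ == "__main__":
    # Hypothetical 1-D chain of 5 syndrome nodes with defects at nodes 1 and 3.
    forest = ClusterForest.from_defects(5, {1, 3})
    print(forest.is_odd(1))  # True: a lone defect forms an odd cluster
    forest.union(1, 2)       # cluster {1, 2} grows toward node 3
    forest.union(2, 3)       # absorbs defect 3, so the cluster becomes even
    print(forest.is_odd(1))  # False
```

Union by size with path halving keeps each `find` close to constant amortized time, which is why a sequential UF decoder runs in roughly linear time in the number of graph vertices (on the order of $d^3$ for $d$ rounds of a distance-$d$ code). A distributed design must additionally propagate cluster roots between processing units, which this sketch does not model.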

Related papers

FPGA-tailored algorithms for real-time decoding of quantum LDPC codes [1.213715600410032]
We analyze FPGA-tailored versions of three decoder classes for quantum low-density parity-check (qLDPC) codes. For message passing, we analyze the recently introduced Relay decoder and its FPGA implementation. For ordered statistics decoding, we introduce a filtered variant that concentrates on high-likelihood fault locations. We design an FPGA-adapted generalized union-find decoder.
arXiv (2025-11-26T18:33:47Z)
Real-time decoding of the gross code memory with FPGAs [0.0]
We introduce a prototype FPGA decoder implementing the recently discovered Relay-BP algorithm. The decoder is both fast and accurate, achieving a belief propagation time of 24 ns.
arXiv (2025-10-24T16:03:07Z)
Accelerating Fault-Tolerant Quantum Computation with Good qLDPC Codes [4.569242390849337]
The scheme achieves constant qubit overhead and a time overhead of $O(d^{a+o(1)})$ for any $[[n,k,d]]$ qLDPC code with constant encoding rate and distance $d = \Omega(n^{1/a})$. These results establish a new paradigm for accelerating fault-tolerant quantum computation on qLDPC codes, while maintaining low overhead and broad applicability.
arXiv (2025-10-22T10:15:40Z)
Machine Learning Decoding of Circuit-Level Noise for Bivariate Bicycle Codes [0.42542143904778074]
We present a recurrent, transformer-based neural network designed to decode circuit-level noise on Bivariate Bicycle (BB) codes. For the $[[72,12,6]]$ BB code, at a physical error rate of $p=0.1\%$, our model achieves a logical error rate almost 5 times lower than belief propagation. These results demonstrate that machine learning decoders can outperform conventional decoders on qLDPC codes.
arXiv (2025-04-17T15:57:16Z)
Demonstrating real-time and low-latency quantum error correction with superconducting qubits [52.08698178354922]
We demonstrate low-latency feedback with a scalable FPGA decoder integrated into a superconducting quantum processor. We observe logical error suppression as the number of decoding rounds is increased. The decoder throughput and latency developed in this work, combined with continued device improvements, unlock the next generation of experiments.
arXiv (2024-10-07T17:07:18Z)
Quantum error correction below the surface code threshold [107.92016014248976]
Quantum error correction provides a path to reach practical quantum computing by combining multiple physical qubits into a logical qubit. We present two surface code memories operating below a critical threshold: a distance-7 code and a distance-5 code integrated with a real-time decoder. Our results present device performance that, if scaled, could realize the operational requirements of large scale fault-tolerant quantum algorithms.
arXiv (2024-08-24T23:08:50Z)
Fast and Parallelizable Logical Computation with Homological Product Codes [3.4338109681532027]
High-rate quantum low-density-parity-check (qLDPC) codes promise a route to reduce qubit numbers, but performing computation while maintaining low space cost has required serialization of operations and extra time costs. We design fast and parallelizable logical gates for qLDPC codes, demonstrating their utility for key algorithmic subroutines such as the quantum adder.
arXiv (2024-07-26T03:49:59Z)
Ambiguity Clustering: an accurate and efficient decoder for qLDPC codes [0.0]
We introduce the Ambiguity Clustering decoder (AC), which divides measurement data into clusters that can be decoded independently. With 0.3% circuit-level depolarising noise, AC is up to 27× faster than BP-OSD with matched accuracy. Our implementation decodes the 144-qubit Gross code in 135 μs per round of syndrome extraction on an M2 CPU.
arXiv (2024-06-20T17:39:31Z)
Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference [19.167604927651073]
Auto-regressive decoding of Large Language Models (LLMs) results in significant overheads in their hardware performance. We propose a novel parallel prompt decoding that requires only 0.0002% trainable parameters, enabling efficient training on a single A100-40GB GPU in just 16 hours. Our approach demonstrates up to 2.49× speedup and maintains a minimal memory overhead of just 0.0004%.
arXiv (2024-05-28T22:19:30Z)
TCNCA: Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing [52.64837396100988]
MEGA is a recent transformer-based architecture, which utilizes a linear recurrent operator whose parallel computation, based on the FFT, scales as $O(L \log L)$, with $L$ being the sequence length. We build upon their approach by replacing the linear recurrence with a special temporal convolutional network which permits larger receptive field size with shallower networks, and reduces the computational complexity to $O(L)$. We evaluate TCNCA on EnWik8 language modeling, long-range-arena (LRA) sequence classification, as well as a synthetic reasoning benchmark associative recall.
arXiv (2023-12-09T16:12:25Z)
A Scalable, Fast and Programmable Neural Decoder for Fault-Tolerant Quantum Computation Using Surface Codes [12.687083899824314]
Quantum error-correcting codes (QECCs) can eliminate the negative effects of quantum noise, the major obstacle to the execution of quantum algorithms. We propose a scalable, fast, and programmable neural decoding system to meet the requirements of FTQEC for rotated surface codes (RSC). Our system achieves an extremely low decoding latency of 197 ns, and the accuracy results of our system are close to minimum weight perfect matching (MWPM).
arXiv (2023-05-25T06:23:32Z)
Scalable Quantum Error Correction for Surface Codes using FPGA [67.74017895815125]
A fault-tolerant quantum computer must decode and correct errors faster than they appear. We report a distributed version of the Union-Find decoder that exploits parallel computing resources for further speedup. The implementation employs a scalable architecture called Helios that organizes parallel computing resources into a hybrid tree-grid structure.
arXiv (2023-01-20T04:23:00Z)
Parallel window decoding enables scalable fault tolerant quantum computation [2.624902795082451]
We present a methodology that parallelizes the decoding problem and achieves almost arbitrary syndrome processing speed. Our parallelization requires some classical feedback decisions to be delayed, leading to a slow-down of the logical clock speed. Using known auto-teleportation gadgets the slow-down can be eliminated altogether in exchange for increased qubit overhead.
arXiv (2022-09-18T12:37:57Z)
Rapid Person Re-Identification via Sub-space Consistency Regularization [51.76876061721556]
Person Re-Identification (ReID) matches pedestrians across disjoint cameras. Existing ReID methods adopting real-value feature descriptors have achieved high accuracy, but they are low in efficiency due to the slow Euclidean distance computation. We propose a novel Sub-space Consistency Regularization (SCR) algorithm that can speed up the ReID procedure by 0.25 times.
arXiv (2022-07-13T02:44:05Z)
Private Frequency Estimation via Projective Geometry [47.112770141205864]
We propose a new algorithm ProjectiveGeometryResponse (PGR) for locally differentially private (LDP) frequency estimation. For a universe size of $k$ and with $n$ users, our $\varepsilon$-LDP algorithm has communication cost $\lceil \log k \rceil$ bits in the private coin setting and $\varepsilon \log e + O(1)$ in the public coin setting. In many parameter settings used in practice, this is a significant improvement over the $O(n+k^2)$ optimal cost that is achieved by the recent PI-
arXiv (2022-03-01T02:49:55Z)
VersaGNN: a Versatile accelerator for Graph neural networks [81.1667080640009]
We propose VersaGNN, an ultra-efficient, systolic-array-based versatile hardware accelerator. VersaGNN achieves on average 3712× speedup with 1301.25× energy reduction on CPU, and 35.4× speedup with 17.66× energy reduction on GPU.
arXiv (2021-05-04T04:10:48Z)

This list is automatically generated from the titles and abstracts of the papers on this site.