FPGA-based Distributed Union-Find Decoder for Surface Codes
- URL: http://arxiv.org/abs/2406.08491v2
- Date: Wed, 02 Oct 2024 01:27:54 GMT
- Title: FPGA-based Distributed Union-Find Decoder for Surface Codes
- Authors: Namitha Liyanage, Yue Wu, Siona Tagare, Lin Zhong
- Abstract summary: A fault-tolerant quantum computer must decode and correct errors faster than they appear to prevent exponential slowdown due to error correction.
We report a distributed version of the Union-Find decoder that exploits parallel computing resources for further speedup.
- Score: 3.780617572622938
- Abstract: A fault-tolerant quantum computer must decode and correct errors faster than they appear to prevent exponential slowdown due to error correction. The Union-Find (UF) decoder is promising with an average time complexity slightly higher than $O(d^3)$. We report a distributed version of the UF decoder that exploits parallel computing resources for further speedup. Using an FPGA-based implementation, we empirically show that this distributed UF decoder has a sublinear average time complexity with regard to $d$, given $O(d^3)$ parallel computing resources. The decoding time per measurement round decreases as $d$ increases, which is the first time this has been shown for a quantum error decoder. The implementation employs a scalable architecture called Helios that organizes parallel computing resources into a hybrid tree-grid structure. Using a Xilinx VCU129 FPGA, we successfully implement $d$ up to 21 with an average decoding time of 11.5 ns per measurement round under 0.1\% phenomenological noise, and 23.7 ns for $d=17$ under equivalent circuit-level noise. This performance is significantly faster than any existing decoder implementation. Furthermore, we show that Helios can optimize for resource efficiency by decoding $d=51$ on a Xilinx VCU129 FPGA with an average latency of 544 ns per measurement round.
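The abstract builds on the Union-Find data structure, which a UF decoder uses to merge growing clusters of syndrome defects and to track whether each cluster holds an odd number of defects (odd clusters keep growing). The following is a minimal, sequential sketch of that primitive only; it is not the paper's distributed Helios design, and the class name `UnionFind`, the `defects` argument, and the `is_odd` helper are hypothetical names chosen for illustration.

```python
class UnionFind:
    """Disjoint-set forest with union by size, path compression,
    and a per-cluster defect parity (illustrative names throughout)."""

    def __init__(self, n, defects=()):
        self.parent = list(range(n))   # each vertex starts as its own root
        self.size = [1] * n            # cluster size, valid at roots only
        # odd[v] is True when the cluster rooted at v holds an odd
        # number of defects; valid at roots only.
        self.odd = [False] * n
        for v in defects:
            self.odd[v] = True

    def find(self, x):
        # First pass: locate the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: point every vertex on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        """Merge the clusters of a and b; return False if already merged."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra            # attach the smaller tree to the larger
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        self.odd[ra] = self.odd[ra] != self.odd[rb]  # parity is XOR
        return True

    def is_odd(self, x):
        return self.odd[self.find(x)]


# Usage: a path of 5 vertices with defects at vertices 0 and 3.
uf = UnionFind(5, defects={0, 3})
uf.union(0, 1)
odd_after_first_merge = uf.is_odd(0)   # cluster {0, 1} holds one defect
uf.union(1, 2)
uf.union(2, 3)
odd_after_all_merges = uf.is_odd(0)    # cluster {0, 1, 2, 3} holds two defects
```

In this sketch the cluster containing vertex 0 is odd until it absorbs the second defect at vertex 3, at which point it becomes even and would stop growing in a UF-style decoder.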
Related papers
- Demonstrating real-time and low-latency quantum error correction with superconducting qubits [52.08698178354922]
We demonstrate low-latency feedback with a scalable FPGA decoder integrated into a superconducting quantum processor.
We observe logical error suppression as the number of decoding rounds is increased.
The decoder throughput and latency developed in this work, combined with continued device improvements, unlock the next generation of experiments.
arXiv Detail & Related papers (2024-10-07T17:07:18Z)
- Quantum error correction below the surface code threshold [107.92016014248976]
Quantum error correction provides a path to reach practical quantum computing by combining multiple physical qubits into a logical qubit.
We present two surface code memories operating below a critical threshold: a distance-7 code and a distance-5 code integrated with a real-time decoder.
Our results present device performance that, if scaled, could realize the operational requirements of large scale fault-tolerant quantum algorithms.
arXiv Detail & Related papers (2024-08-24T23:08:50Z)
- Fast and Parallelizable Logical Computation with Homological Product Codes [3.4338109681532027]
High-rate quantum low-density-parity-check (qLDPC) codes promise a route to reduce qubit numbers, but performing computation while maintaining low space cost has required serialization of operations and extra time costs.
We design fast and parallelizable logical gates for qLDPC codes, demonstrating their utility for key algorithmic subroutines such as the quantum adder.
arXiv Detail & Related papers (2024-07-26T03:49:59Z)
- Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference [19.167604927651073]
Auto-regressive decoding of Large Language Models (LLMs) results in significant overheads in their hardware performance.
We propose a novel parallel prompt decoding that requires only $0.0002$% trainable parameters, enabling efficient training on a single A100-40GB GPU in just 16 hours.
Our approach demonstrates up to $2.49\times$ speedup and maintains a minimal memory overhead of just $0.0004$%.
arXiv Detail & Related papers (2024-05-28T22:19:30Z)
- TCNCA: Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing [52.64837396100988]
MEGA is a recent transformer-based architecture, which utilizes a linear recurrent operator whose parallel computation, based on the FFT, scales as $O(L \log L)$, with $L$ being the sequence length.
We build upon their approach by replacing the linear recurrence with a special temporal convolutional network which permits larger receptive field size with shallower networks, and reduces the computational complexity to $O(L)$.
We evaluate TCNCA on EnWik8 language modeling, long-range-arena (LRA) sequence classification, as well as a synthetic reasoning benchmark associative recall.
arXiv Detail & Related papers (2023-12-09T16:12:25Z)
- A Scalable, Fast and Programmable Neural Decoder for Fault-Tolerant Quantum Computation Using Surface Codes [12.687083899824314]
Quantum error-correcting codes (QECCs) can eliminate the negative effects of quantum noise, the major obstacle to the execution of quantum algorithms.
We propose a scalable, fast, and programmable neural decoding system to meet the requirements of FTQEC for rotated surface codes (RSC).
Our system achieves an extremely low decoding latency of 197 ns, and the accuracy results of our system are close to minimum weight perfect matching (MWPM).
arXiv Detail & Related papers (2023-05-25T06:23:32Z)
- Scalable Quantum Error Correction for Surface Codes using FPGA [67.74017895815125]
A fault-tolerant quantum computer must decode and correct errors faster than they appear.
We report a distributed version of the Union-Find decoder that exploits parallel computing resources for further speedup.
The implementation employs a scalable architecture called Helios that organizes parallel computing resources into a hybrid tree-grid structure.
arXiv Detail & Related papers (2023-01-20T04:23:00Z)
- Parallel window decoding enables scalable fault tolerant quantum computation [2.624902795082451]
We present a methodology that parallelizes the decoding problem and achieves almost arbitrary syndrome processing speed.
Our parallelization requires some classical feedback decisions to be delayed, leading to a slow-down of the logical clock speed.
Using known auto-teleportation gadgets the slow-down can be eliminated altogether in exchange for increased qubit overhead.
arXiv Detail & Related papers (2022-09-18T12:37:57Z)
- Rapid Person Re-Identification via Sub-space Consistency Regularization [51.76876061721556]
Person Re-Identification (ReID) matches pedestrians across disjoint cameras.
Existing ReID methods adopting real-value feature descriptors have achieved high accuracy, but they are low in efficiency due to the slow Euclidean distance computation.
We propose a novel Sub-space Consistency Regularization (SCR) algorithm that can speed up the ReID procedure by $0.25\times$.
arXiv Detail & Related papers (2022-07-13T02:44:05Z)
- Private Frequency Estimation via Projective Geometry [47.112770141205864]
We propose a new algorithm ProjectiveGeometryResponse (PGR) for locally differentially private (LDP) frequency estimation.
For a universe size of $k$ and with $n$ users, our $\varepsilon$-LDP algorithm has communication cost $\lceil \log k \rceil$ bits in the private coin setting and $\varepsilon \log e + O(1)$ in the public coin setting.
In many parameter settings used in practice this is a significant improvement over the $O(n+k^2)$ optimal cost that is achieved by the recent PI-
arXiv Detail & Related papers (2022-03-01T02:49:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.