LSQCA: Resource-Efficient Load/Store Architecture for Limited-Scale Fault-Tolerant Quantum Computing
- URL: http://arxiv.org/abs/2412.20486v2
- Date: Mon, 14 Apr 2025 06:02:08 GMT
- Title: LSQCA: Resource-Efficient Load/Store Architecture for Limited-Scale Fault-Tolerant Quantum Computing
- Authors: Takumi Kobori, Yasunari Suzuki, Yosuke Ueno, Teruo Tanimoto, Synge Todo, Yuuki Tokunaga
- Abstract summary: We propose an FTQC architecture based on a novel floorplan strategy, which can achieve almost 100% memory density. The idea behind our architecture is to separate all memory regions into a small computational space called Computational Registers (CR) and a space-efficient memory space called Scan-Access Memory (SAM).
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current fault-tolerant quantum computer (FTQC) architectures utilize several encoding techniques to enable reliable logical operations with restricted qubit connectivity. However, such logical operations demand additional memory overhead to ensure fault tolerance. Since the main obstacle to practical quantum computing is the limited qubit count, our primary mission is to design floorplans that can reduce memory overhead without compromising computational capability. Despite extensive efforts to explore FTQC architectures, even the current state-of-the-art floorplan strategy devotes 50% of memory space to this overhead, rather than to data storage, to ensure unit-time random access to all logical qubits. In this paper, we propose an FTQC architecture based on a novel floorplan strategy, Load/Store Quantum Computer Architecture (LSQCA), which can achieve almost 100% memory density. The idea behind our architecture is to separate all memory regions into a small computational space called Computational Registers (CR) and a space-efficient memory space called Scan-Access Memory (SAM). We define an instruction set for these abstract structures and provide concrete designs named point-SAM and line-SAM architectures. With this design, we can improve the memory density by allowing variable-latency memory access while concealing the latency behind other bottlenecks. We also propose optimization techniques that exploit properties of quantum programs observed in our static analysis, such as access locality in memory reference timestamps. Our numerical results indicate that LSQCA successfully leverages this idea. In a resource-restricted situation, a specific benchmark shows that we can achieve about 90% memory density with a 5% increase in execution time compared to a conventional floorplan, which achieves at most 50% memory density for unit-time random access. Our design ensures broad quantum applicability.
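The abstract's CR/SAM split can be pictured with a toy model: a small register region offers unit-time access, while a dense scan-access memory trades latency for density, with access cost growing with scan depth. The sketch below is purely illustrative; all class and method names are hypothetical and not taken from the paper, and the linear latency model is a simplifying assumption for the line-SAM idea.

```python
class ScanAccessMemory:
    """Toy line-SAM: logical qubits sit along a 1-D scan line;
    reaching slot i costs i+1 time steps (assumed latency model)."""
    def __init__(self, num_slots):
        self.slots = [None] * num_slots

    def store(self, index, qubit_id):
        self.slots[index] = qubit_id
        return index + 1                  # latency in time steps

    def load(self, index):
        qubit_id, self.slots[index] = self.slots[index], None
        return qubit_id, index + 1        # (data, latency)


class ComputationalRegister:
    """Small unit-time-access region where logical gates are applied."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.qubits = set()

    def admit(self, qubit_id):
        if len(self.qubits) >= self.capacity:
            raise RuntimeError("CR full: evict a qubit to SAM first")
        self.qubits.add(qubit_id)


# Exploiting access locality: soon-to-be-used qubits go in shallow slots,
# so their load latency is small and easier to hide behind other work.
sam = ScanAccessMemory(num_slots=8)
cr = ComputationalRegister(capacity=2)

sam.store(0, "q_hot")    # frequently accessed -> shallow slot
sam.store(7, "q_cold")   # rarely accessed     -> deep slot

q, lat_hot = sam.load(0)
cr.admit(q)
print(lat_hot)           # 1 time step for the hot qubit
```

This mirrors how a classical load/store architecture hides memory latency: the scheduler's job is to place qubits so that high-latency SAM accesses overlap with computation in the CR.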
Related papers
- LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory [97.14005794889134]
We present LoGeR, a novel architecture that scales dense 3D reconstruction to extremely long sequences without post-optimization. LoGeR processes video streams in chunks, leveraging strong bidirectional priors for high-fidelity intra-chunk reasoning. This memory architecture enables LoGeR to be trained on sequences of 128 frames and to generalize up to thousands of frames during inference.
arXiv Detail & Related papers (2026-03-03T18:55:37Z) - AQER: a scalable and efficient data loader for digital quantum computers [62.40228216126285]
We develop AQER, a scalable AQL method that constructs the loading circuit by systematically reducing entanglement in target states. We conduct systematic experiments to evaluate the effectiveness of AQER, using synthetic datasets, classical image and language datasets, and quantum many-body state datasets with up to 50 qubits.
arXiv Detail & Related papers (2026-02-02T14:39:42Z) - Efficient Quantum State Preparation with Bucket Brigade QRAM [47.72095699729477]
Preparation of data in quantum states is a critical component in the design of quantum algorithms. One of the main approaches to achieve efficient state preparation is through the use of Quantum Random Access Memory (QRAM). We present a framework that integrates the physical model of the Bucket Brigade QRAM (BBQRAM) with the classical data structure of the Segment Tree to achieve efficient state preparation.
arXiv Detail & Related papers (2025-10-17T18:50:07Z) - Hardware-software co-exploration with racetrack memory based in-memory computing for CNN inference in embedded systems [54.045712360156024]
Racetrack memory is a non-volatile technology that allows high-density fabrication. Integrating in-memory arithmetic circuits with memory cells affects both memory density and power efficiency. We present an efficient in-memory convolutional neural network (CNN) accelerator optimized for use with racetrack memory.
arXiv Detail & Related papers (2025-07-02T07:29:53Z) - Extractors: QLDPC Architectures for Efficient Pauli-Based Computation [42.95092131256421]
We propose a new primitive that can augment any QLDPC memory into a computational block well-suited for Pauli-based computation.
In particular, any logical Pauli operator supported on the memory can be fault-tolerantly measured in one logical cycle.
Our architecture can implement universal quantum circuits via parallel logical measurements.
arXiv Detail & Related papers (2025-03-13T14:07:40Z) - QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache [67.84112700032007]
Large Language Models (LLMs) are increasingly being deployed on edge devices for long-context settings.
In these scenarios, the Key-Value (KV) cache is the primary bottleneck in terms of both GPU memory and latency.
We propose a novel self-speculative decoding framework, QuantSpec, where the draft model shares the architecture of the target model but employs a hierarchical 4-bit quantized KV cache and 4-bit quantized weights for acceleration.
arXiv Detail & Related papers (2025-02-05T20:43:48Z) - CHIME: Energy-Efficient STT-RAM-based Concurrent Hierarchical In-Memory Processing [1.5566524830295307]
This paper introduces a novel PiC/PiM architecture, Concurrent Hierarchical In-Memory Processing (CHIME).
CHIME strategically incorporates heterogeneous compute units across multiple levels of the memory hierarchy.
Experiments reveal that, compared to the state-of-the-art bit-line computing approaches, CHIME achieves a significant speedup of 57.95% and energy savings of 78.23%.
arXiv Detail & Related papers (2024-07-29T01:17:54Z) - B'MOJO: Hybrid State Space Realizations of Foundation Models with Eidetic and Fading Memory [91.81390121042192]
We develop a class of models called B'MOJO to seamlessly combine eidetic and fading memory within a composable module.
B'MOJO's ability to modulate eidetic and fading memory results in better inference on longer sequences tested up to 32K tokens.
arXiv Detail & Related papers (2024-07-08T18:41:01Z) - Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges.
We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs.
This work advances the AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z) - SMOF: Streaming Modern CNNs on FPGAs with Smart Off-Chip Eviction [6.800641017055453]
This paper introduces weight and activation eviction mechanisms to off-chip memory along the computational pipeline.
The proposed mechanism is incorporated into an existing toolflow, expanding the design space by utilising off-chip memory as a buffer.
SMOF has demonstrated the capacity to deliver competitive and, in some cases, state-of-the-art performance across a spectrum of computer vision tasks.
arXiv Detail & Related papers (2024-03-27T18:12:24Z) - Shuttling for Scalable Trapped-Ion Quantum Computers [2.8956730787977083]
We propose an efficient shuttling schedule for trapped-ion quantum computers.
The proposed approach produces shuttling schedules with a close-to-minimal amount of time steps.
An implementation of the proposed approach is publicly available as part of the open-source Munich Quantum Toolkit.
arXiv Detail & Related papers (2024-02-21T19:00:04Z) - NumS: Scalable Array Programming for the Cloud [82.827921577004]
We present NumS, an array programming library that optimizes NumPy-like expressions on task-based distributed systems.
This is achieved through a novel scheduler called Load Simulated Hierarchical Scheduling (LSHS).
We show that LSHS enhances performance on Ray by decreasing network load by a factor of 2x, requiring 4x less memory, and reducing execution time by 10x on the logistic regression problem.
arXiv Detail & Related papers (2022-06-28T20:13:40Z) - Memory Planning for Deep Neural Networks [0.0]
We study memory allocation patterns in DNNs during inference.
Latencies incurred due to such mutex contention produce undesirable bottlenecks in user-facing services.
We present an implementation of MemoMalloc in the PyTorch deep learning framework.
arXiv Detail & Related papers (2022-02-23T05:28:18Z) - Logical blocks for fault-tolerant topological quantum computation [55.41644538483948]
We present a framework for universal fault-tolerant logic motivated by the need for platform-independent logical gate definitions.
We explore novel schemes for universal logic that improve resource overheads.
Motivated by the favorable logical error rates for boundaryless computation, we introduce a novel computational scheme.
arXiv Detail & Related papers (2021-12-22T19:00:03Z) - Neural Network Compression for Noisy Storage Devices [71.4102472611862]
Conventionally, model compression and physical storage are decoupled.
This approach forces the storage to treat each bit of the compressed model equally, and to dedicate the same amount of resources to each bit.
We propose a radically different approach that: (i) employs analog memories to maximize the capacity of each memory cell, and (ii) jointly optimizes model compression and physical storage to maximize memory utility.
arXiv Detail & Related papers (2021-02-15T18:19:07Z) - Time-Sliced Quantum Circuit Partitioning for Modular Architectures [67.85032071273537]
Current quantum computer designs will not scale.
To scale beyond small prototypes, quantum architectures will likely adopt a modular approach with clusters of tightly connected quantum bits and sparser connections between clusters.
We exploit this clustering and the statically-known control flow of quantum programs to create tractable partitionings which map quantum circuits to modular physical machines one time slice at a time.
arXiv Detail & Related papers (2020-05-25T17:58:44Z) - In-memory Implementation of On-chip Trainable and Scalable ANN for AI/ML Applications [0.0]
This paper presents an in-memory computing architecture for ANN enabling artificial intelligence (AI) and machine learning (ML) applications.
Our novel on-chip training and inference in-memory architecture reduces energy cost and enhances throughput by simultaneously accessing the multiple rows of array per precharge cycle.
The proposed architecture was trained and tested on the IRIS dataset and is $46\times$ more energy efficient per MAC (multiply-and-accumulate) operation compared to earlier classifiers.
arXiv Detail & Related papers (2020-05-19T15:36:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.