Standalone FPGA-Based QAOA Emulator for Weighted-MaxCut on Embedded Devices
- URL: http://arxiv.org/abs/2502.11316v1
- Date: Sun, 16 Feb 2025 23:30:16 GMT
- Title: Standalone FPGA-Based QAOA Emulator for Weighted-MaxCut on Embedded Devices
- Authors: Seonghyun Choi, Kyeongwon Lee, Jae-Jin Lee, Woojoo Lee
- Abstract summary: This study introduces a compact, standalone FPGA-based QC emulator for embedded systems.
The proposed design reduces time complexity from O(N^2) to O(N), where N equals 2^n for n qubits.
The emulator achieved energy savings ranging from 1.53 times for two-qubit configurations to 852 times for nine-qubit configurations.
- Score: 3.384874651944418
- Abstract: Quantum computing (QC) emulation is crucial for advancing QC applications, especially given the scalability constraints of current devices. FPGA-based designs offer an efficient and scalable alternative to traditional large-scale platforms, but most are tightly integrated with high-performance systems, limiting their use in mobile and edge environments. This study introduces a compact, standalone FPGA-based QC emulator designed for embedded systems, leveraging the Quantum Approximate Optimization Algorithm (QAOA) to solve the Weighted-MaxCut problem. By restructuring QAOA operations for hardware compatibility, the proposed design reduces time complexity from O(N^2) to O(N), where N equals 2^n for n qubits. This reduction, coupled with a pipeline architecture, significantly minimizes resource consumption, enabling support for up to nine qubits on mid-tier FPGAs, roughly three times more than comparable designs. Additionally, the emulator achieved energy savings ranging from 1.53 times for two-qubit configurations to 852 times for nine-qubit configurations, compared to software-based QAOA on embedded processors. These results highlight the practical scalability and resource efficiency of the proposed design, providing a robust foundation for QC emulation in resource-constrained edge devices.
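The O(N^2)-to-O(N) reduction described in the abstract comes from exploiting structure: the Weighted-MaxCut cost Hamiltonian is diagonal (so the phase-separation step multiplies each amplitude by a phase), and the mixer factorizes into single-qubit rotations applied to amplitude pairs. A minimal statevector sketch of one QAOA layer along these lines follows; this is an illustrative software analogue, not the paper's hardware design, and all function and variable names here are our own:

```python
import numpy as np

def qaoa_layer(state, edges, weights, gamma, beta, n):
    """One QAOA layer on an N=2^n statevector, avoiding dense matrices.

    Phase separation: the Weighted-MaxCut cost is diagonal, so
    exp(-i*gamma*C) multiplies each amplitude by a phase -> O(N) per edge.
    Mixer: exp(-i*beta*X_q) mixes the pair of amplitudes whose indices
    differ in bit q -> O(N) per qubit, instead of an O(N^2) dense
    matrix-vector product.
    """
    N = 1 << n
    # Diagonal cost: C(z) = sum over edges (u,v) of w_uv * [bit_u(z) != bit_v(z)]
    z = np.arange(N)
    cost = np.zeros(N)
    for (u, v), w in zip(edges, weights):
        cut = ((z >> u) & 1) != ((z >> v) & 1)
        cost += w * cut
    state = state * np.exp(-1j * gamma * cost)

    # Mixer: exp(-i*beta*X) = cos(beta)*I - i*sin(beta)*X on each qubit
    c, s = np.cos(beta), -1j * np.sin(beta)
    for q in range(n):
        bit = 1 << q
        idx = np.arange(N)
        lo = idx[(idx & bit) == 0]  # indices with qubit q = 0
        hi = lo | bit               # their partners with qubit q = 1
        a, b = state[lo].copy(), state[hi].copy()
        state[lo] = c * a + s * b
        state[hi] = s * a + c * b
    return state
```

Each step touches every amplitude a constant number of times per edge or per qubit, which is what makes a streaming pipeline implementation on an FPGA natural.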
Related papers
- HEPPO: Hardware-Efficient Proximal Policy Optimization -- A Universal Pipelined Architecture for Generalized Advantage Estimation [0.0]
HEPPO is an FPGA-based accelerator designed to optimize the Generalized Advantage Estimation stage in Proximal Policy Optimization.
The key innovation is a strategic standardization technique, which combines dynamic reward standardization and block standardization for values, followed by 8-bit uniform quantization.
Our single-chip solution minimizes communication latency and throughput bottlenecks, significantly boosting PPO training efficiency.
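The GAE stage that HEPPO accelerates is a simple backward recursion, A_t = delta_t + gamma*lambda*A_{t+1} with delta_t = r_t + gamma*V_{t+1} - V_t; its sequential data dependence is what makes it a pipelining target. A minimal software reference of that recursion (our own sketch, not the HEPPO hardware):

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation via the backward recursion
    A_t = delta_t + gamma*lam*A_{t+1}, delta_t = r_t + gamma*V_{t+1} - V_t.
    `values` carries one extra entry: the bootstrap value of the final state.
    """
    T = len(rewards)
    adv = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv
```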
arXiv Detail & Related papers (2025-01-22T08:18:56Z) - Compiler for Distributed Quantum Computing: a Reinforcement Learning Approach [6.347685922582191]
We introduce a novel compiler that prioritizes reducing the expected execution time by jointly managing the generation and routing of EPR pairs.
We present a real-time, adaptive approach to compiler design, accounting for the nature of entanglement generation and the operational demands of quantum circuits.
Our contributions are twofold: (i) we model the optimal compiler for DQC using a Markov Decision Process (MDP) formulation, establishing the existence of an optimal algorithm, and (ii) we introduce a constrained Reinforcement Learning (RL) method to approximate this optimal compiler.
arXiv Detail & Related papers (2024-04-25T23:03:20Z) - A2Q: Accumulator-Aware Quantization with Guaranteed Overflow Avoidance [49.1574468325115]
Accumulator-aware quantization (A2Q) is a novel weight quantization method designed to train quantized neural networks (QNNs) to avoid overflow during inference.
A2Q introduces a unique formulation inspired by weight normalization that constrains the L1-norm of model weights according to accumulator bit width bounds.
We show A2Q can train QNNs for low-precision accumulators while maintaining model accuracy competitive with a floating-point baseline.
arXiv Detail & Related papers (2023-08-25T17:28:58Z) - DeepGEMM: Accelerated Ultra Low-Precision Inference on CPU Architectures using Lookup Tables [49.965024476651706]
DeepGEMM is a lookup table based approach for the execution of ultra low-precision convolutional neural networks on SIMD hardware.
Our implementation outperforms corresponding 8-bit integer kernels by up to 1.74x on x86 platforms.
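At ultra low precision there are so few distinct weight/activation values that every possible product can be precomputed once and multiplies replaced with table lookups. A small sketch of that core idea (illustrative only, not the DeepGEMM SIMD implementation; names are our own):

```python
import numpy as np

def lut_dot(w_codes, x_codes, w_levels, x_levels):
    """Dot product of quantized vectors via a precomputed product table.
    w_codes/x_codes: integer codes indexing into the quantizer level arrays.
    """
    # Precompute all |W| x |X| pairwise products once (e.g. per layer).
    table = np.outer(w_levels, x_levels)
    # Replace per-element multiplies with table lookups, then accumulate.
    return table[w_codes, x_codes].sum()
```

For 2-bit weights and 2-bit activations the table has only 16 entries, small enough to live in SIMD registers, which is what makes the approach competitive with integer multiply kernels.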
arXiv Detail & Related papers (2023-04-18T15:13:10Z) - Decomposition of Matrix Product States into Shallow Quantum Circuits [62.5210028594015]
tensor network (TN) algorithms can be mapped to parametrized quantum circuits (PQCs)
We propose a new protocol for approximating TN states using realistic quantum circuits.
Our results reveal one particular protocol, involving sequential growth and optimization of the quantum circuit, to outperform all other methods.
arXiv Detail & Related papers (2022-09-01T17:08:41Z) - Robust resource-efficient quantum variational ansatz through evolutionary algorithm [0.46180371154032895]
Variational quantum algorithms (VQAs) are promising methods to demonstrate quantum advantage on near-term devices.
We show that a fixed VQA circuit design, such as the widely-used hardware efficient ansatz, is not necessarily robust against imperfections.
We propose a genome-length-adjustable evolutionary algorithm to design a robust VQA circuit that is optimized over variations of both circuit ansatz and gate parameters.
arXiv Detail & Related papers (2022-02-28T12:14:11Z) - VAQF: Fully Automatic Software-hardware Co-design Framework for Low-bit Vision Transformer [121.85581713299918]
We propose VAQF, a framework that builds inference accelerators on FPGA platforms for quantized Vision Transformers (ViTs)
Given the model structure and the desired frame rate, VAQF will automatically output the required quantization precision for activations.
This is the first time quantization has been incorporated into ViT acceleration on FPGAs.
arXiv Detail & Related papers (2022-01-17T20:27:52Z) - Scaling Quantum Approximate Optimization on Near-term Hardware [49.94954584453379]
We quantify scaling of the expected resource requirements by optimized circuits for hardware architectures with varying levels of connectivity.
We show that the number of measurements, and hence the total time to solution, grows exponentially in problem size and problem graph degree.
These problems may be alleviated by increasing hardware connectivity or by recently proposed modifications to the QAOA that achieve higher performance with fewer circuit layers.
arXiv Detail & Related papers (2022-01-06T21:02:30Z) - SECDA: Efficient Hardware/Software Co-Design of FPGA-based DNN Accelerators for Edge Inference [0.0]
We propose SECDA, a new hardware/software co-design methodology to reduce design time of optimized Deep Neural Networks (DNN) inference accelerators on edge devices with FPGAs.
We use SECDA to efficiently develop two different DNN accelerator designs on a PYNQ-Z1 board, a platform that includes an edge FPGA.
We evaluate the two accelerator designs with four common DNN models, achieving an average performance speedup across models of up to 3.5x with a 2.9x reduction in energy consumption over CPU-only inference.
arXiv Detail & Related papers (2021-10-01T15:20:29Z) - Reconfigurable co-processor architecture with limited numerical precision to accelerate deep convolutional neural networks [0.38848561367220275]
Convolutional Neural Networks (CNNs) are widely used in deep learning applications, e.g. visual systems, robotics etc.
Here, we present a model-independent reconfigurable co-processing architecture to accelerate CNNs.
In contrast to existing solutions, we introduce limited-precision 32-bit Q-format fixed-point quantization for arithmetic representations and operations.
arXiv Detail & Related papers (2021-08-21T09:50:54Z) - EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference [82.1584439276834]
Transformer-based language models such as BERT provide significant accuracy improvement for a multitude of natural language processing (NLP) tasks.
We present EdgeBERT, an in-depth algorithm-hardware co-design for latency-aware energy optimization for multi-task NLP.
arXiv Detail & Related papers (2020-11-28T19:21:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.