Low-Level and NUMA-Aware Optimization for High-Performance Quantum Simulation
- URL: http://arxiv.org/abs/2506.09198v2
- Date: Thu, 06 Nov 2025 13:38:24 GMT
- Title: Low-Level and NUMA-Aware Optimization for High-Performance Quantum Simulation
- Authors: Ali Rezaei, Luc Jaulmes, Maria Bahna, Oliver Thomson Brown, Antonio Barbalace
- Abstract summary: This work focuses on performance enhancements through targeted low-level and NUMA-aware tuning on a single-node system. We introduce an open-source, high-performance extension to the QuEST state vector simulator that integrates state-of-the-art low-level and NUMA-aware optimizations for modern processors.
- Score: 0.3280871442296501
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scalable classical simulation of quantum circuits is crucial for advancing quantum algorithm development and validating emerging hardware. This work focuses on performance enhancements through targeted low-level and NUMA-aware tuning on a single-node system, thereby not only advancing the efficiency of classical quantum simulations but also establishing a foundation for scalable, heterogeneous implementations that bridge toward noiseless quantum computing. Although a few prior studies have reported similar hardware-level optimizations, such implementations have not been released as open-source software, limiting independent validation and further development. We introduce an open-source, high-performance extension to the QuEST state vector simulator that integrates state-of-the-art low-level and NUMA-aware optimizations for modern processors. Our approach emphasizes locality-aware computation and incorporates hardware-specific techniques including NUMA-aware memory allocation, thread pinning, AVX-512 vectorization, aggressive loop unrolling, and explicit memory prefetching. Experiments demonstrate substantial speedups: 5.5-6.5x for single-qubit gate operations, 4.5x for two-qubit gates, 4x for Random Quantum Circuits (RQC), and 1.8x for the Quantum Fourier Transform (QFT). Algorithmic workloads further achieve 4.3-4.6x acceleration for Grover and 2.5x for Shor-like circuits. These results show that systematic, architecture-aware tuning can significantly extend the practical simulation capacity of classical quantum simulators on current hardware.
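To make the locality argument in the abstract concrete, the following is a minimal sketch of the memory access pattern behind a single-qubit gate in a state vector simulator. It is not the paper's or QuEST's implementation; the function name `apply_single_qubit_gate` and the list-based state layout are assumptions made for illustration. The point it shows is that a gate on qubit `target` pairs amplitudes that sit `2**target` entries apart, so higher target qubits touch memory that is further apart, which is where NUMA placement, vectorization, and prefetching become relevant.

```python
import math


def apply_single_qubit_gate(state, target, gate):
    """Apply a 2x2 gate to qubit `target` of a state vector, in place.

    Hypothetical helper for illustration only. `state` is a list of
    2**n complex amplitudes; `gate` is a 2x2 nested list.
    """
    stride = 1 << target  # distance between the two paired amplitudes
    n = len(state)
    # Walk the vector in blocks of 2*stride; inside each block the first
    # half holds target-bit 0 and the second half holds target-bit 1.
    for block in range(0, n, 2 * stride):
        for offset in range(block, block + stride):
            a0 = state[offset]
            a1 = state[offset + stride]
            state[offset] = gate[0][0] * a0 + gate[0][1] * a1
            state[offset + stride] = gate[1][0] * a0 + gate[1][1] * a1
    return state


# Usage: Hadamard on both qubits of |00> gives a uniform superposition.
s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
state = [1 + 0j, 0j, 0j, 0j]
apply_single_qubit_gate(state, 0, H)
apply_single_qubit_gate(state, 1, H)
```

The inner loop reads and writes two contiguous runs of `stride` amplitudes, which is the kind of regular pattern that optimized native implementations can unroll and vectorize.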
Related papers
- Scalable parallel simulation of quantum circuits on CPU and GPU systems [9.62558654513992]
We present a comprehensive parallelization solution for the Q$^2$Chemistry software package. Our optimizations significantly enhance the simulation speed compared to unoptimized baselines. These benchmarks highlight the capability of Q$^2$Chemistry to effectively handle large-scale quantum simulations.
arXiv Detail & Related papers (2025-09-05T09:20:11Z) - VQC-MLPNet: An Unconventional Hybrid Quantum-Classical Architecture for Scalable and Robust Quantum Machine Learning [60.996803677584424]
Variational Quantum Circuits (VQCs) offer a novel pathway for quantum machine learning. Their practical application is hindered by inherent limitations such as constrained linear expressivity, optimization challenges, and acute sensitivity to quantum hardware noise. This work introduces VQC-MLPNet, a scalable and robust hybrid quantum-classical architecture designed to overcome these obstacles.
arXiv Detail & Related papers (2025-06-12T01:38:15Z) - QEA: An Accelerator for Quantum Circuit Simulation with Resources Efficiency and Flexibility [0.5359378066251386]
We introduce QEA, a state vector-based hardware accelerator that overcomes memory management, system adaptability, and execution efficiency difficulties. We implement and evaluate QEA on the AMD Alveo U280 board, which uses only 0.534 W of power. Experimental results show that QEA is extremely flexible, supporting a wide range of quantum circuits, has excellent fidelity, and outperforms powerful CPUs and related works by up to 153.16x in terms of normalized gate speed.
arXiv Detail & Related papers (2025-03-19T07:31:56Z) - A Superconducting Qubit-Resonator Quantum Processor with Effective All-to-All Connectivity [44.72199649564072]
This architecture can be used as a test-bed for algorithms that benefit from high connectivity. We show that the central resonator can be used as a computational element. We achieve a genuinely multi-qubit entangled Greenberger-Horne-Zeilinger (GHZ) state over all six qubits with a readout-error mitigated fidelity of $0.86$.
arXiv Detail & Related papers (2025-03-13T21:36:18Z) - Route-Forcing: Scalable Quantum Circuit Mapping for Scalable Quantum Computing Architectures [41.39072840772559]
Route-Forcing is a quantum circuit mapping algorithm that shows an average speedup of $3.7\times$ compared to the state-of-the-art scalable techniques.
arXiv Detail & Related papers (2024-07-24T14:21:41Z) - Queen: A quick, scalable, and comprehensive quantum circuit simulation for supercomputing [2.821829060100186]
We present an innovative quantum circuit simulation toolkit comprising gate optimization and simulation modules.
We achieve an average 9x speedup compared to state-of-the-art simulators, including QuEST, IBM-Aer, and NVIDIA-cuQuantum.
We believe the proposed toolkit paves the way for faster quantum circuit simulations, thereby facilitating the development of novel quantum algorithms.
arXiv Detail & Related papers (2024-06-20T08:00:41Z) - Parallel Quantum Computing Simulations via Quantum Accelerator Platform Virtualization [44.99833362998488]
We present a model for parallelizing simulation of quantum circuit executions.
The model can take advantage of its backend-agnostic features, enabling parallel quantum circuit execution over any target backend.
arXiv Detail & Related papers (2024-06-05T17:16:07Z) - Surrogate optimization of variational quantum circuits [1.0546736060336612]
Variational quantum eigensolvers are touted as a near-term algorithm capable of impacting many applications.
Finding algorithms and methods to improve convergence is important to accelerate the capabilities of near-term hardware for VQE.
arXiv Detail & Related papers (2024-04-03T18:00:00Z) - QuantumSEA: In-Time Sparse Exploration for Noise Adaptive Quantum Circuits [82.50620782471485]
QuantumSEA is an in-time sparse exploration for noise-adaptive quantum circuits.
It aims to achieve two key objectives: (1) implicit circuit capacity during training and (2) noise robustness.
Our method establishes state-of-the-art results with only half the number of quantum gates and 2x time saving of circuit executions.
arXiv Detail & Related papers (2024-01-10T22:33:00Z) - Quantum Circuit Optimization through Iteratively Pre-Conditioned Gradient Descent [0.4915744683251151]
We propose iteratively preconditioned gradient descent (IPG) for optimizing quantum circuits and demonstrate performance speedups for state preparation and the implementation of quantum algorithms.
We show an improvement in fidelity by a factor of $10^4$ for preparing a 4-qubit W state and a maximally entangled 5-qubit GHZ state compared to other commonly used classical optimizers tuning the same ansatz.
We also show gains for optimizing a unitary for a quantum Fourier transform using IPG, and report results of running such optimized circuits on IonQ's quantum processing unit (QPU).
arXiv Detail & Related papers (2023-09-18T17:30:03Z) - 2QAN: A quantum compiler for 2-local qubit Hamiltonian simulation algorithms [0.76146285961466]
We develop a compiler, named 2QAN, to optimize quantum circuits for 2-local qubit Hamiltonian simulation problems.
2QAN can reduce the number of inserted SWAP gates by 11.5X, reduce overhead in hardware gate count by 68.5X, and reduce overhead in circuit depth by 21X.
arXiv Detail & Related papers (2021-08-04T15:03:47Z) - Variational Quantum Optimization with Multi-Basis Encodings [62.72309460291971]
We introduce a new variational quantum algorithm that benefits from two innovations: multi-basis graph complexity and nonlinear activation functions.
Our approach results in increased optimization performance, a twofold increase in effective landscapes, and a reduction in measurement progress.
arXiv Detail & Related papers (2021-06-24T20:16:02Z) - Accelerating variational quantum algorithms with multiple quantum processors [78.36566711543476]
Variational quantum algorithms (VQAs) have the potential of utilizing near-term quantum machines to gain certain computational advantages.
Modern VQAs suffer from cumbersome computational overhead, hampered by the tradition of employing a solitary quantum processor to handle large data.
Here we devise an efficient distributed optimization scheme, called QUDIO, to address this issue.
arXiv Detail & Related papers (2021-06-24T08:18:42Z) - Tensor Network Quantum Virtual Machine for Simulating Quantum Circuits at Exascale [57.84751206630535]
We present a modernized version of the Tensor Network Quantum Virtual Machine (TNQVM), which serves as a quantum circuit simulation backend in the eXtreme-scale ACCelerator (XACC) framework.
The new version is based on the general-purpose, scalable tensor network processing library, ExaTN, and provides multiple quantum circuit simulators.
By combining the portable XACC quantum processors and the scalable ExaTN backend we introduce an end-to-end virtual development environment which can scale from laptops to future exascale platforms.
arXiv Detail & Related papers (2021-04-21T13:26:42Z) - Quantum circuit architecture search for variational quantum algorithms [88.71725630554758]
We propose a resource- and runtime-efficient scheme termed quantum architecture search (QAS).
QAS automatically seeks a near-optimal ansatz to balance benefits and side-effects brought by adding more noisy quantum gates.
We implement QAS on both the numerical simulator and real quantum hardware, via the IBM cloud, to accomplish data classification and quantum chemistry tasks.
arXiv Detail & Related papers (2020-10-20T12:06:27Z) - Intel Quantum Simulator: A cloud-ready high-performance simulator of quantum circuits [0.0]
We introduce the latest release of Intel Quantum Simulator (IQS), formerly known as qHiPSTER.
The high-performance computing capability of the software allows users to leverage the available hardware resources.
IQS allows users to subdivide the computational resources to simulate a pool of related circuits in parallel.
arXiv Detail & Related papers (2020-01-28T19:00:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.