Related papers: TrapSIMD: SIMD-Aware Compiler Optimization for 2D Trapped-Ion Quantum Machines

URL: http://arxiv.org/abs/2504.17886v2
Date: Mon, 28 Apr 2025 16:45:54 GMT
Title: TrapSIMD: SIMD-Aware Compiler Optimization for 2D Trapped-Ion Quantum Machines
Authors: Jixuan Ruan, Hezi Zhang, Xiang Fang, Ang Li, Wesley C. Campbell, Eric Hudson, David Hayes, Hartmut Haeffner, Travis Humble, Jens Palsberg, Yufei Ding
Abstract summary: We present FluxTrap, a SIMD-aware compiler framework that establishes a hardware-software co-design interface for TI systems. FluxTrap reduces execution time by up to $3.82 \times$ and improves fidelity by several orders of magnitude.
Score: 14.239863509836864
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Modular trapped-ion (TI) architectures offer a scalable quantum computing (QC) platform, with native transport behaviors that closely resemble the Single Instruction Multiple Data (SIMD) paradigm. We present FluxTrap, a SIMD-aware compiler framework that establishes a hardware-software co-design interface for TI systems. FluxTrap introduces a novel abstraction that unifies SIMD-style instructions -- including segmented intra-trap shift SIMD (S3) and global junction transfer SIMD (JT-SIMD) operations -- with a SIMD-enriched architectural graph, capturing key features such as transport synchronization, gate-zone locality, and topological constraints. It applies two passes -- SIMD aggregation and scheduling -- to coordinate grouped ion transport and gate execution within architectural constraints. On NISQ benchmarks, FluxTrap reduces execution time by up to $3.82 \times$ and improves fidelity by several orders of magnitude. It also scales to fault-tolerant workloads under diverse hardware configurations, providing feedback for future TI hardware design.

Related papers

Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping [36.71999572939612]
We introduce Ladder Residual, a simple architectural modification applicable to all residual-based models. Applying Ladder Residual to all its layers can achieve 29% end-to-end wall clock speed up at inference time with TP sharding over 8 devices. We train a 1B and 3B Ladder Transformer from scratch and observe comparable performance to a standard dense transformer baseline.
arXiv Detail & Related papers (2025-01-11T17:06:30Z)
MICSim: A Modular Simulator for Mixed-signal Compute-in-Memory based AI Accelerator [10.65687190002229]
This work introduces MICSim, an open-source, pre-circuit simulator designed for evaluation of chip-level software performance and hardware overhead of mixed-signal compute-in-memory (CIM) accelerators. MICSim features a modular design, allowing easy multi-level co-design and design space exploration.
arXiv Detail & Related papers (2024-09-23T09:12:46Z)
Designing and Implementing a Generator Framework for a SIMD Abstraction Library [53.84310825081338]
We present TSLGen, a novel end-to-end framework for generating an SIMD abstraction library. We show that our framework is comparable to existing libraries, and we achieve the same performance results.
arXiv Detail & Related papers (2024-07-26T13:25:38Z)
Tao: Re-Thinking DL-based Microarchitecture Simulation [8.501776613988484]
Existing microarchitecture simulators excel and fall short at different aspects. Deep learning (DL)-based simulations are remarkably fast and have acceptable accuracy but fail to provide adequate low-level microarchitectural performance metrics. This paper introduces TAO that redesigns the DL-based simulation with three primary contributions.
arXiv Detail & Related papers (2024-04-16T21:45:10Z)
Communication-Efficient Framework for Distributed Image Semantic Wireless Transmission [68.69108124451263]
Federated learning-based semantic communication (FLSC) framework for multi-task distributed image transmission with IoT devices. Each link is composed of a hierarchical vision transformer (HVT)-based extractor and a task-adaptive translator. Channel state information-based multiple-input multiple-output transmission module designed to combat channel fading and noise.
arXiv Detail & Related papers (2023-08-07T16:32:14Z)
Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects. The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z)
Compiler-Driven Simulation of Reconfigurable Hardware Accelerators [0.8807375890824978]
Existing simulators tend toward two extremes: low-level and general approaches, such as RTL simulation, that can model any hardware but require substantial effort and long execution times. This work proposes a compiler-driven simulation workflow that can model hardware accelerators.
arXiv Detail & Related papers (2022-02-01T20:31:04Z)
Parallel Simulation of Quantum Networks with Distributed Quantum State Management [56.24769206561207]
We identify requirements for parallel simulation of quantum networks and develop the first parallel discrete event quantum network simulator. Our contributions include the design and development of a quantum state manager that maintains shared quantum information distributed across multiple processes. We release the parallel SeQUeNCe simulator as an open-source tool alongside the existing sequential version.
arXiv Detail & Related papers (2021-11-06T16:51:17Z)
SensiX++: Bringing MLOPs and Multi-tenant Model Serving to Sensory Edge Devices [69.1412199244903]
We present a multi-tenant runtime for adaptive model execution with integrated MLOps on edge devices, e.g., a camera, a microphone, or IoT sensors. SensiX++ operates on two fundamental principles - highly modular componentisation to externalise data operations with clear abstractions and document-centric manifestation for system-wide orchestration. We report on the overall throughput and quantified benefits of various automation components of SensiX++ and demonstrate its efficacy to significantly reduce operational complexity and lower the effort to deploy, upgrade, reconfigure and serve embedded models on edge devices.
arXiv Detail & Related papers (2021-09-08T22:06:16Z)
TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking [74.82415271960315]
We propose a solution named TransMOT to efficiently model the spatial and temporal interactions among objects in a video. TransMOT is not only more computationally efficient than the traditional Transformer, but it also achieves better tracking accuracy. The proposed method is evaluated on multiple benchmark datasets including MOT15, MOT16, MOT17, and MOT20.
arXiv Detail & Related papers (2021-04-01T01:49:05Z)

This list is automatically generated from the titles and abstracts of the papers on this site.