Related papers: Combining processing throughput, low latency and timing accuracy in experiment control

URL: http://arxiv.org/abs/2111.15290v1
Date: Tue, 30 Nov 2021 11:11:02 GMT
Title: Combining processing throughput, low latency and timing accuracy in experiment control
Authors: Chun Kit Lam, Stephan Maka, David Nadlinger, Chris Ballance and Sébastien Bourdeauducq
Abstract summary: We ported the firmware of the ARTIQ experiment control infrastructure to an embedded system based on a commercial Xilinx Zynq-7000 system-on-chip. It contains high-performance hardwired CPU cores integrated with FPGA fabric.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We ported the firmware of the ARTIQ experiment control infrastructure to an embedded system based on a commercial Xilinx Zynq-7000 system-on-chip. It contains high-performance hardwired CPU cores integrated with FPGA fabric. As with previous ARTIQ systems, the FPGA fabric is responsible for timing all I/O signals to and from peripherals, thereby retaining the exquisite precision required by most quantum physics experiments. A significant amount of latency is incurred by the hardwired interface between the CPU core and FPGA fabric of the Zynq-7000 chip; creative use of the CPU's cache-coherent accelerator ports and the CPU's event flag allowed us to reduce this latency and achieve better I/O performance than previous ARTIQ systems. The performance of the hardwired CPU core, in particular when floating-point computation is involved, greatly exceeds that of previous ARTIQ systems based on a softcore CPU. This makes it interesting to execute intensive computations on the embedded system, with a low-latency path to the experiment. We extended the ARTIQ compiler so that many mathematical functions and matrix operations can be programmed by the user, using the familiar NumPy syntax.

Related papers

Q-GEAR: Improving quantum simulation framework [0.28402080392117757]
We introduce Q-Gear, a software framework that transforms Qiskit quantum circuits into Cuda-Q kernels. Q-Gear accelerates CPU-based and GPU-based simulations by two orders of magnitude and by a factor of ten, respectively, with minimal coding effort.
arXiv Detail & Related papers (2025-04-04T22:17:51Z)
FAMOUS: Flexible Accelerator for the Attention Mechanism of Transformer on UltraScale+ FPGAs [0.0]
Transformer neural networks (TNNs) are being applied across a widening range of application domains, including natural language processing (NLP), machine translation, and computer vision (CV). This paper proposes FAMOUS, a flexible hardware accelerator for dense multi-head attention computation of TNNs on field-programmable gate arrays (FPGAs). It is optimized for high utilization of processing elements and on-chip memories to improve parallelism and reduce latency.
arXiv Detail & Related papers (2024-09-21T05:25:46Z)
Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers [56.37495946212932]
Vision transformers (ViTs) have demonstrated their superior accuracy for computer vision tasks compared to convolutional neural networks (CNNs). This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs.
arXiv Detail & Related papers (2024-07-25T16:35:46Z)
AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer [54.713778961605115]
Vision Transformer (ViT) has become one of the most prevailing fundamental backbone networks in the computer vision community. We propose a novel non-uniform quantizer, dubbed the Adaptive Logarithm (AdaLog) quantizer.
arXiv Detail & Related papers (2024-07-17T18:38:48Z)
Quantum Compiling with Reinforcement Learning on a Superconducting Processor [55.135709564322624]
We develop a reinforcement learning-based quantum compiler for a superconducting processor. We demonstrate its capability of discovering novel and hardware-amenable circuits with short lengths. Our study exemplifies the codesign of the software with hardware for efficient quantum compilation.
arXiv Detail & Related papers (2024-06-18T01:49:48Z)
Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels. We decompose the kernel development into two steps: 1) expressing the computational core using Tensor Processing Primitives (TPPs) and 2) expressing the logical loops around TPPs in a high-level, declarative fashion. We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z)
MAPLE-X: Latency Prediction with Explicit Microprocessor Prior Knowledge [87.41163540910854]
Deep neural network (DNN) latency characterization is a time-consuming process. We propose MAPLE-X which extends MAPLE by incorporating explicit prior knowledge of hardware devices and DNN architecture latency.
arXiv Detail & Related papers (2022-05-25T11:08:20Z)
Towards real-time and energy efficient Siamese tracking -- a hardware-software approach [0.0]
We propose a hardware-software implementation of the well-known fully connected Siamese tracker (SiamFC). We have developed a quantised Siamese network for the FINN accelerator, using algorithm-accelerator co-design, and performed design space exploration. For our network, running in the programmable logic part of the Zynq UltraScale+ MPSoC ZCU104, we achieved the processing of almost 50 frames-per-second with tracker accuracy on par with its floating point counterpart.
arXiv Detail & Related papers (2022-05-21T18:31:07Z)
Accelerating variational quantum algorithms with multiple quantum processors [78.36566711543476]
Variational quantum algorithms (VQAs) have the potential of utilizing near-term quantum machines to gain certain computational advantages. Modern VQAs suffer from cumbersome computational overhead, hampered by the tradition of employing a solitary quantum processor to handle large data. Here we devise an efficient distributed optimization scheme, called QUDIO, to address this issue.
arXiv Detail & Related papers (2021-06-24T08:18:42Z)
QubiC: An open source FPGA-based control and measurement system for superconducting quantum information processors [5.310385728746101]
We design a modular FPGA-based system called QubiC to control and measure a superconducting quantum processing unit. A prototype hardware module is assembled from several commercial off-the-shelf evaluation boards and in-house developed circuit boards. System functionality and performance are demonstrated by performing qubit chip characterization, gate optimization, and randomized benchmarking sequences.
arXiv Detail & Related papers (2020-12-31T21:06:28Z)
Accelerated Charged Particle Tracking with Graph Neural Networks on FPGAs [0.0]
We develop and study FPGA implementations of algorithms for charged particle tracking based on graph neural networks. We find a considerable speedup over CPU-based execution is possible, potentially enabling such algorithms to be used effectively in future computing.
arXiv Detail & Related papers (2020-11-30T18:17:43Z)
Accelerating complex control schemes on a heterogeneous MPSoC platform for quantum computing [1.1744028458220428]
Control and readout of superconducting quantum bits (qubits) require microwave pulses with gigahertz frequencies and nanosecond precision. To generate and analyze these microwave pulses, we developed a versatile FPGA-based electronics platform. We present the architecture of the Taskrunner framework as well as timing benchmarks and discuss applications in the field of quantum computing.
arXiv Detail & Related papers (2020-04-16T16:48:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.