Accelerating Recurrent Neural Networks for Gravitational Wave
Experiments
- URL: http://arxiv.org/abs/2106.14089v1
- Date: Sat, 26 Jun 2021 20:44:02 GMT
- Title: Accelerating Recurrent Neural Networks for Gravitational Wave
Experiments
- Authors: Zhiqiang Que, Erwei Wang, Umar Marikar, Eric Moreno, Jennifer
Ngadiuba, Hamza Javed, Bartłomiej Borzyszkowski, Thea Aarrestad, Vladimir
Loncar, Sioni Summers, Maurizio Pierini, Peter Y Cheung, Wayne Luk
- Abstract summary: We have developed a new architecture capable of accelerating RNN inference for analyzing time-series data from LIGO detectors.
A customizable template for this architecture has been designed, which enables the generation of low-latency FPGA designs.
- Score: 1.9263019320519579
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents novel reconfigurable architectures for reducing the
latency of recurrent neural networks (RNNs) that are used for detecting
gravitational waves. Gravitational interferometers such as the LIGO detectors
capture cosmic events such as black hole mergers which happen at unknown times
and of varying durations, producing time-series data. We have developed a new
architecture capable of accelerating RNN inference for analyzing time-series
data from LIGO detectors. This architecture is based on optimizing the
initiation intervals (II) in a multi-layer LSTM (Long Short-Term Memory)
network, by identifying appropriate reuse factors for each layer. A
customizable template for this architecture has been designed, which enables
the generation of low-latency FPGA designs with efficient resource utilization
using high-level synthesis tools. The proposed approach has been evaluated
based on two LSTM models, targeting a ZYNQ 7045 FPGA and a U250 FPGA.
Experimental results show that balancing the IIs reduces the number of DSPs
by up to 42% while achieving the same overall II. When compared to other
FPGA-based LSTM designs, our design can achieve about 4.92 to 12.4 times lower
latency.
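The core of the balanced-II idea can be illustrated with a little arithmetic. The sketch below is a minimal illustration, not the authors' implementation: it assumes a simplified cost model in which a layer with M multiplications needs roughly ceil(M / II) DSPs to sustain an initiation interval of II cycles (one multiply per DSP per cycle), and the layer sizes and achievable IIs are invented for the example.

```python
from math import ceil

def dsps_for_ii(mults: int, ii: int) -> int:
    """DSPs needed to accept one new input every `ii` cycles for a layer
    with `mults` multiplications (one multiply per DSP per cycle)."""
    return ceil(mults / ii)

# Hypothetical two-layer LSTM: 4 gates x input dim x hidden dim multiplies.
mults = {"lstm1": 4 * 32 * 64, "lstm2": 4 * 64 * 8}

# Suppose lstm1 cannot go below II = 8 (the bottleneck), while lstm2 could
# reach II = 2 if given enough DSPs.
achievable_ii = {"lstm1": 8, "lstm2": 2}
unbalanced = {n: dsps_for_ii(m, achievable_ii[n]) for n, m in mults.items()}

# Balanced: the design's II is the maximum over its layers, so a faster
# layer gains nothing from a low II; raising its reuse factor until its II
# matches the bottleneck frees DSPs at no cost in throughput.
bottleneck = max(achievable_ii.values())
balanced = {n: dsps_for_ii(m, bottleneck) for n, m in mults.items()}

print("unbalanced DSPs:", sum(unbalanced.values()))  # 1024 + 1024 = 2048
print("balanced DSPs:  ", sum(balanced.values()))    # 1024 + 256  = 1280
```

In an hls4ml-style flow (see the "Ultra-low latency recurrent neural network inference on FPGAs" entry under Related papers), the per-layer reuse factor is the configuration knob that expresses this trade-off. The following is a hypothetical configuration sketch, not the paper's setup: the toy model, layer names, reuse values, and part number are placeholders, and it assumes an hls4ml version with LSTM layer support.

```python
import hls4ml
from tensorflow import keras

# Toy two-layer LSTM; shapes are placeholders, not the paper's models.
model = keras.Sequential([
    keras.layers.Input(shape=(16, 32)),  # 16 timesteps, 32 features
    keras.layers.LSTM(64, return_sequences=True, name='lstm_1'),
    keras.layers.LSTM(8, name='lstm_2'),
    keras.layers.Dense(1, name='dense_out'),
])

config = hls4ml.utils.config_from_keras_model(model, granularity='name')

# Per-layer reuse factors chosen so every layer lands on the same II:
config['LayerName']['lstm_1']['ReuseFactor'] = 8  # bottleneck layer
config['LayerName']['lstm_2']['ReuseFactor'] = 8  # raised to match its II

hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    part='xc7z045ffg900-2',  # a ZYNQ 7045 device, as targeted in the paper
    output_dir='lstm_balanced_ii',
)
hls_model.compile()  # C simulation; hls_model.build(synth=True) runs HLS
```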
Related papers
- rule4ml: An Open-Source Tool for Resource Utilization and Latency Estimation for ML Models on FPGA [0.0]
This paper introduces a novel method to predict the resource utilization and inference latency of Neural Networks (NNs) before their synthesis and implementation on FPGA.
We leverage HLS4ML, a tool-flow that helps translate NNs into high-level synthesis (HLS) code.
Our method uses trained regression models for immediate pre-synthesis predictions.
arXiv Detail & Related papers (2024-08-09T19:35:10Z)
- Coupling a Recurrent Neural Network to SPAD TCSPC Systems for Real-time Fluorescence Lifetime Imaging [4.49533352963549]
Fluorescence lifetime imaging (FLI) has been receiving increased attention in recent years as a powerful diagnostic technique in biological and medical research.
Existing FLI systems often suffer from a tradeoff between processing speed, accuracy, and robustness.
We propose a robust approach that enables fast FLI with no degradation of accuracy.
arXiv Detail & Related papers (2023-06-27T16:37:37Z)
- Implementing Neural Network-Based Equalizers in a Coherent Optical Transmission System Using Field-Programmable Gate Arrays [3.1543509940301946]
We show the offline FPGA realization of both recurrent and feedforward neural network (NN)-based equalizers for nonlinearity compensation in coherent optical transmission systems.
The main results are divided into three parts: a performance comparison, an analysis of how activation functions are implemented, and a report on the complexity of the hardware.
arXiv Detail & Related papers (2022-12-09T07:28:45Z)
- LL-GNN: Low Latency Graph Neural Networks on FPGAs for High Energy Physics [45.666822327616046]
This work presents a novel reconfigurable architecture for Low Latency Graph Neural Network (LL-GNN) designs for particle detectors.
The LL-GNN design advances the next generation of trigger systems by enabling sophisticated algorithms to process experimental data efficiently.
arXiv Detail & Related papers (2022-09-28T12:55:35Z)
- Ultra-low latency recurrent neural network inference on FPGAs for physics applications with hls4ml [8.085746138965975]
We present an implementation of two types of recurrent neural network layers -- long short-term memory and gated recurrent unit -- within the hls4ml framework.
We show the performance and synthesized designs for multiple neural networks, many of which are trained specifically for jet identification tasks at the CERN Large Hadron Collider.
arXiv Detail & Related papers (2022-07-01T17:19:24Z)
- ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
We compare estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model for evaluation.
arXiv Detail & Related papers (2021-05-07T11:39:05Z)
- Deep Cellular Recurrent Network for Efficient Analysis of Time-Series Data with Spatial Information [52.635997570873194]
This work proposes a novel deep cellular recurrent neural network (DCRNN) architecture to process complex multi-dimensional time series data with spatial information.
The proposed architecture achieves state-of-the-art performance while utilizing substantially less trainable parameters when compared to comparable methods in the literature.
arXiv Detail & Related papers (2021-01-12T20:08:18Z)
- Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks [61.76338096980383]
A range of neural architecture search (NAS) techniques are used to automatically learn two types of hyper-parameters of state-of-the-art factored time delay neural networks (TDNNs).
These include the DARTS method integrating architecture selection with lattice-free MMI (LF-MMI) TDNN training.
Experiments conducted on a 300-hour Switchboard corpus suggest the auto-configured systems consistently outperform the baseline LF-MMI TDNN systems.
arXiv Detail & Related papers (2020-06-22T08:20:17Z)
- Object Tracking through Residual and Dense LSTMs [67.98948222599849]
Deep learning-based trackers built on LSTM (Long Short-Term Memory) recurrent neural networks have emerged as a powerful alternative.
DenseLSTMs outperform residual and regular LSTMs, and offer higher resilience to nuisances.
Our case study supports the adoption of residual-based RNNs for enhancing the robustness of other trackers.
arXiv Detail & Related papers (2020-03-16T03:43:59Z)
- A Generative Learning Approach for Spatio-temporal Modeling in Connected Vehicular Network [55.852401381113786]
This paper proposes LaMI (Latency Model Inpainting), a novel framework to generate a comprehensive spatio-temporal model of the wireless access latency of connected vehicles.
LaMI adapts ideas from image inpainting and synthesis, and can reconstruct missing latency samples with a two-step procedure.
In particular, it first discovers the spatial correlation between samples collected in various regions using a patching-based approach and then feeds the original and highly correlated samples into a Variational Autoencoder (VAE).
arXiv Detail & Related papers (2020-03-16T03:43:59Z)