Real-time semantic segmentation on FPGAs for autonomous vehicles with
hls4ml
- URL: http://arxiv.org/abs/2205.07690v1
- Date: Mon, 16 May 2022 13:55:16 GMT
- Title: Real-time semantic segmentation on FPGAs for autonomous vehicles with
hls4ml
- Authors: Nicolò Ghielmetti, Vladimir Loncar, Maurizio Pierini, Marcel Roed,
Sioni Summers, Thea Aarrestad, Christoffer Petersson, Hampus Linander,
Jennifer Ngadiuba, Kelvin Lin, Philip Harris
- Abstract summary: We show how field programmable gate arrays can serve as hardware accelerators for real-time semantic segmentation tasks relevant for autonomous driving.
Considering compressed versions of the ENet convolutional neural network architecture, we demonstrate a fully-on-chip deployment with a latency of 4.9 ms per image.
We show that aggressive filter reduction, heterogeneous quantization-aware training, and an optimized implementation of convolutional layers can significantly reduce power consumption and resource utilization.
- Score: 6.223322030008291
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we investigate how field programmable gate arrays can serve as
hardware accelerators for real-time semantic segmentation tasks relevant for
autonomous driving. Considering compressed versions of the ENet convolutional
neural network architecture, we demonstrate a fully-on-chip deployment with a
latency of 4.9 ms per image, using less than 30% of the available resources on
a Xilinx ZCU102 evaluation board. The latency is reduced to 3 ms per image when
increasing the batch size to ten, corresponding to the use case where the
autonomous vehicle receives inputs from multiple cameras simultaneously. We
show, through aggressive filter reduction and heterogeneous quantization-aware
training, and an optimized implementation of convolutional layers, that the
power consumption and resource utilization can be significantly reduced while
maintaining accuracy on the Cityscapes dataset.
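The two latency figures quoted in the abstract can be turned into approximate throughput numbers with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes that "3 ms per image" at batch size ten is an amortized per-image figure, so that a full batch of ten takes roughly 30 ms. The abstract does not state this explicitly.

```python
def frames_per_second(latency_ms_per_image: float) -> float:
    """Convert a per-image latency in milliseconds to images per second."""
    if latency_ms_per_image <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms_per_image


def batch_time_ms(latency_ms_per_image: float, batch_size: int) -> float:
    """Total time for one batch, ASSUMING the latency is amortized per image."""
    return latency_ms_per_image * batch_size


# Figures taken from the abstract.
SINGLE_IMAGE_LATENCY_MS = 4.9   # batch size 1
BATCHED_LATENCY_MS = 3.0        # batch size 10, per image

single_fps = frames_per_second(SINGLE_IMAGE_LATENCY_MS)
batched_fps = frames_per_second(BATCHED_LATENCY_MS)
ten_camera_batch_ms = batch_time_ms(BATCHED_LATENCY_MS, 10)

print(f"batch 1 : {single_fps:.1f} images/s")
print(f"batch 10: {batched_fps:.1f} images/s, {ten_camera_batch_ms:.0f} ms per batch")
```

Under that assumption, batching raises throughput from roughly 204 to roughly 333 images per second, at the cost of waiting about 30 ms for a complete ten-image batch.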
Related papers
- Efficient Multi-Camera Tokenization with Triplanes for End-to-End Driving [33.2092963387255]
Autoregressive Transformers are increasingly being deployed as end-to-end robot and autonomous vehicle (AV) policy architectures.
We present an efficient triplane-based multi-camera tokenization strategy that leverages recent advances in 3D neural reconstruction and rendering.
Experiments on a large-scale AV dataset and state-of-the-art neural simulator demonstrate that our approach yields significant savings over current image patch-based tokenization strategies.
arXiv Detail & Related papers (2025-06-13T21:56:52Z) - FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z) - Unlocking Real-Time Fluorescence Lifetime Imaging: Multi-Pixel Parallelism for FPGA-Accelerated Processing [2.369919866595525]
We propose a method to achieve real-time FLI using an FPGA-based hardware accelerator.
We implement a GRU-based sequence-to-sequence (Seq2Seq) model on an FPGA board compatible with time-resolved cameras.
By integrating a GRU-based Seq2Seq model and its compressed version, called Seq2SeqLite, we were able to process multiple pixels in parallel, reducing latency compared to sequential processing.
arXiv Detail & Related papers (2024-10-09T18:24:23Z) - Compressing Recurrent Neural Networks for FPGA-accelerated Implementation in Fluorescence Lifetime Imaging [3.502427552446068]
Deep learning models enable real-time inference, but can be computationally demanding due to complex architectures and large matrix operations.
This makes DL models ill-suited for direct implementation on field-programmable gate array (FPGA)-based camera hardware.
In this work, we focus on compressing recurrent neural networks (RNNs), which are well-suited for FLI time-series data processing, to enable deployment on resource-constrained FPGA boards.
arXiv Detail & Related papers (2024-10-01T17:23:26Z) - LAPTNet-FPN: Multi-scale LiDAR-aided Projective Transform Network for
Real Time Semantic Grid Prediction [0.0]
By fusing information from multiple sensors, robustness can be increased and the computational load for the task can be lowered.
Our multi-scale LiDAR-Aided Perspective Transform network uses information available in point clouds to guide the projection of image features to a top-view representation.
arXiv Detail & Related papers (2023-02-10T12:34:28Z) - UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation [93.88170217725805]
We propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks as well as efficiency in terms of parameters, compute cost, and inference speed.
The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features.
Our evaluations on five benchmarks, Synapse, BTCV, ACDC, BRaTs, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy.
arXiv Detail & Related papers (2022-12-08T18:59:57Z) - FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using around 40% of the available hardware resources in total.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z) - AdaViT: Adaptive Tokens for Efficient Vision Transformer [91.88404546243113]
We introduce AdaViT, a method that adaptively adjusts the inference cost of vision transformer (ViT) for images of different complexity.
AdaViT achieves this by automatically reducing the number of tokens in vision transformers that are processed in the network as inference proceeds.
arXiv Detail & Related papers (2021-12-14T18:56:07Z) - High-speed object detection with a single-photon time-of-flight image
sensor [2.648554238948439]
We present results from a portable SPAD camera system that outputs 16-bin photon timing histograms with 64x32 spatial resolution.
The results are relevant for safety-critical computer vision applications which would benefit from faster than human reaction times.
arXiv Detail & Related papers (2021-07-28T14:53:44Z) - Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
arXiv Detail & Related papers (2021-06-07T11:37:03Z) - Fast convolutional neural networks on FPGAs with hls4ml [0.22756183402372013]
We introduce an automated tool for deploying ultra low-latency, low-power deep neural networks on FPGAs.
We demonstrate how to achieve inference latency of $5\,\mu$s using convolutional architectures, while preserving state-of-the-art model performance.
arXiv Detail & Related papers (2021-01-13T14:47:11Z) - Binary DAD-Net: Binarized Driveable Area Detection Network for
Autonomous Driving [94.40107679615618]
This paper proposes a novel binarized driveable area detection network (binary DAD-Net).
It uses only binary weights and activations in the encoder, the bottleneck, and the decoder part.
It outperforms state-of-the-art semantic segmentation networks on public datasets.
arXiv Detail & Related papers (2020-06-15T07:09:01Z)