Related papers: Efficient FPGA-accelerated Convolutional Neural Networks for Cloud Detection on CubeSats

Efficient FPGA-accelerated Convolutional Neural Networks for Cloud Detection on CubeSats

URL: http://arxiv.org/abs/2504.03891v1
Date: Fri, 04 Apr 2025 19:32:47 GMT
Title: Efficient FPGA-accelerated Convolutional Neural Networks for Cloud Detection on CubeSats
Authors: Angela Cratere, M. Salim Farissi, Andrea Carbone, Marcello Asciolla, Maria Rizzi, Francesco Dell'Olio, Augusto Nascetti, Dario Spiller,
Abstract summary: We present the implementation of four FPGA-accelerated convolutional neural network (CNN) models for onboard cloud detection in resource-constrained CubeSat missions.<n>This study explores both pixel-wise (Pixel-Net and Patch-Net) and image-wise (U-Net and Scene-Net) models to benchmark trade-offs in accuracy, latency, and model complexity.<n>All models retained high accuracy post-FPGA integration, with a cumulative maximum accuracy drop of only 0.6% after quantization and pruning.
Score: 0.5420492913071214
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present the implementation of four FPGA-accelerated convolutional neural network (CNN) models for onboard cloud detection in resource-constrained CubeSat missions, leveraging Xilinx's Vitis AI (VAI) framework and Deep Learning Processing Unit (DPU), a programmable engine with pre-implemented, parameterizable IP cores optimized for deep neural networks, on a Zynq UltraScale+ MPSoC. This study explores both pixel-wise (Pixel-Net and Patch-Net) and image-wise (U-Net and Scene-Net) models to benchmark trade-offs in accuracy, latency, and model complexity. Applying channel pruning, we achieved substantial reductions in model parameters (up to 98.6%) and floating-point operations (up to 90.7%) with minimal accuracy loss. Furthermore, the VAI tool was used to quantize the models to 8-bit precision, ensuring optimized hardware performance with negligible impact on accuracy. All models retained high accuracy post-FPGA integration, with a cumulative maximum accuracy drop of only 0.6% after quantization and pruning. The image-wise Scene-Net and U-Net models demonstrated strong real-time inference capabilities, achieving frame rates per second of 57.14 and 37.45, respectively, with power consumption of around 2.5 W, surpassing state-of-the-art onboard cloud detection solutions. Our approach underscores the potential of DPU-based hardware accelerators to expand the processing capabilities of small satellites, enabling efficient and flexible onboard CNN-based applications.

Related papers

Hardware-Software Co-optimised Fast and Accurate Deep Reconfigurable Spiking Inference Accelerator Architecture Design Methodology [2.968768532937366]
Spiking Neural Networks (SNNs) have emerged as a promising approach to improve the energy efficiency of machine learning models. We develop a hardware-software co-optimisation strategy to port software-trained deep neural networks (DNN) to reduced-precision spiking models.
arXiv Detail & Related papers (2024-10-07T05:04:13Z)
Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers [56.37495946212932]
Vision transformers (ViTs) have demonstrated their superior accuracy for computer vision tasks compared to convolutional neural networks (CNNs) This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs.
arXiv Detail & Related papers (2024-07-25T16:35:46Z)
On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks [52.97107229149988]
We propose an On-Chip Hardware-Aware Quantization framework, performing hardware-aware mixed-precision quantization on deployed edge devices. For efficiency metrics, we built an On-Chip Quantization Aware pipeline, which allows the quantization process to perceive the actual hardware efficiency of the quantization operator. For accuracy metrics, we propose Mask-Guided Quantization Estimation technology to effectively estimate the accuracy impact of operators in the on-chip scenario.
arXiv Detail & Related papers (2023-09-05T04:39:34Z)
SCONNA: A Stochastic Computing Based Optical Accelerator for Ultra-Fast, Energy-Efficient Inference of Integer-Quantized CNNs [0.0]
A CNN inference task uses convolution operations that are typically transformed into vector-dot-product (VDP) operations. Several photonic microring resonators (MRRs) based hardware architectures have been proposed to accelerate integer-quantized CNNs. Existing photonic MRR-based analog accelerators exhibit a very strong trade-off between the achievable input/weight precision and VDP operation size.
arXiv Detail & Related papers (2023-02-14T13:35:15Z)
Design of High-Throughput Mixed-Precision CNN Accelerators on FPGA [0.0]
Layer-wise mixed-precision quantization allows for more efficient results while inflating the design space. We present an in-depth quantitative methodology to efficiently explore the design space considering the limited hardware resources of a given FPGA. Our resulting hardware accelerators implement truly mixed-precision operations that enable efficient execution of layer-wise and channel-wise quantized CNNs.
arXiv Detail & Related papers (2022-08-09T15:32:51Z)
FxP-QNet: A Post-Training Quantizer for the Design of Mixed Low-Precision DNNs with Dynamic Fixed-Point Representation [2.4149105714758545]
We propose a novel framework referred to as the Fixed-Point Quantizer of deep neural Networks (FxP-QNet) FxP-QNet adapts the quantization level for each data-structure of each layer based on the trade-off between the network accuracy and the low-precision requirements. Results show that FxP-QNet-quantized AlexNet, VGG-16, and ResNet-18 reduce the overall memory requirements of their full-precision counterparts by 7.16x, 10.36x, and 6.44x with less than 0.95%, 0.95%, and 1.99%
arXiv Detail & Related papers (2022-03-22T23:01:43Z)
FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task. The design targets a Xilinx Artix-7 FPGA, using in total around the 40% of the available hardware resources. It reduces the classification time by three orders of magnitude, with a small 4.5% impact on the accuracy, if compared to its software, full precision counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware. The proposed methodology extracts a set of models from micro- kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation. We compare estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model for evaluation.
arXiv Detail & Related papers (2021-05-07T11:39:05Z)
Post-training deep neural network pruning via layer-wise calibration [70.65691136625514]
We propose a data-free extension of the approach for computer vision models based on automatically-generated synthetic fractal images. When using real data, we are able to get a ResNet50 model on ImageNet with 65% sparsity rate in 8-bit precision in a post-training setting.
arXiv Detail & Related papers (2021-04-30T14:20:51Z)
HAO: Hardware-aware neural Architecture Optimization for Efficient Inference [25.265181492143107]
We develop an integer programming algorithm to prune the design space of a neural network search algorithm. Our algorithm achieves 72.5% top-1 accuracy on ImageNet at framerate 50, which is 60% faster than MnasNet and 135% faster than FBNet with comparable accuracy.
arXiv Detail & Related papers (2021-04-26T17:59:29Z)
Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantization neural networks (QNNs) are very attractive to the industry because their extremely cheap calculation and storage overhead, but their performance is still worse than that of networks with full-precision parameters. Most of existing methods aim to enhance performance of QNNs especially binary neural networks by exploiting more effective training techniques. We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.