Accurate, Low-latency, Efficient SAR Automatic Target Recognition on
FPGA
- URL: http://arxiv.org/abs/2301.01454v1
- Date: Wed, 4 Jan 2023 05:35:30 GMT
- Title: Accurate, Low-latency, Efficient SAR Automatic Target Recognition on
FPGA
- Authors: Bingyi Zhang, Rajgopal Kannan, Viktor Prasanna, Carl Busart
- Abstract summary: Synthetic aperture radar (SAR) automatic target recognition (ATR) is the key technique for remote-sensing image recognition.
The state-of-the-art convolutional neural networks (CNNs) for SAR ATR suffer from high computation cost and large memory footprint.
We propose a comprehensive GNN-based model-architecture co-design on FPGA to address the above issues.
- Score: 3.251765107970636
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Synthetic aperture radar (SAR) automatic target recognition (ATR) is the key
technique for remote-sensing image recognition. The state-of-the-art
convolutional neural networks (CNNs) for SAR ATR suffer from \emph{high
computation cost} and \emph{large memory footprint}, making them unsuitable to
be deployed on resource-limited platforms, such as small/micro satellites. In
this paper, we propose a comprehensive GNN-based model-architecture co-design
on FPGA to address the above issues. \emph{Model design}: we design a novel
graph neural network (GNN) for SAR ATR. The proposed GNN model incorporates
GraphSAGE layer operators and an attention mechanism, achieving accuracy
comparable to the state-of-the-art work at nearly $1/100$ of the computation cost. Then,
we propose a pruning approach that includes weight pruning and input pruning. While
weight pruning through lasso regression removes most parameters without an
accuracy drop, input pruning eliminates most input pixels with a negligible
accuracy drop. \emph{Architecture design}: to fully unleash the computation
parallelism within the proposed model, we develop a novel unified hardware
architecture that can execute various computation kernels (feature aggregation,
feature transformation, graph pooling). The proposed hardware design adopts the
Scatter-Gather paradigm to efficiently handle the irregular computation
patterns of various computation kernels. We deploy the proposed design on an
embedded FPGA (AMD Xilinx ZCU104) and evaluate the performance using MSTAR
dataset. Compared with the state-of-the-art CNNs, the proposed GNN achieves
comparable accuracy with $1/3258$ computation cost and $1/83$ model size.
Compared with the state-of-the-art CPU/GPU, our FPGA accelerator achieves
$14.8\times$/$2.5\times$ speedup (latency) and is $62\times$/$39\times$ more
energy efficient.
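The GraphSAGE-style aggregation at the heart of the proposed model can be illustrated with a minimal sketch (hypothetical NumPy code; the paper's actual layer also includes the attention mechanism and graph pooling, which are omitted here):

```python
import numpy as np

def graphsage_layer(H, neighbors, W_self, W_neigh):
    """One GraphSAGE layer with mean aggregation over each node's
    neighborhood, followed by a ReLU. H is (num_nodes, in_dim);
    neighbors[v] lists the neighbor indices of node v."""
    agg = np.stack([
        H[nbrs].mean(axis=0) if nbrs else np.zeros(H.shape[1])
        for nbrs in neighbors
    ])
    return np.maximum(0.0, H @ W_self + agg @ W_neigh)
```

Treating each retained SAR image pixel as a graph node, the per-layer cost scales with the number of surviving nodes and edges, which is consistent with the input-pruning idea above.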
Related papers
- ApproxDARTS: Differentiable Neural Architecture Search with Approximate Multipliers [0.24578723416255746]
We present ApproxDARTS, a neural architecture search (NAS) method enabling the popular differentiable neural architecture search method called DARTS to exploit approximate multipliers.
We show that the ApproxDARTS is able to perform a complete architecture search within less than $10$ GPU hours and produce competitive convolutional neural networks (CNN) containing approximate multipliers in convolutional layers.
arXiv Detail & Related papers (2024-04-08T09:54:57Z)
- T-GAE: Transferable Graph Autoencoder for Network Alignment [79.89704126746204]
T-GAE is a graph autoencoder framework that leverages transferability and stability of GNNs to achieve efficient network alignment without retraining.
Our experiments demonstrate that T-GAE outperforms the state-of-the-art optimization method and the best GNN approach by up to 38.7% and 50.8%, respectively.
arXiv Detail & Related papers (2023-10-05T02:58:29Z)
- Graph Neural Network for Accurate and Low-complexity SAR ATR [2.9766397696234996]
We propose a graph neural network (GNN) model to achieve accurate and low-latency SAR ATR.
The proposed GNN model has low computation complexity and achieves comparable high accuracy.
Compared with the state-of-the-art CNNs, the proposed GNN model has only 1/3000 computation cost and 1/80 model size.
arXiv Detail & Related papers (2023-05-11T20:17:41Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around the 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on the accuracy, if compared to its software, full precision counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using -1, +1 to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
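The {-1, +1} decomposition idea can be illustrated with a small sketch (a hypothetical scheme, not necessarily the paper's exact encoding): any odd integer in $[-(2^M-1), 2^M-1]$ equals $\sum_i 2^i b_i$ with $b_i \in \{-1,+1\}$, so an $M$-bit quantized weight matrix splits into $M$ binary branches:

```python
import numpy as np

def binary_decompose(W_q, M):
    """Split an odd-integer weight matrix W_q (entries in
    [-(2**M - 1), 2**M - 1]) into M branches B_i with entries in
    {-1, +1} such that W_q == sum_i 2**i * B_i."""
    n = (W_q + (2 ** M - 1)) // 2          # shift into [0, 2**M - 1]
    return [((n >> i) & 1) * 2 - 1 for i in range(M)]  # bits -> {-1, +1}
```

A dense product then distributes over the branches, `x @ W_q == sum_i 2**i * (x @ B_i)`, so each branch can use cheap binary arithmetic and the results are recombined with shifts.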
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro- kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
We compare estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model for evaluation.
arXiv Detail & Related papers (2021-05-07T11:39:05Z)
- VersaGNN: a Versatile accelerator for Graph neural networks [81.1667080640009]
We propose VersaGNN, an ultra-efficient, systolic-array-based versatile hardware accelerator.
VersaGNN achieves on average a 3712$\times$ speedup with 1301.25$\times$ energy reduction over a CPU, and a 35.4$\times$ speedup with 17.66$\times$ energy reduction over a GPU.
arXiv Detail & Related papers (2021-05-04T04:10:48Z)
- HAO: Hardware-aware neural Architecture Optimization for Efficient Inference [25.265181492143107]
We develop an integer programming algorithm to prune the design space of a neural network search algorithm.
Our algorithm achieves 72.5% top-1 accuracy on ImageNet at a frame rate of 50, which is 60% faster than MnasNet and 135% faster than FBNet with comparable accuracy.
arXiv Detail & Related papers (2021-04-26T17:59:29Z)
- NullaNet Tiny: Ultra-low-latency DNN Inference Through Fixed-function Combinational Logic [4.119948826527649]
Field-programmable gate array (FPGA)-based accelerators are gaining traction as a serious contender to replace graphics processing unit/central processing unit-based platforms.
This paper presents NullaNet Tiny, a framework for constructing resource and energy-efficient, ultra-low-latency FPGA-based neural network accelerators.
arXiv Detail & Related papers (2021-04-07T00:16:39Z)
- ExPAN(N)D: Exploring Posits for Efficient Artificial Neural Network Design in FPGA-based Systems [4.2612881037640085]
This paper analyzes the efficacy of the Posit number representation scheme and the efficiency of fixed-point arithmetic implementations for ANNs.
We propose a novel Posit to fixed-point converter for enabling high-performance and energy-efficient hardware implementations for ANNs.
arXiv Detail & Related papers (2020-10-24T11:02:25Z)
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
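The fine-grained-pattern idea in PatDNN can be sketched as follows (hypothetical code; the pattern set and selection rule are illustrative, and the real system couples pattern pruning with compiler-level code generation): each 3x3 kernel is assigned the predefined sparsity pattern that preserves the most weight magnitude, and off-pattern weights are zeroed.

```python
import numpy as np

def pattern_prune(kernels, patterns):
    """Zero out each 3x3 kernel according to whichever predefined
    0/1 pattern retains the largest total weight magnitude.
    kernels: (N, 3, 3) array; patterns: list of (3, 3) 0/1 arrays."""
    pruned = np.empty_like(kernels)
    for idx, k in enumerate(kernels):
        scores = [np.abs(k * p).sum() for p in patterns]
        pruned[idx] = k * patterns[int(np.argmax(scores))]
    return pruned
```

Because every kernel then matches one of a few known patterns, the compiler can generate a specialized inner loop per pattern instead of indexing arbitrary sparse positions.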
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.