VTR: An Optimized Vision Transformer for SAR ATR Acceleration on FPGA
- URL: http://arxiv.org/abs/2404.04527v1
- Date: Sat, 6 Apr 2024 06:49:55 GMT
- Title: VTR: An Optimized Vision Transformer for SAR ATR Acceleration on FPGA
- Authors: Sachini Wickramasinghe, Dhruv Parikh, Bingyi Zhang, Rajgopal Kannan, Viktor Prasanna, Carl Busart
- Abstract summary: Vision Transformers (ViTs) are the current state-of-the-art in various computer vision applications.
We develop a lightweight ViT model that can be trained directly on small datasets without any pre-training.
We evaluate our proposed model, which we call VTR (ViT for SAR ATR), on three widely used SAR datasets.
- Score: 2.8595179027282907
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR) is a key technique used in military applications like remote-sensing image recognition. Vision Transformers (ViTs) are the current state-of-the-art in various computer vision applications, outperforming their CNN counterparts. However, using ViTs for SAR ATR applications is challenging because (1) standard ViTs require extensive training data to generalize well due to their low locality, while standard SAR datasets have a limited number of labeled training samples, which reduces the learning capability of ViTs; and (2) ViTs have a high parameter count and are computation intensive, which makes their deployment on resource-constrained SAR platforms difficult. In this work, we develop a lightweight ViT model that can be trained directly on small datasets without any pre-training by utilizing the Shifted Patch Tokenization (SPT) and Locality Self-Attention (LSA) modules. We train this model directly on SAR datasets with limited training samples to evaluate its effectiveness for SAR ATR applications. We evaluate our proposed model, which we call VTR (ViT for SAR ATR), on three widely used SAR datasets: MSTAR, SynthWakeSAR, and GBSAR. Further, we propose a novel FPGA accelerator for VTR to enable deployment in real-time SAR ATR applications.
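The listing does not include source code, but the two modules the abstract names are well defined in the ViT literature. Below is a minimal PyTorch sketch of Shifted Patch Tokenization (SPT) and Locality Self-Attention (LSA) under assumptions chosen here purely for illustration: single-channel SAR input, patch size 8, embedding dimension 192, 3 attention heads, and torch.roll used as a simplification of the usual pad-and-crop shift. This is not the authors' VTR implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ShiftedPatchTokenization(nn.Module):
    """Sketch of SPT: concatenate the input with four diagonally shifted
    copies before patch embedding to inject a local inductive bias."""

    def __init__(self, in_ch=1, patch_size=8, embed_dim=192):
        super().__init__()
        self.patch_size = patch_size
        patch_dim = 5 * in_ch * patch_size * patch_size   # original + 4 shifted copies
        self.norm = nn.LayerNorm(patch_dim)
        self.proj = nn.Linear(patch_dim, embed_dim)

    def forward(self, x):                                  # x: (B, C, H, W)
        s = self.patch_size // 2                           # shift by half a patch
        shifted = [torch.roll(x, shifts=(dy, dx), dims=(2, 3))
                   for dy, dx in [(-s, -s), (-s, s), (s, -s), (s, s)]]
        x = torch.cat([x] + shifted, dim=1)                # (B, 5C, H, W)
        patches = F.unfold(x, kernel_size=self.patch_size,
                           stride=self.patch_size)         # (B, 5C*p*p, N)
        return self.proj(self.norm(patches.transpose(1, 2)))  # (B, N, embed_dim)


class LocalitySelfAttention(nn.Module):
    """Sketch of LSA: multi-head self-attention with a learnable temperature
    and diagonal masking so tokens cannot attend to themselves."""

    def __init__(self, dim=192, heads=3):
        super().__init__()
        self.heads, self.head_dim = heads, dim // heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)
        # learnable temperature replaces the fixed 1/sqrt(d) scaling
        self.temperature = nn.Parameter(torch.tensor(self.head_dim ** -0.5))

    def forward(self, x):                                  # x: (B, N, dim)
        B, N, _ = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)               # each: (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.temperature
        attn = attn.masked_fill(torch.eye(N, dtype=torch.bool, device=x.device),
                                float("-inf"))             # diagonal masking
        attn = attn.softmax(dim=-1)
        return self.out((attn @ v).transpose(1, 2).reshape(B, N, -1))
```

As an illustrative usage under these assumptions, a 128x128 single-channel SAR chip with patch_size=8 yields 256 tokens of dimension 192 from ShiftedPatchTokenization; LocalitySelfAttention would then stand in for standard multi-head attention inside each transformer block.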
Related papers
- RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer [2.1186155813156926]
RT-DETRv2 builds upon the previous state-of-the-art real-time detector, RT-DETR.
To improve flexibility, we suggest setting a distinct number of sampling points for features at different scales.
To enhance practicality, we propose an optional discrete sampling operator to replace the grid_sample operator.
arXiv Detail & Related papers (2024-07-24T10:20:19Z)
- Towards SAR Automatic Target Recognition MultiCategory SAR Image Classification Based on Light Weight Vision Transformer [11.983317593939688]
This paper applies a lightweight vision transformer-based model to classify SAR images.
The entire structure was verified on an open-access SAR dataset.
arXiv Detail & Related papers (2024-05-18T11:24:52Z)
- Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval [50.72924579220149]
Composed Image Retrieval (CIR) is a task that retrieves images similar to a query, based on a provided textual modification.
Current techniques rely on supervised learning for CIR models, using labeled triplets of reference image, text, and target image.
We propose a new semi-supervised CIR approach where we search for a reference and its related target images in auxiliary data.
arXiv Detail & Related papers (2024-04-23T21:00:22Z)
- Benchmarking Deep Learning Classifiers for SAR Automatic Target Recognition [7.858656052565242]
This paper comprehensively benchmarks several advanced deep learning models for SAR ATR with multiple distinct SAR imagery datasets.
We evaluate and compare the five classifiers in terms of classification accuracy, runtime performance (inference throughput), and analytical performance.
No clear winner emerges across all of our chosen metrics, and a single model that dominates all cases is doubtful in the domain of SAR ATR.
arXiv Detail & Related papers (2023-12-12T02:20:39Z)
- Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels [100.43280310123784]
We investigate the use of automatically-generated transcriptions of unlabelled datasets to increase the training set size.
We demonstrate that increasing the size of the training set, a recent trend in the literature, leads to reduced WER despite using noisy transcriptions.
The proposed model achieves new state-of-the-art performance on AV-ASR on LRS2 and LRS3.
arXiv Detail & Related papers (2023-03-25T00:37:34Z)
- LaMAR: Benchmarking Localization and Mapping for Augmented Reality [80.23361950062302]
We introduce LaMAR, a new benchmark with a comprehensive capture and GT pipeline that co-registers realistic trajectories and sensor streams captured by heterogeneous AR devices.
We publish a benchmark dataset of diverse and large-scale scenes recorded with head-mounted and hand-held AR devices.
arXiv Detail & Related papers (2022-10-19T17:58:17Z)
- Learning to Simulate Realistic LiDARs [66.7519667383175]
We introduce a pipeline for data-driven simulation of a realistic LiDAR sensor.
We show that our model can learn to encode realistic effects such as dropped points on transparent surfaces.
We use our technique to learn models of two distinct LiDAR sensors and use them to improve simulated LiDAR data accordingly.
arXiv Detail & Related papers (2022-09-22T13:12:54Z)
- Self-Supervised RF Signal Representation Learning for NextG Signal Classification with Deep Learning [5.624291722263331]
Self-supervised learning enables the learning of useful representations from Radio Frequency (RF) signals themselves.
We show that the sample efficiency of AMR (the number of labeled samples required to achieve a given accuracy) can be significantly improved by learning signal representations with self-supervised learning.
arXiv Detail & Related papers (2022-07-07T02:07:03Z)
- Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling [76.43479696760996]
We propose a unified framework, Dual-mode ASR, to train a single end-to-end ASR model with shared weights for both streaming and full-context speech recognition.
We show that the latency and accuracy of streaming ASR significantly benefit from weight sharing and joint training of full-context ASR.
arXiv Detail & Related papers (2020-10-12T21:12:56Z)
- Automatic Target Recognition on Synthetic Aperture Radar Imagery: A Survey [0.0]
We propose a taxonomy for the SAR ATR architectures, along with a comparison of the strengths and weaknesses of each method under both standard and extended operational conditions.
Although MSTAR is the standard SAR ATR benchmarking dataset, we also highlight its weaknesses and suggest future research directions.
arXiv Detail & Related papers (2020-07-04T14:22:30Z)
- Taurus: A Data Plane Architecture for Per-Packet ML [59.1343317736213]
We present the design and implementation of Taurus, a data plane for line-rate inference.
Our evaluation of a Taurus switch ASIC shows that Taurus operates orders of magnitude faster than a server-based control plane.
arXiv Detail & Related papers (2020-02-12T09:18:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.