MTU-Net: Multi-level TransUNet for Space-based Infrared Tiny Ship
Detection
- URL: http://arxiv.org/abs/2209.13756v1
- Date: Wed, 28 Sep 2022 00:48:14 GMT
- Title: MTU-Net: Multi-level TransUNet for Space-based Infrared Tiny Ship
Detection
- Authors: Tianhao Wu, Boyang Li, Yihang Luo, Yingqian Wang, Chao Xiao, Ting Liu,
Jungang Yang, Wei An, Yulan Guo
- Abstract summary: We develop a space-based infrared tiny ship detection dataset (namely, NUDT-SIRST-Sea) with 48 space-based infrared images and 17598 pixel-level tiny ship annotations.
Considering the extreme characteristics of those tiny ships in such challenging scenes, we propose a multi-level TransUNet (MTU-Net) in this paper.
Experimental results on the NUDT-SIRST-Sea dataset show that our MTU-Net outperforms traditional and existing deep learning based SIRST methods in terms of probability of detection, false alarm rate and intersection over union.
- Score: 42.92798053154314
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Space-based infrared tiny ship detection aims at separating tiny ships from
the images captured by Earth-orbiting satellites. Due to the extremely large
image coverage area (e.g., thousands of square kilometers), candidate targets in
these images are much smaller, dimmer, and more changeable than those targets
observed by aerial-based and land-based imaging devices. Existing short imaging
distance-based infrared datasets and target detection methods cannot be well
adapted to the space-based surveillance task. To address these problems, we
develop a space-based infrared tiny ship detection dataset (namely,
NUDT-SIRST-Sea) with 48 space-based infrared images and 17598 pixel-level tiny
ship annotations. Each image covers about 10000 square kilometers of area with
10000×10000 pixels. Considering the extreme characteristics (e.g., small, dim,
changeable) of those tiny ships in such challenging scenes, we propose a
multi-level TransUNet (MTU-Net) in this paper. Specifically, we design a Vision
Transformer (ViT) Convolutional Neural Network (CNN) hybrid encoder to extract
multi-level features. Local feature maps are first extracted by several
convolution layers and then fed into the multi-level feature extraction module
(MVTM) to capture long-distance dependency. We further propose a
copy-rotate-resize-paste (CRRP) data augmentation approach to accelerate the
training phase, which effectively alleviates the issue of sample imbalance
between targets and background. Besides, we design a FocalIoU loss to achieve
both target localization and shape description. Experimental results on the
NUDT-SIRST-Sea dataset show that our MTU-Net outperforms traditional and
existing deep learning based SIRST methods in terms of probability of
detection, false alarm rate and intersection over union.
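The copy-rotate-resize-paste idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything below is an assumption made for illustration: the function names (`rotate90`, `upscale`, `crrp`), the `(row0, row1, col0, col1)` box format, rotation restricted to multiples of 90 degrees, nearest-neighbour resizing by an integer factor, and uniformly random paste positions. It should not be read as the authors' implementation.

```python
import random


def rotate90(patch, k):
    """Rotate a 2-D list counter-clockwise by k * 90 degrees."""
    for _ in range(k % 4):
        patch = [list(row) for row in zip(*patch)][::-1]
    return patch


def upscale(patch, s):
    """Nearest-neighbour upsampling of a 2-D list by an integer factor s."""
    return [[v for v in row for _ in range(s)] for row in patch for _ in range(s)]


def crrp(image, mask, box, rng, max_scale=2):
    """Copy one target, rotate and resize it, then paste it at a random place.

    image, mask: 2-D lists of equal size (mask holds 0/1 labels).
    box: (row0, row1, col0, col1) bounds of one target (hypothetical format).
    Returns new image and mask lists; the inputs are left unchanged.
    """
    r0, r1, c0, c1 = box
    # Copy: cut the target patch out of the image and its mask.
    img_patch = [row[c0:c1] for row in image[r0:r1]]
    msk_patch = [row[c0:c1] for row in mask[r0:r1]]

    # Rotate and resize with the same random parameters for image and mask.
    k = rng.randrange(4)
    s = rng.randint(1, max_scale)
    img_patch = upscale(rotate90(img_patch, k), s)
    msk_patch = upscale(rotate90(msk_patch, k), s)

    # Paste: choose a top-left corner that keeps the patch inside the image.
    h, w = len(msk_patch), len(msk_patch[0])
    top = rng.randrange(len(image) - h + 1)
    left = rng.randrange(len(image[0]) - w + 1)

    out_img = [row[:] for row in image]
    out_msk = [row[:] for row in mask]
    for i in range(h):
        for j in range(w):
            if msk_patch[i][j]:  # only target pixels overwrite the background
                out_img[top + i][left + j] = img_patch[i][j]
                out_msk[top + i][left + j] = 1
    return out_img, out_msk


# Toy example: a 32x32 background with a single 2x3 target.
rng = random.Random(0)
image = [[0.0] * 32 for _ in range(32)]
mask = [[0] * 32 for _ in range(32)]
for r in range(10, 12):
    for c in range(20, 23):
        image[r][c], mask[r][c] = 1.0, 1
aug_image, aug_mask = crrp(image, mask, (10, 12, 20, 23), rng)
```

Each call adds target pixels without changing the image size, which is one way an augmentation of this kind could raise the share of target pixels relative to background.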
Related papers
- IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection [55.554484379021524]
The Infrared Small Target Detection (IRSTD) task falls short of satisfactory performance due to a notable domain gap between natural and infrared images.
We propose the IRSAM model for IRSTD, which improves SAM's encoder-decoder architecture to learn better feature representation of infrared small objects.
arXiv Detail & Related papers (2024-07-10T10:17:57Z)
- Multimodal Transformer Using Cross-Channel attention for Object Detection in Remote Sensing Images [1.662438436885552]
Multi-modal fusion has been determined to enhance the accuracy by fusing data from multiple modalities.
We propose a novel multi-modal fusion strategy for mapping relationships between different channels at the early stage.
By addressing fusion in the early stage, as opposed to mid or late-stage methods, our method achieves competitive and even superior performance compared to existing techniques.
arXiv Detail & Related papers (2023-10-21T00:56:11Z)
- Fast Fourier Convolution Based Remote Sensor Image Object Detection for Earth Observation [0.0]
We propose a Frequency-aware Feature Pyramid Framework (FFPF) for remote sensing object detection.
F-ResNet is proposed to perceive the spectral context information by plugging the frequency domain convolution into each stage of the backbone.
The BSFPN is designed to use a bilateral sampling strategy and skipping connection to better model the association of object features at different scales.
arXiv Detail & Related papers (2022-09-01T15:50:58Z)
- Beyond Cross-view Image Retrieval: Highly Accurate Vehicle Localization Using Satellite Image [91.29546868637911]
This paper addresses the problem of vehicle-mounted camera localization by matching a ground-level image with an overhead-view satellite map.
The key idea is to formulate the task as pose estimation and solve it by neural-net based optimization.
Experiments on standard autonomous vehicle localization datasets have confirmed the superiority of the proposed method.
arXiv Detail & Related papers (2022-04-10T19:16:58Z)
- Infrared Small-Dim Target Detection with Transformer under Complex Backgrounds [155.388487263872]
We propose a new infrared small-dim target detection method with the transformer.
We adopt the self-attention mechanism of the transformer to learn the interaction information of image features in a larger range.
We also design a feature enhancement module to learn more features of small-dim targets.
arXiv Detail & Related papers (2021-09-29T12:23:41Z)
- FOVEA: Foveated Image Magnification for Autonomous Navigation [53.69803081925454]
We propose an attentional approach that elastically magnifies certain regions while maintaining a small input canvas.
On the autonomous driving datasets Argoverse-HD and BDD100K, we show our proposed method boosts the detection AP over standard Faster R-CNN, with and without finetuning.
arXiv Detail & Related papers (2021-08-27T03:07:55Z)
- Locality-Aware Rotated Ship Detection in High-Resolution Remote Sensing Imagery Based on Multi-Scale Convolutional Network [7.984128966509492]
We propose a locality-aware rotated ship detection (LARSD) framework based on a multi-scale convolutional neural network (CNN).
The proposed framework applies a UNet-like multi-scale CNN to generate multi-scale feature maps with high-level information in high resolution.
To enlarge the detection dataset, we build a new high-resolution ship detection (HRSD) dataset, where 2499 images and 9269 instances were collected from Google Earth with different resolutions.
arXiv Detail & Related papers (2020-07-24T03:01:42Z)
- Drone-based RGB-Infrared Cross-Modality Vehicle Detection via Uncertainty-Aware Learning [59.19469551774703]
Drone-based vehicle detection aims at finding the vehicle locations and categories in an aerial image.
We construct a large-scale drone-based RGB-Infrared vehicle detection dataset, termed DroneVehicle.
Our DroneVehicle collects 28,439 RGB-Infrared image pairs, covering urban roads, residential areas, parking lots, and other scenarios from day to night.
arXiv Detail & Related papers (2020-03-05T05:29:44Z)