Sampling Equivariant Self-attention Networks for Object Detection in
Aerial Images
- URL: http://arxiv.org/abs/2111.03420v1
- Date: Fri, 5 Nov 2021 11:48:04 GMT
- Title: Sampling Equivariant Self-attention Networks for Object Detection in
Aerial Images
- Authors: Guo-Ye Yang, Xiang-Li Li, Ralph R. Martin, Shi-Min Hu
- Abstract summary: Objects in aerial images have greater variations in scale and orientation than in typical images, so detection is more difficult.
We propose sampling equivariant self-attention networks which consider self-attention restricted to a local image patch.
We also use a novel randomized normalization module to tackle overfitting due to limited aerial image data.
- Score: 36.9958603490323
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Objects in aerial images have greater variations in scale and orientation
than in typical images, so detection is more difficult. Convolutional neural
networks use a variety of frequency- and orientation-specific kernels to
identify objects subject to different transformations; these require many
parameters. Sampling equivariant networks can adjust sampling from input
feature maps according to the transformation of the object, allowing a kernel
to extract features of an object under different transformations. Doing so
requires fewer parameters, and makes the network more suitable for representing
deformable objects, like those in aerial images. However, methods like
deformable convolutional networks can only provide sampling equivariance under
certain circumstances, because of the locations used for sampling. We propose
sampling equivariant self-attention networks which consider self-attention
restricted to a local image patch as convolution sampling with masks instead of
locations, and design a transformation embedding module to further improve the
equivariant sampling ability. We also use a novel randomized normalization
module to tackle overfitting due to limited aerial image data. We show that our
model (i) provides significantly better sampling equivariance than existing
methods, without additional supervision, (ii) provides improved classification
on ImageNet, and (iii) achieves state-of-the-art results on the DOTA dataset,
without increased computation.
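The abstract's central idea, treating self-attention over a local image patch as convolution-style sampling with masks rather than with offset locations, can be illustrated with a minimal sketch. The layer below is only an interpretation of that description, assuming PyTorch: the class name and the scaled dot-product form are assumptions, and the paper's transformation embedding and randomized normalization modules are omitted.

```python
# Illustrative sketch only: local-patch self-attention viewed as "sampling
# with masks". Not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalPatchSelfAttention(nn.Module):
    """Self-attention restricted to a k x k neighbourhood around each pixel.

    The softmax over each neighbourhood acts as a per-location sampling mask:
    instead of moving sampling locations (as deformable convolution does),
    the layer re-weights a fixed local window.
    """
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.k = kernel_size
        self.q = nn.Conv2d(channels, channels, 1)
        self.kv = nn.Conv2d(channels, 2 * channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        k2 = self.k * self.k
        q = self.q(x)                                   # (b, c, h, w)
        key, val = self.kv(x).chunk(2, dim=1)           # each (b, c, h, w)

        # Gather the k x k neighbourhood of every pixel.
        unfold = lambda t: F.unfold(t, self.k, padding=self.k // 2)
        key = unfold(key).view(b, c, k2, h * w)
        val = unfold(val).view(b, c, k2, h * w)

        # Attention logits per neighbourhood -> soft sampling mask.
        logits = (q.view(b, c, 1, h * w) * key).sum(dim=1) / c ** 0.5  # (b, k2, h*w)
        mask = logits.softmax(dim=1)

        # Masked aggregation of the neighbourhood values.
        out = (val * mask.unsqueeze(1)).sum(dim=2)      # (b, c, h*w)
        return out.view(b, c, h, w)

if __name__ == "__main__":
    layer = LocalPatchSelfAttention(channels=16, kernel_size=3)
    feats = torch.randn(2, 16, 32, 32)
    print(layer(feats).shape)  # torch.Size([2, 16, 32, 32])
```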
Related papers
- Cross-domain and Cross-dimension Learning for Image-to-Graph Transformers [50.576354045312115]
Direct image-to-graph transformation is a challenging task that solves object detection and relationship prediction in a single model.
We introduce a set of methods enabling cross-domain and cross-dimension transfer learning for image-to-graph transformers.
We demonstrate our method's utility in cross-domain and cross-dimension experiments, where we pretrain our models on 2D satellite images before applying them to vastly different target domains in 2D and 3D.
arXiv Detail & Related papers (2024-03-11T10:48:56Z)
- Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z)
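As a rough illustration of the Adaptive Affine Transformer idea summarized above (learning spatial transformations that warp the input image), the sketch below implements a differentiable affine warp in PyTorch. The small parameter predictor and its identity initialization are hypothetical and are not AC-Former's actual module.

```python
# Minimal sketch of a differentiable affine warp; hypothetical module names.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AffineWarp(nn.Module):
    def __init__(self, in_channels: int):
        super().__init__()
        # Predict the 6 affine parameters from a global image descriptor.
        self.head = nn.Linear(in_channels, 6)
        # Initialise to the identity transform so training starts stably.
        nn.init.zeros_(self.head.weight)
        self.head.bias.data = torch.tensor([1., 0., 0., 0., 1., 0.])

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        b, c, h, w = img.shape
        descriptor = img.mean(dim=(2, 3))               # (b, c)
        theta = self.head(descriptor).view(b, 2, 3)     # (b, 2, 3) affine matrices
        grid = F.affine_grid(theta, img.shape, align_corners=False)
        return F.grid_sample(img, grid, align_corners=False)

if __name__ == "__main__":
    warp = AffineWarp(in_channels=3)
    print(warp(torch.randn(4, 3, 64, 64)).shape)  # torch.Size([4, 3, 64, 64])
```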
- Transformation-Invariant Network for Few-Shot Object Detection in Remote Sensing Images [15.251042369061024]
Standard object detection relies on a large amount of labeled data for training, motivating few-shot object detection (FSOD).
Scale and orientation variations of objects in remote sensing images pose significant challenges to existing FSOD methods.
We propose integrating a feature pyramid network and utilizing prototype features to enhance query features.
arXiv Detail & Related papers (2023-03-13T02:21:38Z)
- Improving the Sample-Complexity of Deep Classification Networks with Invariant Integration [77.99182201815763]
Leveraging prior knowledge on intraclass variance due to transformations is a powerful method to improve the sample complexity of deep neural networks.
We propose a novel monomial selection algorithm based on pruning methods to allow an application to more complex problems.
We demonstrate the improved sample complexity on the Rotated-MNIST, SVHN and CIFAR-10 datasets.
arXiv Detail & Related papers (2022-02-08T16:16:11Z)
- Quantised Transforming Auto-Encoders: Achieving Equivariance to Arbitrary Transformations in Deep Networks [23.673155102696338]
Convolutional Neural Networks (CNNs) are equivariant to image translation.
We propose an auto-encoder architecture whose embedding obeys an arbitrary set of equivariance relations simultaneously.
We demonstrate results of successful re-rendering of transformed versions of input images on several datasets.
arXiv Detail & Related papers (2021-11-25T02:26:38Z)
- Rotation Equivariant Feature Image Pyramid Network for Object Detection in Optical Remote Sensing Imagery [39.25541709228373]
We propose the rotation equivariant feature image pyramid network (REFIPN), an image pyramid network based on rotation-equivariant convolution.
The proposed pyramid network extracts features in a wide range of scales and orientations by using novel convolution filters.
The detection performance of the proposed model is validated on two commonly used aerial benchmarks.
arXiv Detail & Related papers (2021-06-02T01:33:49Z)
- Truly shift-equivariant convolutional neural networks with adaptive polyphase upsampling [28.153820129486025]
In image classification, adaptive polyphase downsampling (APS-D) was recently proposed to make CNNs perfectly shift invariant.
We propose adaptive polyphase upsampling (APS-U), a non-linear extension of conventional upsampling, which allows CNNs to exhibit perfect shift equivariance.
arXiv Detail & Related papers (2021-05-09T22:33:53Z)
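The adaptive polyphase idea in the entry above can be sketched for the simpler downsampling case (APS-D): keep the stride-2 polyphase component with the largest norm instead of a fixed one, so the choice does not depend on how the input happens to be shifted. The sketch below is a simplified PyTorch illustration, not the paper's reference code.

```python
# Simplified sketch of adaptive polyphase downsampling (APS-D).
import torch

def aps_downsample(x: torch.Tensor) -> torch.Tensor:
    """Downsample (b, c, h, w) by 2 using the max-norm polyphase component."""
    # The four polyphase components of a stride-2 grid.
    comps = [x[:, :, i::2, j::2] for i in (0, 1) for j in (0, 1)]
    comps = torch.stack(comps, dim=0)                   # (4, b, c, h/2, w/2)
    norms = comps.flatten(2).norm(dim=2)                # (4, b)
    idx = norms.argmax(dim=0)                           # (b,) chosen component per sample
    return torch.stack([comps[idx[n], n] for n in range(x.shape[0])], dim=0)

if __name__ == "__main__":
    x = torch.randn(2, 8, 32, 32)
    y = aps_downsample(x)
    y_shifted = aps_downsample(torch.roll(x, shifts=1, dims=3))
    # A one-pixel circular shift changes which component is selected, not its values.
    print(y.shape, torch.allclose(y.abs().sum(), y_shifted.abs().sum()))
```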
- A Hierarchical Transformation-Discriminating Generative Model for Few Shot Anomaly Detection [93.38607559281601]
We devise a hierarchical generative model that captures the multi-scale patch distribution of each training image.
The anomaly score is obtained by aggregating the patch-based votes of the correct transformation across scales and image regions.
arXiv Detail & Related papers (2021-04-29T17:49:48Z)
- Change Detection from SAR Images Based on Deformable Residual Convolutional Neural Networks [26.684293663473415]
Convolutional neural networks (CNNs) have made great progress in change detection for synthetic aperture radar (SAR) images.
In this paper, a novel Deformable Residual Convolutional Neural Network (DRNet) is designed for SAR image change detection.
arXiv Detail & Related papers (2021-04-06T05:52:25Z)
- FDA: Fourier Domain Adaptation for Semantic Segmentation [82.4963423086097]
We describe a simple method for unsupervised domain adaptation, whereby the discrepancy between the source and target distributions is reduced by swapping the low-frequency spectrum of one with the other.
We illustrate the method in semantic segmentation, where densely annotated images are aplenty in one domain, but difficult to obtain in another.
Our results indicate that even simple procedures can discount nuisance variability in the data that more sophisticated methods struggle to learn away.
arXiv Detail & Related papers (2020-04-11T22:20:48Z)
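The FDA entry above reduces the domain gap by swapping low-frequency spectra between domains. A minimal sketch of that operation, assuming PyTorch and a hypothetical window-size hyperparameter `beta`, is shown below; it is not the authors' released implementation.

```python
# Sketch of low-frequency spectrum swapping: replace the low-frequency
# amplitude of a source image's FFT with that of a target-domain image,
# keep the source phase, and invert.
import torch

def swap_low_freq(src: torch.Tensor, trg: torch.Tensor, beta: float = 0.1) -> torch.Tensor:
    """src, trg: (c, h, w) images from the source and target domains."""
    fft_src = torch.fft.fft2(src)
    fft_trg = torch.fft.fft2(trg)
    amp_src, pha_src = fft_src.abs(), fft_src.angle()
    amp_trg = fft_trg.abs()

    # Centre the spectra so low frequencies sit in the middle of the array.
    amp_src = torch.fft.fftshift(amp_src, dim=(-2, -1))
    amp_trg = torch.fft.fftshift(amp_trg, dim=(-2, -1))

    c, h, w = src.shape
    bh, bw = int(h * beta), int(w * beta)
    cy, cx = h // 2, w // 2
    # Swap the central (low-frequency) block of the amplitude spectrum.
    amp_src[:, cy - bh:cy + bh, cx - bw:cx + bw] = amp_trg[:, cy - bh:cy + bh, cx - bw:cx + bw]

    amp_src = torch.fft.ifftshift(amp_src, dim=(-2, -1))
    mixed = amp_src * torch.exp(1j * pha_src)
    return torch.fft.ifft2(mixed).real

if __name__ == "__main__":
    src = torch.rand(3, 128, 128)
    trg = torch.rand(3, 128, 128)
    print(swap_low_freq(src, trg).shape)  # torch.Size([3, 128, 128])
```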