Learning Calibrated-Guidance for Object Detection in Aerial Images
- URL: http://arxiv.org/abs/2103.11399v1
- Date: Sun, 21 Mar 2021 13:55:46 GMT
- Title: Learning Calibrated-Guidance for Object Detection in Aerial Images
- Authors: Dong Liang, Zongqi Wei, Dong Zhang, Qixiang Geng, Liyan Zhang, Han
Sun, Huiyu Zhou, Mingqiang Wei, Pan Gao
- Abstract summary: We propose a Calibrated-Guidance scheme to enhance channel communications in a feature transformer fashion.
Our CG can be plugged into any deep neural network; the resulting network is named CG-Net.
- Score: 27.922626207443994
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the study on object detection in aerial images has made tremendous
progress in the community of computer vision. However, most state-of-the-art
methods tend to develop elaborate attention mechanisms for the space-time
feature calibrations with high computational complexity, while surprisingly
ignoring the importance of feature calibrations in channels. In this work, we
propose a simple yet effective Calibrated-Guidance (CG) scheme to enhance
channel communications in a feature transformer fashion, which can adaptively
determine the calibration weights for each channel based on the global feature
affinity-pairs. Specifically, given a set of feature maps, CG first computes
the feature similarity between each channel and the remaining channels as the
intermediary calibration guidance. Then, it re-represents each channel by
aggregating all the channels, weighted via the guidance. Our CG can be
plugged into any deep neural network; the resulting network is named CG-Net. To demonstrate
its effectiveness and efficiency, extensive experiments are carried out on both
oriented and horizontal object detection tasks of aerial images. Results on two
challenging benchmarks (i.e., DOTA and HRSC2016) demonstrate that our CG-Net
can achieve state-of-the-art performance in accuracy with a fair computational
overhead. Code is available at https://github.com/WeiZongqi/CG-Net
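The two steps described in the abstract (channel-to-channel similarity as guidance, then re-representing each channel as a guidance-weighted sum of all channels) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the dot-product similarity, the softmax normalization, the inclusion of each channel's similarity with itself, and the function names `channel_guidance` and `calibrate` are all assumptions made here for illustration. Plain Python lists are used so the example stays self-contained.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def channel_guidance(features):
    """Return a C x C matrix of calibration weights.

    `features` is a list of C channels, each a flat list of N values
    (a feature map with its spatial positions flattened).
    Assumption: similarity is a dot product and each row is normalized
    with a softmax; the paper may use a different formulation.
    """
    affinity = [
        [sum(a * b for a, b in zip(ch_i, ch_j)) for ch_j in features]
        for ch_i in features
    ]
    return [softmax(row) for row in affinity]


def calibrate(features):
    """Re-represent each channel as a weighted sum of all channels."""
    weights = channel_guidance(features)
    n_positions = len(features[0])
    return [
        [sum(w * ch[k] for w, ch in zip(row, features)) for k in range(n_positions)]
        for row in weights
    ]


if __name__ == "__main__":
    # Three toy channels, each with four spatial positions.
    feats = [
        [1.0, 0.0, 0.5, 0.2],
        [0.9, 0.1, 0.4, 0.3],
        [0.0, 1.0, 0.0, 0.8],
    ]
    for channel in calibrate(feats):
        print([round(v, 3) for v in channel])
```

The output has the same shape as the input, which is consistent with the abstract's claim that the scheme can be plugged into an existing network. A practical version would operate on batched tensors in a deep learning framework.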
Related papers
- Global Context Aggregation Network for Lightweight Saliency Detection of
Surface Defects [70.48554424894728]
We develop a Global Context Aggregation Network (GCANet) for lightweight saliency detection of surface defects on the encoder-decoder structure.
First, we introduce a novel transformer encoder on the top layer of the lightweight backbone, which captures global context information through a novel Depth-wise Self-Attention (DSA) module.
The experimental results on three public defect datasets demonstrate that the proposed network achieves a better trade-off between accuracy and running efficiency than 17 other state-of-the-art methods.
arXiv Detail & Related papers (2023-09-22T06:19:11Z)
- Joint Channel Estimation and Feedback with Masked Token Transformers in
Massive MIMO Systems [74.52117784544758]
This paper proposes an encoder-decoder based network that unveils the intrinsic frequency-domain correlation within the CSI matrix.
The entire encoder-decoder network is utilized for channel compression.
Our method outperforms state-of-the-art channel estimation and feedback techniques in joint tasks.
arXiv Detail & Related papers (2023-06-08T06:15:17Z)
- Efficient Multi-Scale Attention Module with Cross-Spatial Learning [4.046170185945849]
A novel efficient multi-scale attention (EMA) module is proposed.
We focus on retaining per-channel information and decreasing the computational overhead.
We conduct extensive ablation studies and experiments on image classification and object detection tasks.
arXiv Detail & Related papers (2023-05-23T00:35:47Z)
- Towards Improving Workers' Safety and Progress Monitoring of
Construction Sites Through Construction Site Understanding [0.0]
We propose a lightweight Optimized Positioning (OP) module to improve channel relation based on global feature affinity association.
OP-Net is a general deep neural network module that can be plugged into any deep neural network.
A benchmark test using SODA demonstrated that our OP-Net was capable of achieving new state-of-the-art performance in accuracy.
arXiv Detail & Related papers (2022-10-27T20:33:46Z)
- Group Fisher Pruning for Practical Network Compression [58.25776612812883]
We present a general channel pruning approach that can be applied to various complicated structures.
We derive a unified metric based on Fisher information to evaluate the importance of a single channel and coupled channels.
Our method can be used to prune any structures including those with coupled channels.
arXiv Detail & Related papers (2021-08-02T08:21:44Z)
- CT-Net: Channel Tensorization Network for Video Classification [48.4482794950675]
3D convolution is powerful for video classification but often computationally expensive.
Most approaches fail to achieve a preferable balance between convolutional efficiency and feature-interaction sufficiency.
We propose a concise and novel Channel Tensorization Network (CT-Net).
Our CT-Net outperforms a number of recent SOTA approaches, in terms of accuracy and/or efficiency.
arXiv Detail & Related papers (2021-06-03T05:35:43Z)
- Urban Change Detection by Fully Convolutional Siamese Concatenate
Network with Attention [0.6999740786886537]
Change detection (CD) is an important problem in remote sensing, especially in disaster time for urban management.
Object-based models are preferred to pixel-based methods for handling very high-resolution remote sensing (VHR RS) images.
In this paper, a fully automatic change-detection algorithm on VHR RS images is proposed that deploys Fully Convolutional Siamese Concatenate networks.
arXiv Detail & Related papers (2021-01-31T17:47:16Z)
- Channelized Axial Attention for Semantic Segmentation [70.14921019774793]
We propose the Channelized Axial Attention (CAA) to seamlessly integrate channel attention and axial attention with reduced computational complexity.
Our CAA not only requires much less computation resources compared with other dual attention models such as DANet, but also outperforms the state-of-the-art ResNet-101-based segmentation models on all tested datasets.
arXiv Detail & Related papers (2021-01-19T03:08:03Z)
- Anchor-free Small-scale Multispectral Pedestrian Detection [88.7497134369344]
We propose a method for effective and efficient multispectral fusion of the two modalities in an adapted single-stage anchor-free base architecture.
We aim at learning pedestrian representations based on object center and scale rather than direct bounding box predictions.
Results show our method's effectiveness in detecting small-scaled pedestrians.
arXiv Detail & Related papers (2020-08-19T13:13:01Z)
- Approximating the Hotelling Observer with Autoencoder-Learned Efficient
Channels for Binary Signal Detection Tasks [12.521662223741671]
The objective assessment of image quality (IQ) has been advocated for the analysis and optimization of medical imaging systems.
A novel method for learning channels using an autoencoder (AE) is presented.
AEs are a type of artificial neural network (ANN) that are frequently employed to learn concise representations of data to reduce dimensionality.
arXiv Detail & Related papers (2020-03-04T20:24:28Z)