A Multi-Stage Duplex Fusion ConvNet for Aerial Scene Classification
- URL: http://arxiv.org/abs/2203.16325v1
- Date: Tue, 29 Mar 2022 09:27:53 GMT
- Title: A Multi-Stage Duplex Fusion ConvNet for Aerial Scene Classification
- Authors: Jingjun Yi and Beichen Zhou
- Abstract summary: We develop a lightweight ConvNet named the multi-stage duplex fusion network (MSDF-Net).
MSDF-Net consists of multi-stage structures built from duplex fusion blocks (DFblocks).
Experiments are conducted on three widely-used aerial scene classification benchmarks.
- Score: 4.061135251278187
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing deep learning based methods effectively boost the performance of
aerial scene classification. However, due to their large number of parameters and
high computational cost, it is difficult to apply these methods to real-time
remote sensing applications such as on-board data perception on drones and
satellites. In this paper, we address this task by developing a light-weight
ConvNet named the multi-stage duplex fusion network (MSDF-Net). The key idea is
to use as few parameters as possible while obtaining the strongest possible
scene representation capability. To this end, a residual-dense duplex fusion
strategy is developed to enhance feature propagation while re-using parameters
as much as possible, and it is realized by our duplex fusion block (DFblock).
Specifically, our MSDF-Net consists of multi-stage structures built from
DFblocks. Moreover, a duplex semantic aggregation (DSA) module, which contains
two parallel branches for semantic description, is developed to mine remote
sensing scene information from the extracted convolutional features. Extensive
experiments on three widely-used aerial scene classification benchmarks show
that our MSDF-Net achieves performance competitive with the recent
state-of-the-art while reducing the number of parameters by up to 80%. In
particular, an accuracy of 92.96% is achieved on AID with only 0.49M parameters.
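The paper's code is not reproduced here; the following is a minimal PyTorch sketch of what a residual-dense duplex fusion block could look like, combining a residual path (feature propagation) with a dense path (parameter re-use) as the abstract describes. All layer choices and widths (e.g. the `growth` channel count) are assumptions, not MSDF-Net's actual configuration.

```python
import torch
import torch.nn as nn

class DFBlock(nn.Module):
    """Hypothetical duplex fusion block: residual branch + dense branch, fused by 1x1 conv."""

    def __init__(self, channels: int, growth: int = 16):
        super().__init__()
        # Residual branch: identity-preserving 3x3 convolution
        self.res = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Dense branch: produces a few new channels to concatenate with the input
        self.dense = nn.Sequential(
            nn.Conv2d(channels, growth, 3, padding=1, bias=False),
            nn.BatchNorm2d(growth),
            nn.ReLU(inplace=True),
        )
        # 1x1 fusion of both branches back to the block width
        self.fuse = nn.Conv2d(2 * channels + growth, channels, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        r = x + self.res(x)                       # residual feature propagation
        d = torch.cat([x, self.dense(x)], dim=1)  # dense feature re-use
        return self.fuse(torch.cat([r, d], dim=1))

# Usage: y = DFBlock(32)(torch.randn(1, 32, 56, 56))  # output keeps (1, 32, 56, 56)
```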
Related papers
- EPAM-Net: An Efficient Pose-driven Attention-guided Multimodal Network for Video Action Recognition [0.0]
We present an efficient pose-driven attention-guided multimodal network (EPAM-Net) for action recognition in videos.
Specifically, we adapt X3D networks for both the RGB and pose streams to capture spatio-temporal features from RGB videos and their skeleton sequences.
Our model provides a 6.2-9.9x reduction in FLOPs (floating-point operations, measured in multiply-adds) and a 9-9.6x reduction in the number of network parameters.
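As a rough illustration only: one common way to realize "pose-driven attention" is to let the pose stream produce a spatial-temporal mask that gates the RGB stream. The sketch below assumes aligned feature volumes and is not EPAM-Net's actual fusion design.

```python
import torch
import torch.nn as nn

class PoseDrivenAttention(nn.Module):
    """Hypothetical: pose-stream features gate RGB-stream features via a sigmoid mask."""

    def __init__(self, pose_ch: int):
        super().__init__()
        # Collapse pose features to a single-channel attention map
        self.att = nn.Sequential(nn.Conv3d(pose_ch, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, rgb_feat: torch.Tensor, pose_feat: torch.Tensor) -> torch.Tensor:
        # Both inputs: (N, C, T, H, W); assumes the streams are spatio-temporally aligned
        return rgb_feat * self.att(pose_feat)  # pose attention modulates RGB features
```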
arXiv Detail & Related papers (2024-08-10T03:15:24Z) - LMFNet: An Efficient Multimodal Fusion Approach for Semantic Segmentation in High-Resolution Remote Sensing [25.016421338677816]
Current methods often process only two types of data, missing out on the rich information that additional modalities can provide.
We propose a novel Lightweight Multimodal data Fusion Network (LMFNet).
LMFNet accommodates various data types simultaneously, including RGB, NirRG, and DSM, through a weight-sharing, multi-branch vision transformer.
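A minimal sketch of the weight-sharing idea: the same transformer encoder processes each modality's tokens in turn, so adding a modality adds no encoder parameters. Patch size, depth, and the averaging fusion below are assumptions, not LMFNet's design.

```python
import torch
import torch.nn as nn

class SharedBranchEncoder(nn.Module):
    """Hypothetical weight-sharing multi-branch encoder over RGB / NirRG / DSM inputs."""

    def __init__(self, in_ch: int = 3, dim: int = 64, depth: int = 2, heads: int = 4):
        super().__init__()
        self.embed = nn.Conv2d(in_ch, dim, kernel_size=4, stride=4)  # patch embedding
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)  # one set of weights, all branches

    def forward(self, modalities: list) -> torch.Tensor:
        # modalities: list of (N, 3, H, W) tensors; a 1-channel DSM would be
        # repeated to 3 channels first (an assumption made for simplicity)
        tokens = []
        for m in modalities:
            t = self.embed(m).flatten(2).transpose(1, 2)  # (N, num_patches, dim)
            tokens.append(self.encoder(t))                # same encoder weights each pass
        return torch.stack(tokens).mean(0)  # simple average fusion (illustrative only)
```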
arXiv Detail & Related papers (2024-04-21T13:29:42Z) - Deep Axial Hypercomplex Networks [1.370633147306388]
Recent works improve representational capability by using hypercomplex-inspired networks, at increased computational cost.
This paper reduces this cost by factorizing a quaternion 2D convolutional module into two consecutive vectormap 1D convolutional modules.
Incorporating both yields our proposed hypercomplex network, a novel architecture that can be assembled to construct deep axial-hypercomplex networks.
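The axial factorization itself is easy to illustrate: a k x k 2D convolution becomes a (k x 1) pass followed by a (1 x k) pass, covering the same receptive field with roughly 2/k of the weights. The sketch below uses plain convolutions to stay self-contained; the paper applies this to quaternion/vectormap convolutions.

```python
import torch
import torch.nn as nn

class AxialConv2d(nn.Module):
    """Axial factorization of a k x k convolution into two 1D convolutions."""

    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.vertical = nn.Conv2d(in_ch, out_ch, (k, 1), padding=(k // 2, 0))
        self.horizontal = nn.Conv2d(out_ch, out_ch, (1, k), padding=(0, k // 2))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Two 1D passes span the same k x k window as one 2D convolution
        return self.horizontal(self.vertical(x))
```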
arXiv Detail & Related papers (2023-01-11T18:31:00Z) - Lightweight Salient Object Detection in Optical Remote-Sensing Images
via Semantic Matching and Edge Alignment [61.45639694373033]
We propose a novel lightweight network for salient object detection in optical remote sensing images (ORSI-SOD), based on semantic matching and edge alignment, termed SeaNet.
Specifically, SeaNet includes a lightweight MobileNet-V2 for feature extraction, a dynamic semantic matching module (DSMM) for high-level features, and a portable decoder for inference.
arXiv Detail & Related papers (2023-01-07T04:33:51Z) - PSNet: Parallel Symmetric Network for Video Salient Object Detection [85.94443548452729]
We propose a VSOD network with up and down parallel symmetry, named PSNet.
Two parallel branches with different dominant modalities are set to achieve complete video saliency decoding.
arXiv Detail & Related papers (2022-10-12T04:11:48Z) - SFNet: Faster and Accurate Semantic Segmentation via Semantic Flow [88.97790684009979]
A common practice to improve the performance is to attain high-resolution feature maps with strong semantic representation.
We propose a Flow Alignment Module (FAM) to learn Semantic Flow between feature maps of adjacent levels.
We also present a novel Gated Dual Flow Alignment Module to directly align high-resolution feature maps and low-resolution feature maps.
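A sketch in the spirit of the FAM: predict a 2D offset field from the two adjacent-level feature maps, then warp the coarse features onto the fine grid with `grid_sample`. Layer sizes and the flow normalization are simplifications, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FlowAlign(nn.Module):
    """Warp coarse features onto the fine grid using a learned 2D offset field."""

    def __init__(self, ch: int):
        super().__init__()
        # Predict a 2-channel (dx, dy) offset field from both levels
        self.flow = nn.Conv2d(2 * ch, 2, kernel_size=3, padding=1)

    def forward(self, high: torch.Tensor, low: torch.Tensor) -> torch.Tensor:
        # high: (N, C, H, W) fine level; low: (N, C, h, w) coarse level
        low_up = F.interpolate(low, size=high.shape[-2:], mode="bilinear",
                               align_corners=False)
        flow = self.flow(torch.cat([high, low_up], dim=1))  # (N, 2, H, W)
        n, _, h, w = flow.shape
        # Identity sampling grid in grid_sample's [-1, 1] coordinates
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=flow.device),
            torch.linspace(-1, 1, w, device=flow.device), indexing="ij")
        grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(n, -1, -1, -1)
        # Simplification: treat the predicted flow as offsets already expressed
        # in normalized grid units (SFNet normalizes pixel offsets explicitly)
        return F.grid_sample(low_up, grid + flow.permute(0, 2, 3, 1),
                             align_corners=False)
```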
arXiv Detail & Related papers (2022-07-10T08:25:47Z) - MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection [37.25262046781015]
Action detection is an essential and challenging task, especially for densely labelled datasets of untrimmed videos.
We propose a novel ConvTransformer network for action detection that efficiently captures both short-term and long-term temporal information.
Our network outperforms the state-of-the-art methods on all three datasets.
arXiv Detail & Related papers (2021-12-07T18:57:37Z) - EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation for many applications, such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
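A minimal sketch of residual-based fusion: project the concatenated camera and LiDAR features and add the result back to the LiDAR stream, so fusion cannot destroy the original modality signal. PMF's actual modules also involve perception-aware components omitted here; all layer choices below are assumptions.

```python
import torch
import torch.nn as nn

class ResidualFusion(nn.Module):
    """Hypothetical residual fusion of camera features into a LiDAR feature stream."""

    def __init__(self, lidar_ch: int, cam_ch: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(lidar_ch + cam_ch, lidar_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(lidar_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, lidar_feat: torch.Tensor, cam_feat: torch.Tensor) -> torch.Tensor:
        # Assumes both features share the same spatial grid (e.g. a projected view)
        fused = self.proj(torch.cat([lidar_feat, cam_feat], dim=1))
        return lidar_feat + fused  # residual connection preserves the LiDAR signal
```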
arXiv Detail & Related papers (2021-06-21T10:47:26Z) - Dense Hybrid Recurrent Multi-view Stereo Net with Dynamic Consistency
Checking [54.58791377183574]
Our novel hybrid recurrent multi-view stereo net consists of two core modules: 1) a light DRENet (Dense Reception Expanded) module to extract dense feature maps of original size with multi-scale context information, 2) a HU-LSTM (Hybrid U-LSTM) to regularize 3D matching volume into predicted depth map.
Our method exhibits competitive performance against the state-of-the-art method while dramatically reducing memory consumption, requiring only 19.4% of R-MVSNet's memory.
arXiv Detail & Related papers (2020-07-21T14:59:59Z) - Searching Central Difference Convolutional Networks for Face
Anti-Spoofing [68.77468465774267]
Face anti-spoofing (FAS) plays a vital role in face recognition systems.
Most state-of-the-art FAS methods rely on stacked convolutions and expert-designed network.
Here we propose a novel frame-level FAS method based on Central Difference Convolution (CDC).
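CDC has a compact closed form: the output mixes a vanilla convolution with a central-difference term weighted by theta, and the difference term reduces to the kernel's spatial sum applied at the center pixel, giving y = conv(x) - theta * (sum of weights) * x. A sketch of that two-convolution implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CDConv2d(nn.Module):
    """Central Difference Convolution: vanilla conv mixed with a central-difference term."""

    def __init__(self, in_ch: int, out_ch: int, k: int = 3, theta: float = 0.7):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
        self.theta = theta  # theta = 0 recovers a plain convolution

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv(x)  # vanilla convolution
        # sum_n w_n * (x(p0 + p_n) - x(p0)) = conv(x) - (sum_n w_n) * x(p0),
        # so the difference term is a 1x1 convolution with the summed kernel
        w_sum = self.conv.weight.sum(dim=(2, 3), keepdim=True)  # (out, in, 1, 1)
        out_diff = F.conv2d(x, w_sum)
        return out - self.theta * out_diff
```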
arXiv Detail & Related papers (2020-03-09T12:48:37Z)