RemoteNet: Remote Sensing Image Segmentation Network based on
Global-Local Information
- URL: http://arxiv.org/abs/2302.13084v2
- Date: Mon, 14 Aug 2023 13:18:34 GMT
- Title: RemoteNet: Remote Sensing Image Segmentation Network based on
Global-Local Information
- Authors: Satyawant Kumar, Abhishek Kumar, Dong-Gyu Lee
- Abstract summary: We propose a remote sensing image segmentation network, RemoteNet, for semantic segmentation of remote sensing images.
We capture the global and local features by leveraging the benefits of the transformer and convolution mechanisms.
- Score: 7.953644697658355
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Remotely captured images exhibit immense variability in scale and
object appearance owing to the complexity of the scenes they depict. This makes
it challenging to capture the underlying attributes, in both the global and the
local context, needed for their segmentation, and existing networks struggle to
extract these inherent features against cluttered backgrounds. To address these
issues, we propose RemoteNet, a network for semantic segmentation of remote
sensing images. It captures global and local features by leveraging the
complementary strengths of transformer and convolution mechanisms. RemoteNet
follows an encoder-decoder design that operates on multi-scale features. An
attention map module generates channel-wise attention scores for fusing these
features, and a global-local transformer block (GLTB) in the decoder supports
learning robust representations during the decoding phase. Further, a feature
refinement module refines the fused output of the shallow-stage encoder feature
and the deepest GLTB feature of the decoder. Experimental results on two public
datasets demonstrate the effectiveness of the proposed RemoteNet.
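The abstract describes an encoder-decoder that fuses convolutional (local) and transformer (global) features with channel-wise attention. Below is a minimal PyTorch sketch of that idea; the module names, shapes, and layer choices are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of the global-local fusion idea described in the abstract:
# a channel-wise attention module that fuses two feature maps, and a block that
# combines a convolutional (local) branch with a self-attention (global) branch.
# All names and layer choices are assumptions for illustration only.
import torch
import torch.nn as nn


class ChannelAttentionFusion(nn.Module):
    """Fuse two feature maps using channel-wise attention scores."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # Global average pooling over spatial dims -> per-channel scores.
        s = self.mlp((a + b).mean(dim=(2, 3)))  # (N, C)
        s = s[:, :, None, None]                 # (N, C, 1, 1)
        return s * a + (1.0 - s) * b            # attention-weighted fusion


class GlobalLocalBlock(nn.Module):
    """Combine a local conv branch with a global self-attention branch."""

    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.fuse = ChannelAttentionFusion(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        local = self.local(x)
        seq = x.flatten(2).transpose(1, 2)      # (N, H*W, C) token sequence
        global_feat, _ = self.attn(seq, seq, seq)
        global_feat = global_feat.transpose(1, 2).reshape(n, c, h, w)
        return self.fuse(local, global_feat)


if __name__ == "__main__":
    block = GlobalLocalBlock(channels=64)
    out = block(torch.randn(2, 64, 32, 32))
    print(out.shape)  # torch.Size([2, 64, 32, 32])
```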
Related papers
- LKASeg: Remote-Sensing Image Semantic Segmentation with Large Kernel Attention and Full-Scale Skip Connections [27.473573286685063]
We propose a remote-sensing image semantic segmentation network named LKASeg.
LKASeg combines Large Kernel Attention (LKA) and Full-Scale Skip Connections (FSC).
On the ISPRS Vaihingen dataset, it achieves mF1 and mIoU scores of 90.33% and 82.77%.
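The entry above names large kernel attention as a core component. The following is a minimal sketch of one common large-kernel-attention formulation (a 5x5 depthwise convolution, a 7x7 dilated depthwise convolution, and a 1x1 convolution whose output gates the input); it is an illustrative assumption, not necessarily the exact module used in LKASeg.

```python
# Sketch of a large-kernel-attention block: decompose a large receptive field
# into cheap depthwise convolutions, then use the result as a gating map.
# Illustrative only; may differ from the LKASeg module.
import torch
import torch.nn as nn


class LargeKernelAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, 5, padding=2, groups=channels)
        self.dw_dilated = nn.Conv2d(
            channels, channels, 7, padding=9, dilation=3, groups=channels
        )
        self.pointwise = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.pointwise(self.dw_dilated(self.dw(x)))
        return x * attn  # the attention map modulates the input features


x = torch.randn(1, 32, 64, 64)
print(LargeKernelAttention(32)(x).shape)  # torch.Size([1, 32, 64, 64])
```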
arXiv Detail & Related papers (2024-10-14T12:25:48Z)
- Multi-view Aggregation Network for Dichotomous Image Segmentation [76.75904424539543]
Dichotomous Image Segmentation (DIS) has recently emerged, targeting high-precision object segmentation from high-resolution natural images.
Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete global localization and local refinement.
We instead model DIS as a multi-view object perception problem and propose a parsimonious multi-view aggregation network (MVANet).
Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed.
arXiv Detail & Related papers (2024-04-11T03:00:00Z)
- TransY-Net: Learning Fully Transformer Networks for Change Detection of Remote Sensing Images [64.63004710817239]
We propose a novel Transformer-based learning framework named TransY-Net for change detection (CD) of remote sensing images.
It improves the feature extraction from a global view and combines multi-level visual features in a pyramid manner.
Our proposed method achieves a new state-of-the-art performance on four optical and two SAR image CD benchmarks.
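The entry above mentions combining multi-level visual features in a pyramid manner. Below is a generic, FPN-style top-down fusion sketch shown for intuition only; it is an assumed pattern, not the actual TransY-Net design.

```python
# Generic pyramid fusion: project each level to a common width, then merge
# deep (low-resolution) features into shallow (high-resolution) ones top-down.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PyramidFusion(nn.Module):
    def __init__(self, in_channels: list[int], out_channels: int = 64):
        super().__init__()
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, 1) for c in in_channels
        )

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        # feats are ordered shallow (high-res) to deep (low-res).
        laterals = [lat(f) for lat, f in zip(self.lateral, feats)]
        out = laterals[-1]
        for lateral in reversed(laterals[:-1]):
            out = lateral + F.interpolate(
                out, size=lateral.shape[-2:], mode="bilinear", align_corners=False
            )
        return out  # fused map at the resolution of the shallowest level


feats = [torch.randn(1, c, 64 // 2**i, 64 // 2**i) for i, c in enumerate([32, 64, 128])]
print(PyramidFusion([32, 64, 128])(feats).shape)  # torch.Size([1, 64, 64, 64])
```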
arXiv Detail & Related papers (2023-10-22T07:42:19Z)
- Feature Aggregation Network for Building Extraction from High-resolution Remote Sensing Images [1.7623838912231695]
The acquisition of high-resolution satellite remote sensing data has opened up the potential for detailed extraction of surface architectural features.
Current methods focus exclusively on localized information of surface features.
We propose the Feature Aggregation Network (FANet), concentrating on extracting both global and local features.
arXiv Detail & Related papers (2023-09-12T07:31:51Z)
- CM-GAN: Image Inpainting with Cascaded Modulation GAN and Object-Aware Training [112.96224800952724]
We propose cascaded modulation GAN (CM-GAN) to generate plausible image structures when dealing with large holes in complex images.
In each decoder block, global modulation is first applied to perform coarse, semantically aware structure synthesis; spatial modulation is then applied to the output of the global modulation to further adjust the feature map in a spatially adaptive fashion.
In addition, we design an object-aware training scheme to prevent the network from hallucinating new objects inside holes, fulfilling the needs of object removal tasks in real-world scenarios.
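The cascaded scheme described above can be pictured as a per-channel (global) modulation followed by a per-pixel (spatial) modulation. The sketch below illustrates that ordering only; the layer choices and signatures are assumptions, not the CM-GAN implementation.

```python
# Cascaded modulation sketch: a global, image-level scale/shift from a style
# code, then a spatially adaptive scale/shift predicted from a condition map.
import torch
import torch.nn as nn


class CascadedModulationBlock(nn.Module):
    def __init__(self, channels: int, style_dim: int):
        super().__init__()
        self.global_affine = nn.Linear(style_dim, 2 * channels)       # per-channel scale/shift
        self.spatial_affine = nn.Conv2d(channels, 2 * channels, 3, padding=1)
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x: torch.Tensor, style: torch.Tensor, cond: torch.Tensor):
        # 1) Global modulation: coarse, image-level adjustment.
        scale, shift = self.global_affine(style).chunk(2, dim=1)
        g = x * (1 + scale[:, :, None, None]) + shift[:, :, None, None]
        g = self.act(self.conv(g))
        # 2) Spatial modulation: per-pixel refinement driven by a condition map.
        s_scale, s_shift = self.spatial_affine(cond).chunk(2, dim=1)
        return g * (1 + s_scale) + s_shift


block = CascadedModulationBlock(channels=64, style_dim=128)
out = block(torch.randn(2, 64, 32, 32), torch.randn(2, 128), torch.randn(2, 64, 32, 32))
print(out.shape)  # torch.Size([2, 64, 32, 32])
```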
arXiv Detail & Related papers (2022-03-22T16:13:27Z)
- Conformer: Local Features Coupling Global Representations for Visual Recognition [72.9550481476101]
We propose a hybrid network structure, termed Conformer, to take advantage of convolutional operations and self-attention mechanisms for enhanced representation learning.
Experiments show that Conformer, under comparable parameter complexity, outperforms the visual transformer (DeiT-B) by 2.3% on ImageNet.
arXiv Detail & Related papers (2021-05-09T10:00:03Z)
- Transformer Meets DCFAM: A Novel Semantic Segmentation Scheme for Fine-Resolution Remote Sensing Images [6.171417925832851]
We introduce the Swin Transformer as the backbone to fully extract the context information.
We also design a novel decoder named densely connected feature aggregation module (DCFAM) to restore the resolution and generate the segmentation map.
arXiv Detail & Related papers (2021-04-25T11:34:22Z)
- Beyond Single Stage Encoder-Decoder Networks: Deep Decoders for Semantic Image Segmentation [56.44853893149365]
Single encoder-decoder methodologies for semantic segmentation are reaching their peak in terms of segmentation quality and efficiency per number of layers.
We propose a new architecture based on a decoder which uses a set of shallow networks for capturing more information content.
In order to further improve the architecture we introduce a weight function which aims to re-balance classes to increase the attention of the networks to under-represented objects.
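One common way to realise such a class re-balancing weight function is to weight the cross-entropy loss by inverse class frequency; the sketch below shows that formulation for intuition, and it may differ from the paper's exact weighting.

```python
# Inverse-frequency class weights for a weighted cross-entropy loss, so that
# under-represented classes contribute more to the gradient.
import torch
import torch.nn as nn


def inverse_frequency_weights(labels: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Per-class weights proportional to 1 / pixel frequency, normalised to mean 1."""
    counts = torch.bincount(labels.flatten(), minlength=num_classes).float()
    freq = counts / counts.sum()
    weights = 1.0 / (freq + 1e-6)
    return weights * (num_classes / weights.sum())


# Toy usage: logits and labels for a 4-class segmentation problem.
labels = torch.randint(0, 4, (2, 16, 16))
logits = torch.randn(2, 4, 16, 16)
criterion = nn.CrossEntropyLoss(weight=inverse_frequency_weights(labels, 4))
print(criterion(logits, labels))
```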
arXiv Detail & Related papers (2020-07-19T18:44:34Z)
- Designing and Training of A Dual CNN for Image Denoising [117.54244339673316]
We propose a Dual denoising Network (DudeNet) to recover a clean image.
DudeNet consists of four modules: a feature extraction block, an enhancement block, a compression block, and a reconstruction block.
arXiv Detail & Related papers (2020-07-08T08:16:24Z)
- Image fusion using symmetric skip autoencoder via an Adversarial Regulariser [6.584748347223698]
We propose a residual autoencoder architecture, regularised by a residual adversarial network, to generate a more realistic fused image.
The residual module serves as the primary building block for the encoder, decoder, and adversarial network.
We propose an adversarial regulariser network that performs supervised learning on the fused image and the original visual image.
arXiv Detail & Related papers (2020-05-01T15:31:45Z)
- Dual Convolutional LSTM Network for Referring Image Segmentation [18.181286443737417]
Referring image segmentation is a problem at the intersection of computer vision and natural language understanding.
We propose a dual convolutional LSTM (ConvLSTM) network to tackle this problem.
arXiv Detail & Related papers (2020-01-30T20:40:18Z)