Efficient Remote Sensing Segmentation With Generative Adversarial Transformer
- URL: http://arxiv.org/abs/2310.01292v1
- Date: Mon, 2 Oct 2023 15:46:59 GMT
- Title: Efficient Remote Sensing Segmentation With Generative Adversarial Transformer
- Authors: Luyi Qiu and Dayu Yu and Xiaofeng Zhang and Chenxiao Zhang
- Abstract summary: This paper proposes an efficient Generative Adversarial Transformer (GATrans) for achieving high-precision semantic segmentation.
The framework utilizes a Global Transformer Network (GTNet) as the generator, efficiently extracting multi-level features.
We validate the effectiveness of our approach through extensive experiments on the Vaihingen dataset, achieving an average F1 score of 90.17% and an overall accuracy of 91.92%.
- Score: 5.728847418491545
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most deep learning methods that achieve high segmentation accuracy require
deep network architectures that are too heavy and complex to run on embedded
devices with limited storage and memory space. To address this issue, this
paper proposes an efficient Generative Adversarial Transformer (GATrans) for
achieving high-precision semantic segmentation while maintaining an extremely
efficient size. The framework utilizes a Global Transformer Network (GTNet) as
the generator, efficiently extracting multi-level features through residual
connections. GTNet employs global transformer blocks with progressively linear
computational complexity to reassign global features based on a learnable
similarity function. To focus on object-level and pixel-level information, the
GATrans optimizes the objective function by combining structural similarity
losses. We validate the effectiveness of our approach through extensive
experiments on the Vaihingen dataset, achieving an average F1 score of 90.17%
and an overall accuracy of 91.92%.
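The two mechanisms the abstract names, attention driven by a learnable similarity function at linear cost, and an objective that mixes pixel-level, structural-similarity, and adversarial terms, can be illustrated with a short sketch. Everything below (the class names, the MLP feature map, the `ssim_fn` argument, the loss weights) is an illustrative assumption, not the authors' implementation.

```python
# A minimal sketch, assuming kernelized (linear-time) attention stands in for
# the paper's "global transformer block with a learnable similarity function".
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalAttention(nn.Module):
    """Attention with a learnable similarity feature map `phi`.

    Computing phi(Q) @ (phi(K)^T @ V) instead of softmax(Q @ K^T) @ V drops
    the cost from O(N^2 * d) to O(N * d^2): linear in the token count N.
    """
    def __init__(self, dim, heads=8):
        super().__init__()
        self.heads, self.head_dim = heads, dim // heads
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)
        # Learnable similarity: a small MLP kept non-negative by the ReLU.
        self.phi = nn.Sequential(nn.Linear(self.head_dim, self.head_dim),
                                 nn.ReLU())

    def forward(self, x):                                    # x: (B, N, C)
        B, N, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.reshape(B, N, self.heads, self.head_dim).transpose(1, 2)
                   for t in (q, k, v))                       # each (B, h, N, d)
        q, k = self.phi(q), self.phi(k)                      # learnable similarity
        kv = torch.einsum("bhnd,bhne->bhde", k, v)           # (B, h, d, d)
        norm = torch.einsum("bhnd,bhd->bhn", q, k.sum(dim=2)) + 1e-6
        out = torch.einsum("bhnd,bhde->bhne", q, kv) / norm.unsqueeze(-1)
        return self.proj(out.transpose(1, 2).reshape(B, N, C))

def gatrans_objective(logits, target, d_fake, ssim_fn, w_ssim=0.5, w_adv=0.01):
    """Pixel-level CE + object-level structural similarity + adversarial term.

    `ssim_fn` is assumed to be an SSIM implementation (e.g. pytorch_msssim.ssim
    with data_range=1.0); the weights w_ssim and w_adv are illustrative.
    """
    ce = F.cross_entropy(logits, target)                     # pixel-level term
    one_hot = F.one_hot(target, logits.shape[1]).permute(0, 3, 1, 2).float()
    ssim_loss = 1.0 - ssim_fn(logits.softmax(dim=1), one_hot)  # object-level term
    adv = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    return ce + w_ssim * ssim_loss + w_adv * adv
```

A kernelized feature map is one standard route to linear complexity; the paper's "progressively linear" blocks may differ in detail.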
Related papers
- CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction [77.8576094863446]
We propose a new deCoupled duAl-interactive lineaR attEntion (CARE) mechanism.
We first propose an asymmetrical feature decoupling strategy that asymmetrically decouples the learning process for local inductive bias and long-range dependencies.
By adopting this decoupled learning strategy and fully exploiting the complementarity across features, our method achieves both high efficiency and accuracy (see the sketch after this entry).
arXiv Detail & Related papers (2024-11-25T07:56:13Z)
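One plausible reading of the asymmetrical decoupling described in the CARE entry above is a channel split: a small slice goes through a cheap local operator, the rest through linear attention, with a 1x1 interaction at the end. The module below is a hedged sketch under that reading; the split ratio, the depthwise convolution, and the reuse of the GlobalAttention sketch from earlier are all assumptions, not the CARE design.

```python
# A hedged sketch of asymmetrical feature decoupling: local inductive bias for
# one channel slice, long-range modeling for the other, then a 1x1 fusion.
import torch
import torch.nn as nn

class DecoupledDualInteraction(nn.Module):
    def __init__(self, dim, local_ratio=0.25, heads=4):
        super().__init__()
        self.c_local = int(dim * local_ratio)
        self.c_global = dim - self.c_local
        self.local = nn.Conv2d(self.c_local, self.c_local, 3, padding=1,
                               groups=self.c_local)          # depthwise conv
        self.glob = GlobalAttention(self.c_global, heads)    # linear attention
        self.fuse = nn.Conv2d(dim, dim, 1)                   # dual interaction

    def forward(self, x):                                    # x: (B, C, H, W)
        xl, xg = torch.split(x, [self.c_local, self.c_global], dim=1)
        xl = self.local(xl)                                  # local branch
        B, C, H, W = xg.shape
        xg = self.glob(xg.flatten(2).transpose(1, 2))        # (B, HW, C)
        xg = xg.transpose(1, 2).reshape(B, C, H, W)          # back to feature map
        return self.fuse(torch.cat([xl, xg], dim=1))
```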
- Efficient Semantic Segmentation via Lightweight Multiple-Information Interaction Network [37.84039482457571]
We propose a lightweight multiple-information interaction network for real-time semantic segmentation, called LMIINet.
It effectively combines CNNs and Transformers while reducing redundant computations and memory footprint.
With only 0.72M parameters and 11.74G FLOPs, LMIINet achieves 72.0% mIoU at 100 FPS on the Cityscapes test set and 69.94% mIoU at 160 FPS on the CamVid dataset.
arXiv Detail & Related papers (2024-10-03T05:45:24Z)
- TransUKAN: Computing-Efficient Hybrid KAN-Transformer for Enhanced Medical Image Segmentation [5.280523424712006]
U-Net is currently the most widely used architecture for medical image segmentation.
We improve the KAN to reduce memory usage and computational load, enhancing the model's capability to capture nonlinear relationships.
arXiv Detail & Related papers (2024-09-23T02:52:49Z)
- ELGC-Net: Efficient Local-Global Context Aggregation for Remote Sensing Change Detection [65.59969454655996]
We propose an efficient change detection framework, ELGC-Net, which leverages rich contextual information to precisely estimate change regions.
Our proposed ELGC-Net sets a new state of the art on remote sensing change detection benchmarks.
We also introduce ELGC-Net-LW, a lighter variant with significantly reduced computational complexity, suitable for resource-constrained settings.
arXiv Detail & Related papers (2024-03-26T17:46:25Z)
- Spatial-information Guided Adaptive Context-aware Network for Efficient RGB-D Semantic Segmentation [9.198120596225968]
We propose an efficient lightweight encoder-decoder network that reduces the parameter count and computational cost while preserving the robustness of the algorithm.
Experimental results on NYUv2, SUN RGB-D, and Cityscapes datasets show that our method achieves a better trade-off among segmentation accuracy, inference time, and parameters than the state-of-the-art methods.
arXiv Detail & Related papers (2023-08-11T09:02:03Z)
- DLGSANet: Lightweight Dynamic Local and Global Self-Attention Networks for Image Super-Resolution [83.47467223117361]
We propose an effective lightweight dynamic local and global self-attention network (DLGSANet) to solve image super-resolution.
Motivated by the network designs of Transformers, we develop a simple yet effective multi-head dynamic local self-attention (MHDLSA) module to extract local features efficiently.
To complement the local modeling of MHDLSA with global context, we develop a sparse global self-attention (SparseGSA) module to select the most useful similarity values (see the sketch after this entry).
arXiv Detail & Related papers (2023-01-05T12:06:47Z)
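"Selecting the most useful similarity values", as in the DLGSANet entry above, is commonly realized as top-k attention: keep the k largest scores per query and mask out the rest before the softmax. The function below sketches that generic idea; it is not DLGSANet's exact SparseGSA, and `topk` is an illustrative setting.

```python
# Generic top-k sparse attention. Note the full score matrix is still formed;
# the sparsity lies in which similarity values survive the softmax.
import torch
import torch.nn.functional as F

def sparse_global_attention(q, k, v, topk=16):
    """q, k, v: (B, N, d). Each query attends to its topk most similar keys."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5   # (B, N, N)
    kth = scores.topk(topk, dim=-1).values[..., -1:]        # k-th largest score
    scores = scores.masked_fill(scores < kth, float("-inf"))
    return F.softmax(scores, dim=-1) @ v                    # (B, N, d)
```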
- RTFormer: Efficient Design for Real-Time Semantic Segmentation with Transformer [63.25665813125223]
We propose RTFormer, an efficient dual-resolution transformer for real-time semantic segmentation.
It achieves a better trade-off between performance and efficiency than CNN-based models.
Experiments on mainstream benchmarks demonstrate the effectiveness of our proposed RTFormer.
arXiv Detail & Related papers (2022-10-13T16:03:53Z)
- MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial details.
MISSU outperforms previous state-of-the-art methods (see the sketch after this entry).
arXiv Detail & Related papers (2022-06-02T07:38:53Z)
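Self-distillation, as in the MISSU entry above, usually means auxiliary (shallow) heads learn from the network's own final prediction as well as from the labels. The loss below is a common generic formulation offered as a hedged sketch, not MISSU's actual objective; `alpha` is an illustrative weight.

```python
# Generic self-distillation loss: auxiliary heads match both the ground truth
# and the (detached) soft output of the final head.
import torch.nn.functional as F

def self_distillation_loss(aux_logits_list, final_logits, target, alpha=0.4):
    loss = F.cross_entropy(final_logits, target)            # supervise final head
    soft = final_logits.softmax(dim=1).detach()             # teacher = own output
    for aux in aux_logits_list:
        loss = loss + alpha * (
            F.cross_entropy(aux, target)                    # hard-label term
            + F.kl_div(aux.log_softmax(dim=1), soft,        # soft-label term
                       reduction="batchmean"))
    return loss
```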
- TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking [74.82415271960315]
We propose a solution named TransMOT to efficiently model the spatial and temporal interactions among objects in a video.
TransMOT is not only more computationally efficient than the traditional Transformer, but it also achieves better tracking accuracy.
The proposed method is evaluated on multiple benchmark datasets including MOT15, MOT16, MOT17, and MOT20.
arXiv Detail & Related papers (2021-04-01T01:49:05Z)
- TransFuse: Fusing Transformers and CNNs for Medical Image Segmentation [9.266588373318688]
We study the problem of improving efficiency in modeling global contexts without losing localization ability for low-level details.
TransFuse, a novel two-branch architecture, is proposed, combining Transformers and CNNs in a parallel style.
With TransFuse, both global dependencies and low-level spatial details can be captured efficiently by a much shallower network (see the sketch after this entry).
arXiv Detail & Related papers (2021-02-16T08:09:45Z)
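The parallel two-branch design in the TransFuse entry above, a CNN path for low-level spatial detail beside a shallow Transformer path for global context, can be sketched as follows. The patch size, branch depths, and additive fusion are assumptions, not the paper's actual fusion module.

```python
# A hedged sketch of a parallel CNN + Transformer segmenter with simple fusion.
import torch
import torch.nn as nn

class TwoBranchFusion(nn.Module):
    def __init__(self, in_ch=3, dim=64, num_classes=2):
        super().__init__()
        self.cnn = nn.Sequential(                            # spatial-detail branch
            nn.Conv2d(in_ch, dim, 3, stride=2, padding=1),
            nn.BatchNorm2d(dim), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1),
            nn.BatchNorm2d(dim), nn.ReLU(inplace=True),
        )
        self.patchify = nn.Conv2d(in_ch, dim, kernel_size=4, stride=4)
        self.transformer = nn.TransformerEncoder(            # global-context branch
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2)
        self.head = nn.Conv2d(dim, num_classes, 1)

    def forward(self, x):                                    # x: (B, in_ch, H, W)
        c = self.cnn(x)                                      # (B, dim, H/4, W/4)
        t = self.patchify(x)                                 # (B, dim, H/4, W/4)
        B, D, Hp, Wp = t.shape
        t = self.transformer(t.flatten(2).transpose(1, 2))   # (B, Hp*Wp, dim)
        t = t.transpose(1, 2).reshape(B, D, Hp, Wp)
        return self.head(c + t)                              # additive fusion;
                                                             # logits at 1/4 scale
```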