Related papers: Enhancing DeepLabV3+ to Fuse Aerial and Satellite Images for Semantic Segmentation

URL: http://arxiv.org/abs/2503.22909v1
Date: Fri, 28 Mar 2025 23:07:39 GMT
Title: Enhancing DeepLabV3+ to Fuse Aerial and Satellite Images for Semantic Segmentation
Authors: Anas Berka, Mohamed El Hajji, Raphael Canals, Youssef Es-saady, Adel Hafiane
Abstract summary: We introduce a new transposed convolutional layers block for upsampling a second entry to fuse it with high level features. This block is designed to amplify and integrate information from satellite images, thereby enriching the segmentation process. For experiments, we used the LandCover.ai dataset for aerial images, alongside the corresponding dataset sourced from Sentinel-2 data.
Score: 3.508894670581109
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Aerial and satellite imagery are inherently complementary remote sensing sources, offering high-resolution detail alongside expansive spatial coverage. However, the use of these sources for land cover segmentation introduces several challenges, prompting the development of a variety of segmentation methods. Among these approaches, the DeepLabV3+ architecture is considered a promising approach for single-source image segmentation. However, despite its reliable segmentation results, there is still a need to increase its robustness and improve its performance. This is particularly crucial for multimodal image segmentation, where the fusion of diverse types of information is essential. An interesting approach involves enhancing this architectural framework through the integration of novel components and the modification of certain internal processes. In this paper, we enhance the DeepLabV3+ architecture by introducing a new transposed convolutional layers block for upsampling a second entry to fuse it with high level features. This block is designed to amplify and integrate information from satellite images, thereby enriching the segmentation process through fusion with aerial images. For experiments, we used the LandCover.ai (Land Cover from Aerial Imagery) dataset for aerial images, alongside the corresponding dataset sourced from Sentinel-2 data. Through the fusion of both sources, the model achieved a mean Intersection over Union (mIoU) of 84.91% without data augmentation.
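The abstract reports its result as mean Intersection over Union (mIoU). As a rough illustration of how that metric is commonly computed, the following is a minimal sketch in plain Python. The function name `mean_iou`, the flattened label lists, and the choice to skip classes absent from both prediction and ground truth are assumptions made for this example; they are not taken from the paper, whose exact evaluation protocol is not described here.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Average per-class IoU over classes present in pred or target.

    Hypothetical helper for illustration; labels are integers in [0, num_classes).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes  # pixels where both maps agree on class c
    pred_count = [0] * num_classes    # pixels predicted as class c
    target_count = [0] * num_classes  # pixels labelled as class c

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # assumption: ignore classes absent from both maps
            ious.append(intersection[c] / union)

    if not ious:
        raise ValueError("no class is present in either label map")
    return sum(ious) / len(ious)


# Toy example: two flattened 6-pixel label maps with three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, num_classes=3))
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is about 0.5. Real evaluations typically accumulate the counts over the whole test set before dividing, rather than averaging per-image scores.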

Related papers

AerOSeg: Harnessing SAM for Open-Vocabulary Segmentation in Remote Sensing Images [21.294581646546124]
AerOSeg is a novel Open-Vocabulary Segmentation (OVS) approach for remote sensing data. We compute robust image-text correlation features using rotated versions of the input image and domain-specific prompts. Inspired by the success of the Segment Anything Model (SAM) in diverse domains, we leverage SAM features to guide the spatial refinement of correlation features. We enhance the refined correlation features using a multi-scale attention-aware composition to produce the final segmentation map.
arXiv Detail & Related papers (2025-04-12T13:06:46Z)
Remote Sensing Image Segmentation Using Vision Mamba and Multi-Scale Multi-Frequency Feature Fusion [9.098711843118629]
This paper introduces a state space model (SSM) and proposes a novel hybrid semantic segmentation network based on vision Mamba (CVMH-UNet). This method designs a cross-scanning visual state space block (CVSSBlock) that uses cross 2D scanning (CS2D) to fully capture global information from multiple directions. By incorporating convolutional neural network branches to overcome the constraints of Vision Mamba (VMamba) in acquiring local information, this approach facilitates a comprehensive analysis of both global and local features.
arXiv Detail & Related papers (2024-10-08T02:17:38Z)
Deep Multimodal Fusion for Semantic Segmentation of Remote Sensing Earth Observation Data [0.08192907805418582]
This paper proposes a late fusion deep learning model (LF-DLM) for semantic segmentation. One branch integrates detailed textures from aerial imagery captured by UNetFormer with a Multi-Axis Vision Transformer (MaxViT) backbone. The other branch captures complex temporal dynamics from the Sentinel-2 satellite image time series using a U-Net with Temporal Attention Encoder (U-TAE).
arXiv Detail & Related papers (2024-10-01T07:50:37Z)
AMBER -- Advanced SegFormer for Multi-Band Image Segmentation: an application to Hyperspectral Imaging [0.0]
This paper introduces AMBER, an advanced SegFormer specifically designed for multi-band image segmentation. AMBER enhances the original SegFormer by incorporating three-dimensional convolutions, custom kernel sizes, and a Funnelizer layer. Our experiments, conducted on three benchmark datasets and on a dataset from the PRISMA satellite, show that AMBER outperforms traditional CNN-based methods in terms of Overall Accuracy, Kappa coefficient, and Average Accuracy.
arXiv Detail & Related papers (2024-09-14T09:34:05Z)
A Semantic-Aware and Multi-Guided Network for Infrared-Visible Image Fusion [41.34335755315773]
Multi-modality image fusion aims at fusing specific-modality and shared-modality information from two source images. We propose a three-branch encoder-decoder architecture along with corresponding fusion layers as the fusion strategy. Our method has obtained competitive results compared with state-of-the-art methods in visible/infrared image fusion and medical image fusion tasks.
arXiv Detail & Related papers (2024-06-11T09:32:40Z)
Feature Aggregation Network for Building Extraction from High-resolution Remote Sensing Images [1.7623838912231695]
High-resolution satellite remote sensing data acquisition has uncovered the potential for detailed extraction of surface architectural features. Current methods focus exclusively on localized information of surface features. We propose the Feature Aggregation Network (FANet), concentrating on extracting both global and local features.
arXiv Detail & Related papers (2023-09-12T07:31:51Z)
Mutual Information-driven Triple Interaction Network for Efficient Image Dehazing [54.168567276280505]
We propose a novel Mutual Information-driven Triple interaction Network (MITNet) for image dehazing. The first stage, named amplitude-guided haze removal, aims to recover the amplitude spectrum of the hazy images for haze removal. The second stage, named phase-guided structure refined, is devoted to learning the transformation and refinement of the phase spectrum.
arXiv Detail & Related papers (2023-08-14T08:23:58Z)
Object Detection in Hyperspectral Image via Unified Spectral-Spatial Feature Aggregation [55.9217962930169]
We present S2ADet, an object detector that harnesses the rich spectral and spatial complementary information inherent in hyperspectral images. S2ADet surpasses existing state-of-the-art methods, achieving robust and reliable results.
arXiv Detail & Related papers (2023-06-14T09:01:50Z)
Searching a Compact Architecture for Robust Multi-Exposure Image Fusion [55.37210629454589]
Two major stumbling blocks hinder development: pixel misalignment and inefficient inference. This study introduces an architecture search-based paradigm incorporating self-alignment and detail repletion modules for robust multi-exposure image fusion. The proposed method outperforms various competitive schemes, achieving a noteworthy 3.19% improvement in PSNR for general scenarios and an impressive 23.5% enhancement in misaligned scenarios.
arXiv Detail & Related papers (2023-05-20T17:01:52Z)
Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation [83.31087402305306]
Robustness to trimaps and generalization to images from different domains are still under-explored. We propose an image matting method which achieves higher robustness (RMat) via multilevel context assembling and strong data augmentation targeting matting.
arXiv Detail & Related papers (2022-01-18T11:45:17Z)
Crosslink-Net: Double-branch Encoder Segmentation Network via Fusing Vertical and Horizontal Convolutions [58.71117402626524]
We present a novel double-branch encoder architecture for medical image segmentation. Our architecture is inspired by two observations: 1) Since the discrimination of features learned via square convolutional kernels needs to be further improved, we propose to utilize non-square vertical and horizontal convolutional kernels. The experiments validate the effectiveness of our model on four datasets.
arXiv Detail & Related papers (2021-07-24T02:58:32Z)
Transformer Meets Convolution: A Bilateral Awareness Network for Semantic Segmentation of Very Fine Resolution Urban Scene Images [6.460167724233707]
We propose a bilateral awareness network (BANet) which contains a dependency path and a texture path. BANet captures the long-range relationships and fine-grained details in VFR images. Experiments conducted on the three large-scale urban scene image segmentation datasets, i.e., ISPRS Vaihingen dataset, ISPRS Potsdam dataset, and UAVid dataset, demonstrate the effectiveness of BANet.
arXiv Detail & Related papers (2021-06-23T13:57:36Z)
Deep Burst Super-Resolution [165.90445859851448]
We propose a novel architecture for the burst super-resolution task. Our network takes multiple noisy RAW images as input, and generates a denoised, super-resolved RGB image as output. In order to enable training and evaluation on real-world data, we additionally introduce the BurstSR dataset.
arXiv Detail & Related papers (2021-01-26T18:57:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.