MSCloudCAM: Cross-Attention with Multi-Scale Context for Multispectral Cloud Segmentation
- URL: http://arxiv.org/abs/2510.10802v2
- Date: Thu, 16 Oct 2025 21:22:55 GMT
- Title: MSCloudCAM: Cross-Attention with Multi-Scale Context for Multispectral Cloud Segmentation
- Authors: Md Abdullah Al Mazid, Liangdong Deng, Naphtali Rishe
- Abstract summary: MSCloudCAM is a Cross-Attention with Multi-Scale Context Network tailored for multispectral and multi-sensor cloud segmentation. Our framework exploits the spectral richness of Sentinel-2 and Landsat-8 data to classify four semantic categories: clear sky, thin cloud, thick cloud, and cloud shadow. MSCloudCAM combines a Swin Transformer backbone for hierarchical feature extraction with multi-scale context modules (ASPP and PSP) for enhanced scale-aware learning.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Clouds remain a critical challenge in optical satellite imagery, hindering reliable analysis for environmental monitoring, land cover mapping, and climate research. To overcome this, we propose MSCloudCAM, a Cross-Attention with Multi-Scale Context Network tailored for multispectral and multi-sensor cloud segmentation. Our framework exploits the spectral richness of Sentinel-2 (CloudSEN12) and Landsat-8 (L8Biome) data to classify four semantic categories: clear sky, thin cloud, thick cloud, and cloud shadow. MSCloudCAM combines a Swin Transformer backbone for hierarchical feature extraction with multi-scale context modules (ASPP and PSP) for enhanced scale-aware learning. A Cross-Attention block enables effective multisensor and multispectral feature fusion, while the integration of an Efficient Channel Attention Block (ECAB) and a Spatial Attention Module adaptively refines feature representations. Comprehensive experiments on CloudSEN12 and L8Biome demonstrate that MSCloudCAM delivers state-of-the-art segmentation accuracy, surpassing leading baseline architectures while maintaining competitive parameter efficiency and FLOPs. These results underscore the model's effectiveness and practicality, making it well-suited for large-scale Earth observation tasks and real-world applications.
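The abstract does not spell out the internals of the Efficient Channel Attention Block (ECAB), but the standard ECA design it presumably follows is simple: globally average-pool each channel, run a cheap 1D convolution across neighbouring channels (kernel size chosen adaptively from the channel count), and squash the result through a sigmoid to get per-channel gates. A minimal pure-Python sketch, with toy uniform convolution weights standing in for the learned ones:

```python
import math

def eca_kernel_size(channels, gamma=2, b=1):
    # Adaptive kernel size from the ECA formulation:
    # k = |(log2(C) + b) / gamma|, forced to the nearest odd integer.
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1

def eca_attention(channel_means, kernel_size):
    """Compute one attention gate per channel from per-channel averages.

    A 1D convolution with shared weights slides across the channel axis
    (zero padding at the ends); a sigmoid maps each response to (0, 1).
    """
    c = len(channel_means)
    pad = kernel_size // 2
    w = 1.0 / kernel_size  # toy uniform weight; learned in a real ECAB
    padded = [0.0] * pad + list(channel_means) + [0.0] * pad
    gates = []
    for i in range(c):
        s = sum(w * padded[i + j] for j in range(kernel_size))
        gates.append(1.0 / (1.0 + math.exp(-s)))  # sigmoid
    return gates
```

The gates would then scale the corresponding feature-map channels, letting informative bands (useful for multispectral input) pass through while damping the rest.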
Related papers
- Weakly Supervised Cloud Detection Combining Spectral Features and Multi-Scale Deep Network [12.520904004953344]
We propose a weakly supervised cloud detection method that combines spectral features and a multi-scale scene-level deep network (SpecMCD) to obtain highly accurate pixel-level cloud masks. The F1-score of the proposed SpecMCD method shows an improvement of over 7.82%, highlighting the superiority and potential of the SpecMCD method for cloud detection.
arXiv Detail & Related papers (2025-10-01T08:32:49Z) - Deep Learning for Clouds and Cloud Shadow Segmentation in Methane Satellite and Airborne Imaging Spectroscopy [8.008730702184803]
We use machine learning methods to address the cloud and cloud shadow detection problem for high-resolution instruments. We deploy and evaluate conventional techniques alongside advanced deep learning architectures, namely UNet and a Spectral Channel Attention Network (SCAN). Our results show that conventional methods struggle with spatial and boundary definition, affecting the detection of clouds and cloud shadows.
arXiv Detail & Related papers (2025-09-24T00:49:52Z) - SGMAGNet: A Baseline Model for 3D Cloud Phase Structure Reconstruction on a New Passive Active Satellite Benchmark [17.3424418972935]
We present a benchmark dataset for transforming satellite observations into detailed 3D cloud phase structures. We adopt SGMAGNet as the main model and compare it with several baseline architectures. The results demonstrate that SGMAGNet achieves superior performance in cloud phase reconstruction.
arXiv Detail & Related papers (2025-09-19T07:29:23Z) - GCRPNet: Graph-Enhanced Contextual and Regional Perception Network for Salient Object Detection in Optical Remote Sensing Images [68.33481681452675]
We propose a graph-enhanced contextual and regional perception network (GCRPNet). It builds upon the Mamba architecture to simultaneously capture long-range dependencies and enhance regional feature representation. It performs adaptive patch scanning on feature maps processed via multi-scale convolutions, thereby capturing rich local region information.
arXiv Detail & Related papers (2025-08-14T11:31:43Z) - Semi-supervised Semantic Segmentation for Remote Sensing Images via Multi-scale Uncertainty Consistency and Cross-Teacher-Student Attention [59.19580789952102]
This paper proposes a novel semi-supervised Multi-Scale Uncertainty and Cross-Teacher-Student Attention (MUCA) model for RS image semantic segmentation tasks. MUCA constrains the consistency among feature maps at different layers of the network by introducing a multi-scale uncertainty consistency regularization. It also utilizes a Cross-Teacher-Student attention mechanism to guide the student network in constructing more discriminative feature representations.
arXiv Detail & Related papers (2025-01-18T11:57:20Z) - PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector, called Point-Voxel Attention Fusion Network (PVAFN).
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z) - Multi-view Aggregation Network for Dichotomous Image Segmentation [76.75904424539543]
Dichotomous Image Segmentation (DIS) has recently emerged, targeting high-precision object segmentation from high-resolution natural images.
Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete the global localization and local refinement.
Inspired by this, we model DIS as a multi-view object perception problem and provide a parsimonious multi-view aggregation network (MVANet).
Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed.
arXiv Detail & Related papers (2024-04-11T03:00:00Z) - MambaAD: Exploring State Space Models for Multi-class Unsupervised Anomaly Detection [53.03687787922032]
Mamba-based models with superior long-range modeling and linear efficiency have garnered substantial attention. This study pioneers the application of Mamba to multi-class unsupervised anomaly detection, presenting MambaAD. The proposed LSS module, integrating parallel cascaded Hybrid State Space (HSS) blocks and multi-kernel convolution operations, effectively captures both long-range and local information.
arXiv Detail & Related papers (2024-04-09T18:28:55Z) - Distribution-aware Interactive Attention Network and Large-scale Cloud Recognition Benchmark on FY-4A Satellite Image [24.09239785062109]
We develop a novel dataset for accurate cloud recognition.
We use domain adaptation methods to align 70,419 image-label pairs in terms of projection, temporal resolution, and spatial resolution.
We also introduce a Distribution-aware Interactive-Attention Network (DIAnet), which preserves pixel-level details through a high-resolution branch and a parallel cross-branch.
arXiv Detail & Related papers (2024-01-06T09:58:09Z) - CLiSA: A Hierarchical Hybrid Transformer Model using Orthogonal Cross Attention for Satellite Image Cloud Segmentation [5.178465447325005]
Deep learning algorithms have emerged as a promising approach to solving image segmentation problems.
In this paper, we introduce a deep-learning model for effective cloud mask generation named CLiSA - Cloud segmentation via Lipschitz Stable Attention network.
We demonstrate both qualitative and quantitative outcomes for multiple satellite image datasets including Landsat-8, Sentinel-2, and Cartosat-2s.
arXiv Detail & Related papers (2023-11-29T09:31:31Z) - BIMS-PU: Bi-Directional and Multi-Scale Point Cloud Upsampling [60.257912103351394]
We develop a new point cloud upsampling pipeline called BIMS-PU.
We decompose the up/downsampling procedure into several up/downsampling sub-steps by breaking the target sampling factor into smaller factors.
We show that our method achieves superior results to state-of-the-art approaches.
arXiv Detail & Related papers (2022-06-25T13:13:37Z)
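The BIMS-PU entry above hinges on breaking a target sampling factor into smaller sub-factors so that up/downsampling proceeds in gentler steps. The abstract does not give the exact decomposition rule, but a plausible illustration is repeated division by a small base (hypothetical helper name):

```python
def decompose_factor(factor, base=2):
    """Break a sampling factor into smaller sub-factors, e.g. 8 -> [2, 2, 2],
    so each up/downsampling sub-step only changes resolution by `base`."""
    steps = []
    while factor > 1:
        if factor % base:
            steps.append(factor)  # non-divisible remainder becomes one final step
            break
        steps.append(base)
        factor //= base
    return steps
```

For example, a 4x upsampling would run as two 2x sub-steps, each easier to learn than one large jump.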
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.