MSCloudCAM: Cross-Attention with Multi-Scale Context for Multispectral Cloud Segmentation
- URL: http://arxiv.org/abs/2510.10802v2
- Date: Thu, 16 Oct 2025 21:22:55 GMT
- Title: MSCloudCAM: Cross-Attention with Multi-Scale Context for Multispectral Cloud Segmentation
- Authors: Md Abdullah Al Mazid, Liangdong Deng, Naphtali Rishe
- Abstract summary: MSCloudCAM is a Cross-Attention with Multi-Scale Context Network tailored for multispectral and multi-sensor cloud segmentation. Our framework exploits the spectral richness of Sentinel-2 and Landsat-8 data to classify four semantic categories: clear sky, thin cloud, thick cloud, and cloud shadow. MSCloudCAM combines a Swin Transformer backbone for hierarchical feature extraction with multi-scale context modules (ASPP and PSP) for enhanced scale-aware learning.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Clouds remain a critical challenge in optical satellite imagery, hindering reliable analysis for environmental monitoring, land cover mapping, and climate research. To overcome this, we propose MSCloudCAM, a Cross-Attention with Multi-Scale Context Network tailored for multispectral and multi-sensor cloud segmentation. Our framework exploits the spectral richness of Sentinel-2 (CloudSEN12) and Landsat-8 (L8Biome) data to classify four semantic categories: clear sky, thin cloud, thick cloud, and cloud shadow. MSCloudCAM combines a Swin Transformer backbone for hierarchical feature extraction with multi-scale context modules (ASPP and PSP) for enhanced scale-aware learning. A Cross-Attention block enables effective multisensor and multispectral feature fusion, while the integration of an Efficient Channel Attention Block (ECAB) and a Spatial Attention Module adaptively refines feature representations. Comprehensive experiments on CloudSEN12 and L8Biome demonstrate that MSCloudCAM delivers state-of-the-art segmentation accuracy, surpassing leading baseline architectures while maintaining competitive parameter efficiency and FLOPs. These results underscore the model's effectiveness and practicality, making it well-suited for large-scale Earth observation tasks and real-world applications.
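The abstract does not spell out the internals of the Efficient Channel Attention Block (ECAB), but the standard ECA design it presumably follows is simple: globally average-pool each channel, run a cheap 1D convolution across neighbouring channels (kernel size chosen adaptively from the channel count), and squash the result through a sigmoid to get per-channel gates. A minimal pure-Python sketch, with toy uniform convolution weights standing in for the learned ones:

```python
import math

def eca_kernel_size(channels, gamma=2, b=1):
    # Adaptive kernel size from the ECA formulation:
    # k = |(log2(C) + b) / gamma|, forced to the nearest odd integer.
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1

def eca_attention(channel_means, kernel_size):
    """Compute one attention gate per channel from per-channel averages.

    A 1D convolution with shared weights slides across the channel axis
    (zero padding at the ends); a sigmoid maps each response to (0, 1).
    """
    c = len(channel_means)
    pad = kernel_size // 2
    w = 1.0 / kernel_size  # toy uniform weight; learned in a real ECAB
    padded = [0.0] * pad + list(channel_means) + [0.0] * pad
    gates = []
    for i in range(c):
        s = sum(w * padded[i + j] for j in range(kernel_size))
        gates.append(1.0 / (1.0 + math.exp(-s)))  # sigmoid
    return gates
```

The gates would then scale the corresponding feature-map channels, letting informative bands (useful for multispectral input) pass through while damping the rest.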
Related papers
- Weakly Supervised Cloud Detection Combining Spectral Features and Multi-Scale Deep Network [12.520904004953344]
We propose a weakly supervised cloud detection method that combines spectral features and a multi-scale scene-level deep network (SpecMCD) to obtain highly accurate pixel-level cloud masks. The F1-score of the proposed SpecMCD method shows an improvement of over 7.82%, highlighting the superiority and potential of the SpecMCD method for cloud detection.
arXiv Detail & Related papers (2025-10-01T08:32:49Z) - Deep Learning for Clouds and Cloud Shadow Segmentation in Methane Satellite and Airborne Imaging Spectroscopy [8.008730702184803]
We use machine learning methods to address the cloud and cloud shadow detection problem for high-resolution instruments. We deploy and evaluate conventional techniques alongside advanced deep learning architectures, namely UNet and a Spectral Channel Attention Network (SCAN). Our results show that conventional methods struggle with spatial and boundary definition, affecting the detection of clouds and cloud shadows.
arXiv Detail & Related papers (2025-09-24T00:49:52Z) - SGMAGNet: A Baseline Model for 3D Cloud Phase Structure Reconstruction on a New Passive Active Satellite Benchmark [17.3424418972935]
We present a benchmark dataset for transforming satellite observations into detailed 3D cloud phase structures. We adopt SGMAGNet as the main model and compare it with several baseline architectures. The results demonstrate that SGMAGNet achieves superior performance in cloud phase reconstruction.
arXiv Detail & Related papers (2025-09-19T07:29:23Z) - GCRPNet: Graph-Enhanced Contextual and Regional Perception Network for Salient Object Detection in Optical Remote Sensing Images [68.33481681452675]
We propose a graph-enhanced contextual and regional perception network (GCRPNet). It builds upon the Mamba architecture to simultaneously capture long-range dependencies and enhance regional feature representation. It performs adaptive patch scanning on feature maps processed via multi-scale convolutions, thereby capturing rich local region information.
arXiv Detail & Related papers (2025-08-14T11:31:43Z) - Semi-supervised Semantic Segmentation for Remote Sensing Images via Multi-scale Uncertainty Consistency and Cross-Teacher-Student Attention [59.19580789952102]
This paper proposes a novel semi-supervised Multi-Scale Uncertainty and Cross-Teacher-Student Attention (MUCA) model for RS image semantic segmentation tasks. MUCA constrains the consistency among feature maps at different layers of the network by introducing a multi-scale uncertainty consistency regularization. It also utilizes a Cross-Teacher-Student attention mechanism to guide the student network in constructing more discriminative feature representations.
arXiv Detail & Related papers (2025-01-18T11:57:20Z) - PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector, called Point-Voxel Attention Fusion Network (PVAFN).
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z) - Multi-view Aggregation Network for Dichotomous Image Segmentation [76.75904424539543]
Dichotomous Image Segmentation (DIS) has recently emerged, targeting high-precision object segmentation from high-resolution natural images.
Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete the global localization and local refinement.
Inspired by this, we model DIS as a multi-view object perception problem and provide a parsimonious multi-view aggregation network (MVANet).
Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed.
arXiv Detail & Related papers (2024-04-11T03:00:00Z) - MambaAD: Exploring State Space Models for Multi-class Unsupervised Anomaly Detection [53.03687787922032]
Mamba-based models with superior long-range modeling and linear efficiency have garnered substantial attention. This study pioneers the application of Mamba to multi-class unsupervised anomaly detection, presenting MambaAD. The proposed LSS module, integrating parallel cascaded Hybrid State Space (HSS) blocks and multi-kernel convolution operations, effectively captures both long-range and local information.
arXiv Detail & Related papers (2024-04-09T18:28:55Z) - Distribution-aware Interactive Attention Network and Large-scale Cloud Recognition Benchmark on FY-4A Satellite Image [24.09239785062109]
We develop a novel dataset for accurate cloud recognition.
We use domain adaptation methods to align 70,419 image-label pairs in terms of projection, temporal resolution, and spatial resolution.
We also introduce a Distribution-aware Interactive-Attention Network (DIAnet), which preserves pixel-level details through a high-resolution branch and a parallel cross-branch.
arXiv Detail & Related papers (2024-01-06T09:58:09Z) - CLiSA: A Hierarchical Hybrid Transformer Model using Orthogonal Cross Attention for Satellite Image Cloud Segmentation [5.178465447325005]
Deep learning algorithms have emerged as a promising approach to solving image segmentation problems.
In this paper, we introduce a deep-learning model for effective cloud mask generation named CLiSA - Cloud segmentation via Lipschitz Stable Attention network.
We demonstrate both qualitative and quantitative outcomes for multiple satellite image datasets including Landsat-8, Sentinel-2, and Cartosat-2s.
arXiv Detail & Related papers (2023-11-29T09:31:31Z) - BIMS-PU: Bi-Directional and Multi-Scale Point Cloud Upsampling [60.257912103351394]
We develop a new point cloud upsampling pipeline called BIMS-PU.
We decompose the up/downsampling procedure into several up/downsampling sub-steps by breaking the target sampling factor into smaller factors.
We show that our method achieves superior results to state-of-the-art approaches.
arXiv Detail & Related papers (2022-06-25T13:13:37Z)
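The BIMS-PU entry above hinges on breaking a target sampling factor into smaller sub-factors so that up/downsampling proceeds in gentler steps. The abstract does not give the exact decomposition rule, but a plausible illustration is repeated division by a small base (hypothetical helper name):

```python
def decompose_factor(factor, base=2):
    """Break a sampling factor into smaller sub-factors, e.g. 8 -> [2, 2, 2],
    so each up/downsampling sub-step only changes resolution by `base`."""
    steps = []
    while factor > 1:
        if factor % base:
            steps.append(factor)  # non-divisible remainder becomes one final step
            break
        steps.append(base)
        factor //= base
    return steps
```

For example, a 4x upsampling would run as two 2x sub-steps, each easier to learn than one large jump.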
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.