CLiSA: A Hierarchical Hybrid Transformer Model using Orthogonal Cross
Attention for Satellite Image Cloud Segmentation
- URL: http://arxiv.org/abs/2311.17475v2
- Date: Fri, 1 Dec 2023 10:43:18 GMT
- Title: CLiSA: A Hierarchical Hybrid Transformer Model using Orthogonal Cross
Attention for Satellite Image Cloud Segmentation
- Authors: Subhajit Paul, Ashutosh Gupta
- Abstract summary: Deep learning algorithms have emerged as a promising approach to solve image segmentation problems.
In this paper, we introduce a deep-learning model for effective cloud mask generation named CLiSA - Cloud segmentation via Lipschitz Stable Attention network.
We demonstrate both qualitative and quantitative outcomes for multiple satellite image datasets including Landsat-8, Sentinel-2, and Cartosat-2s.
- Score: 5.178465447325005
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clouds in optical satellite images are a major concern since their presence
hinders the ability to carry out accurate analysis as well as processing. Presence
of clouds also affects the image tasking schedule and results in wastage of
valuable storage space on ground as well as space-based systems. Due to these
reasons, deriving accurate cloud masks from optical remote-sensing images is an
important task. Traditional methods for cloud detection in satellite images,
such as threshold-based and spatial filtering approaches, suffer from a lack of accuracy. In recent
years, deep learning algorithms have emerged as a promising approach to solve
image segmentation problems as they allow pixel-level classification and
semantic-level segmentation. In this paper, we introduce a deep-learning model
based on hybrid transformer architecture for effective cloud mask generation
named CLiSA - Cloud segmentation via Lipschitz Stable Attention network. In
this context, we propose a concept of orthogonal self-attention combined with
a hierarchical cross-attention model, and we validate its Lipschitz stability
theoretically and empirically. We design the whole setup in an adversarial
setting in the presence of the Lovász-Softmax loss. We demonstrate both qualitative
and quantitative outcomes for multiple satellite image datasets including
Landsat-8, Sentinel-2, and Cartosat-2s. Through a comparative study, we show
that our model performs favorably against other state-of-the-art methods and
also provides better generalization in precise cloud extraction from satellite
multi-spectral (MX) images. We also present ablation studies to
support our choices corresponding to different architectural elements and
objective functions.
Related papers
- Edge-Cloud Collaborative Satellite Image Analysis for Efficient Man-Made Structure Recognition [2.110762118285028]
The paper presents a new satellite image processing architecture combining edge and cloud computing.
By employing lightweight models at the edge, the system initially identifies potential man-made structures from satellite imagery.
These identified images are then transmitted to the cloud, where a more complex model refines the classification.
arXiv Detail & Related papers (2024-10-08T03:31:32Z)
- Point Cloud Compression with Implicit Neural Representations: A Unified Framework [54.119415852585306]
We present a pioneering point cloud compression framework capable of handling both geometry and attribute components.
Our framework utilizes two coordinate-based neural networks to implicitly represent a voxelized point cloud.
Our method exhibits high universality when contrasted with existing learning-based techniques.
arXiv Detail & Related papers (2024-05-19T09:19:40Z)
- SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation [69.42764583465508]
We explore the potential of generative image diffusion to address the scarcity of annotated data in earth observation tasks.
To the best of our knowledge, we are the first to generate both images and corresponding masks for satellite segmentation.
arXiv Detail & Related papers (2024-03-25T10:30:22Z)
- DiffCR: A Fast Conditional Diffusion Framework for Cloud Removal from Optical Satellite Images [27.02507384522271]
This paper presents a novel framework called DiffCR, which leverages conditional guided diffusion with deep convolutional networks for high-performance cloud removal for optical satellite imagery.
We introduce a decoupled encoder for conditional image feature extraction, providing a robust color representation to ensure the close similarity of appearance information between the conditional input and the synthesized output.
arXiv Detail & Related papers (2023-08-08T17:34:28Z)
- Detecting Cloud Presence in Satellite Images Using the RGB-based CLIP Vision-Language Model [0.0]
This work explores the capabilities of the pre-trained CLIP vision-language model to identify satellite images affected by clouds.
Several approaches to using the model to perform cloud presence detection are proposed and evaluated.
Results demonstrate that the representations learned by the CLIP model can be useful for satellite image processing tasks involving clouds.
arXiv Detail & Related papers (2023-08-01T13:36:46Z)
- Exploring the Application of Large-scale Pre-trained Models on Adverse Weather Removal [97.53040662243768]
We propose a CLIP embedding module to make the network handle different weather conditions adaptively.
This module integrates the sample-specific weather prior extracted by the CLIP image encoder together with the distribution-specific information learned by a set of parameters.
arXiv Detail & Related papers (2023-06-15T10:06:13Z)
- Cloud removal Using Atmosphere Model [7.259230333873744]
Cloud removal is an essential task in remote sensing data analysis.
We propose to use a scattering model for a temporal sequence of images of any scene in the framework of low-rank and sparse models.
We develop a semi-realistic simulation method to produce cloud cover so that various methods can be quantitatively analysed.
arXiv Detail & Related papers (2022-10-05T01:29:19Z)
- SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery [74.82821342249039]
We present SatMAE, a pre-training framework for temporal or multi-spectral satellite imagery based on the Masked Autoencoder (MAE).
To leverage temporal information, we include a temporal embedding along with independently masking image patches across time.
arXiv Detail & Related papers (2022-07-17T01:35:29Z)
- Beyond Cross-view Image Retrieval: Highly Accurate Vehicle Localization Using Satellite Image [91.29546868637911]
This paper addresses the problem of vehicle-mounted camera localization by matching a ground-level image with an overhead-view satellite map.
The key idea is to formulate the task as pose estimation and solve it by neural-net based optimization.
Experiments on standard autonomous vehicle localization datasets have confirmed the superiority of the proposed method.
arXiv Detail & Related papers (2022-04-10T19:16:58Z)
- CAMERAS: Enhanced Resolution And Sanity preserving Class Activation Mapping for image saliency [61.40511574314069]
Backpropagation image saliency aims at explaining model predictions by estimating model-centric importance of individual pixels in the input.
We propose CAMERAS, a technique to compute high-fidelity backpropagation saliency maps without requiring any external priors.
arXiv Detail & Related papers (2021-06-20T08:20:56Z)
- Multi-scale Cloud Detection in Remote Sensing Images using a Dual Convolutional Neural Network [4.812718493682455]
CNNs have advanced the state of the art in pixel-level classification of remote sensing images.
We propose an architecture of two cascaded CNN model components successively processing undersampled and full resolution images.
We achieve a 16% relative improvement in pixel accuracy over a CNN baseline based on patching.
arXiv Detail & Related papers (2020-06-01T10:27:42Z)