Task-Driven Prompt Learning: A Joint Framework for Multi-modal Cloud Removal and Segmentation
- URL: http://arxiv.org/abs/2601.12052v1
- Date: Sat, 17 Jan 2026 13:32:38 GMT
- Title: Task-Driven Prompt Learning: A Joint Framework for Multi-modal Cloud Removal and Segmentation
- Authors: Zaiyan Zhang, Jie Li, Shaowei Shi, Qiangqiang Yuan
- Abstract summary: TDP-CR is a task-driven framework that jointly performs cloud removal and land-cover segmentation. Central to our approach is a Prompt-Guided Fusion mechanism, which utilizes a learnable degradation prompt to encode cloud thickness and spatial uncertainty. Experiments on the LuojiaSET-OSFCR dataset demonstrate the superiority of our framework.
- Score: 11.468907022707013
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optical remote sensing imagery is indispensable for Earth observation, yet persistent cloud occlusion limits its downstream utility. Most cloud removal (CR) methods are optimized for low-level fidelity and can over-smooth textures and boundaries that are critical for analysis-ready data (ARD), leading to a mismatch between visually plausible restoration and semantic utility. To bridge this gap, we propose TDP-CR, a task-driven multimodal framework that jointly performs cloud removal and land-cover segmentation. Central to our approach is a Prompt-Guided Fusion (PGF) mechanism, which utilizes a learnable degradation prompt to encode cloud thickness and spatial uncertainty. By combining global channel context with local prompt-conditioned spatial bias, PGF adaptively integrates Synthetic Aperture Radar (SAR) information only where optical data is corrupted. We further introduce a parameter-efficient two-phase training strategy that decouples reconstruction and semantic representation learning. Experiments on the LuojiaSET-OSFCR dataset demonstrate the superiority of our framework: TDP-CR surpasses heavy state-of-the-art baselines by 0.18 dB in PSNR while using only 15% of the parameters, and consistently achieves a 1.4% mIoU improvement over multi-task competitors, effectively delivering analysis-ready data.
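The abstract describes PGF only at a high level. A minimal numpy sketch of what a prompt-conditioned fusion step could look like, assuming feature maps of shape (C, H, W), a per-channel degradation prompt, and a cloud mask in [0, 1]; all names, shapes, and the gating formula are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def prompt_guided_fusion(opt_feat, sar_feat, prompt, cloud_mask):
    """Hypothetical sketch: blend SAR features into optical features only
    where the cloud mask indicates corruption, with a prompt-conditioned
    spatial bias modulating the blend weight.
    opt_feat, sar_feat: (C, H, W); prompt: (C,); cloud_mask: (H, W) in [0, 1].
    """
    # Global channel context: per-channel mean of the optical features.
    channel_ctx = opt_feat.mean(axis=(1, 2), keepdims=True)          # (C, 1, 1)
    # Local spatial bias: degradation prompt projected onto the cloud mask.
    spatial_bias = prompt[:, None, None] * cloud_mask[None, :, :]    # (C, H, W)
    # Gate is zero over clear pixels, so SAR enters only where optical is corrupted.
    gate = sigmoid(channel_ctx + spatial_bias) * cloud_mask[None, :, :]
    return (1.0 - gate) * opt_feat + gate * sar_feat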
Related papers
- ConsistentRFT: Reducing Visual Hallucinations in Flow-based Reinforcement Fine-Tuning [85.20505958752928]
Reinforcement Fine-Tuning (RFT) on flow-based models is crucial for preference alignment. However, RFT often introduces visual hallucinations such as over-optimized details and semantic misalignment. This work preliminarily explores why visual hallucinations arise and how to reduce them.
arXiv Detail & Related papers (2026-02-03T11:49:46Z) - Unleashing Degradation-Carrying Features in Symmetric U-Net: Simpler and Stronger Baselines for All-in-One Image Restoration [52.82397287366076]
All-in-one image restoration aims to handle diverse degradations (e.g., noise, blur, adverse weather) within a unified framework. In this work, we reveal a critical insight: well-crafted feature extraction inherently encodes degradation-carrying information. Our symmetric design preserves intrinsic degradation signals robustly, rendering simple additive fusion in skip connections sufficient.
arXiv Detail & Related papers (2025-12-11T12:20:31Z) - TriFusion-AE: Language-Guided Depth and LiDAR Fusion for Robust Point Cloud Processing [0.0]
Autoencoders offer a natural framework for denoising and reconstruction, but their performance degrades under challenging real-world conditions. We propose TriFusion-AE, a cross-attention autoencoder that integrates textual priors, monocular depth maps from multi-view images, and LiDAR point clouds to improve robustness. Our model achieves significantly more robust reconstruction under strong adversarial attacks and heavy noise, where CNN-based autoencoders collapse.
arXiv Detail & Related papers (2025-09-23T07:37:28Z) - Evaluating the Efficiency of Latent Spaces via the Coupling-Matrix [0.5013248430919224]
We introduce a redundancy index, denoted rho(C), that directly quantifies inter-dimensional dependencies. Low rho(C) reliably predicts high classification accuracy or low reconstruction error, while elevated redundancy is associated with performance collapse. We show that Tree-structured Parzen Estimators (TPE) preferentially explore low-rho regions, suggesting that rho(C) can guide neural architecture search and serve as a redundancy-aware regularization target.
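The abstract does not give the formula for rho(C). A plausible sketch of such a redundancy index, assuming the coupling matrix C is the inter-dimensional correlation matrix of latent codes and rho is the mean absolute off-diagonal entry (the paper's exact definition may differ):

```python
import numpy as np

def redundancy_index(latents):
    """Hypothetical redundancy score rho(C): mean absolute off-diagonal
    entry of the inter-dimensional correlation (coupling) matrix.
    latents: (n_samples, n_dims). Returns a value in [0, 1]:
    0 = fully decorrelated dimensions, 1 = fully coupled."""
    C = np.corrcoef(latents, rowvar=False)   # (d, d) coupling matrix
    d = C.shape[0]
    off_diag = np.abs(C - np.eye(d))         # zero out the diagonal
    return off_diag.sum() / (d * (d - 1))    # average over off-diagonal entries
```

Under this definition, a latent space whose dimensions duplicate each other scores near 1, while independent dimensions score near 0, matching the abstract's link between low rho and good downstream performance.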
arXiv Detail & Related papers (2025-09-08T03:36:47Z) - Graph-Based Uncertainty Modeling and Multimodal Fusion for Salient Object Detection [12.743278093269325]
We propose a dynamic uncertainty propagation and multimodal collaborative reasoning network (DUP-MCRNet). DUGC is designed to propagate uncertainty between layers through a sparse graph constructed based on spatial semantic distance. MCF uses learnable modality gating weights to weightedly fuse the attention maps of RGB, depth, and edge features.
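The gated fusion described for MCF can be sketched in a few lines; the softmax gating and shapes below are assumptions for illustration, not the paper's architecture:

```python
import numpy as np

def gated_modal_fusion(attn_maps, gate_logits):
    """Sketch of modality gating: a softmax over learnable per-modality
    logits yields weights summing to 1, and the RGB/depth/edge attention
    maps are combined as a weighted sum.
    attn_maps: (M, H, W) stacked per-modality maps; gate_logits: (M,)."""
    w = np.exp(gate_logits - gate_logits.max())   # numerically stable softmax
    w = w / w.sum()
    return np.tensordot(w, attn_maps, axes=1)     # weighted sum -> (H, W)
```

In a trained network the logits would be learned parameters; here they are plain inputs so the behavior is easy to inspect (equal logits average the maps, a dominant logit selects its modality).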
arXiv Detail & Related papers (2025-08-28T04:31:48Z) - Minimal High-Resolution Patches Are Sufficient for Whole Slide Image Representation via Cascaded Dual-Scale Reconstruction [13.897013242536849]
Whole-slide image (WSI) analysis remains challenging due to gigapixel scale and sparsely distributed diagnostic regions.<n>We propose a Cascaded Dual-Scale Reconstruction framework, demonstrating that only an average of 9 high-resolution patches per WSI are sufficient for robust slide-level representation.
arXiv Detail & Related papers (2025-08-03T08:01:30Z) - FUSE: Label-Free Image-Event Joint Monocular Depth Estimation via Frequency-Decoupled Alignment and Degradation-Robust Fusion [92.4205087439928]
Image-event joint depth estimation methods leverage complementary modalities for robust perception, yet face challenges in generalizability. We propose the Self-supervised Transfer (PST) module and the Frequency-Decoupled Fusion module (FreDF). PST establishes cross-modal knowledge transfer through latent space alignment with image foundation models, effectively mitigating data scarcity. FreDF explicitly decouples high-frequency edge features from low-frequency structural components, resolving modality-specific frequency mismatches. This combined approach enables FUSE to construct a universal image-event representation that only requires lightweight decoder adaptation for target datasets.
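The frequency decoupling FreDF performs can be illustrated with a simple FFT-based split: keep low-frequency structure from the image branch and high-frequency edges from the event branch. The cutoff value and single-channel shapes are assumed for the sketch; the paper's module operates on learned features, not raw maps:

```python
import numpy as np

def frequency_decoupled_fuse(img_feat, evt_feat, cutoff=0.25):
    """Sketch of frequency-decoupled fusion: low-pass the image branch,
    high-pass the event branch, and sum. cutoff is a fraction of the
    Nyquist band (an assumed hyper-parameter). Inputs: (H, W) arrays."""
    H, W = img_feat.shape
    fy = np.fft.fftfreq(H)[:, None]
    fx = np.fft.fftfreq(W)[None, :]
    # Radial low-pass mask in normalized frequency coordinates.
    low_pass = (np.sqrt(fy**2 + fx**2) <= cutoff * 0.5).astype(float)
    img_low = np.fft.ifft2(np.fft.fft2(img_feat) * low_pass).real
    evt_high = np.fft.ifft2(np.fft.fft2(evt_feat) * (1.0 - low_pass)).real
    return img_low + evt_high
```

This makes the complementarity concrete: smooth scene structure survives from the image path, while only edge-like high-frequency content is taken from the event path.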
arXiv Detail & Related papers (2025-03-25T15:04:53Z) - SING: Semantic Image Communications using Null-Space and INN-Guided Diffusion Models [52.40011613324083]
Deep joint source-channel coding (DeepJSCC) systems have recently demonstrated remarkable performance in wireless image transmission. Existing methods focus on minimizing distortion between the transmitted image and the reconstructed version at the receiver, often overlooking perceptual quality. We propose SING, a novel framework that formulates the recovery of high-quality images from corrupted reconstructions as an inverse problem.
arXiv Detail & Related papers (2025-03-16T12:32:11Z) - Multi-scale Restoration of Missing Data in Optical Time-series Images with Masked Spatial-Temporal Attention Network [0.6675733925327885]
Existing methods for imputing missing values in remote sensing images fail to fully exploit auxiliary information.
This paper proposes a novel deep-learning-based approach called MS2 for reconstructing time-series remote sensing images.
arXiv Detail & Related papers (2024-06-19T09:05:05Z) - Magic ELF: Image Deraining Meets Association Learning and Transformer [63.761812092934576]
This paper aims to unify CNN and Transformer to take advantage of their learning merits for image deraining.
A novel multi-input attention module (MAM) is proposed to associate rain removal and background recovery.
Our proposed method (dubbed as ELF) outperforms the state-of-the-art approach (MPRNet) by 0.25 dB on average.
arXiv Detail & Related papers (2022-07-21T12:50:54Z) - Unpaired Adversarial Learning for Single Image Deraining with Rain-Space Contrastive Constraints [61.40893559933964]
We develop an effective unpaired SID method which explores mutual properties of the unpaired exemplars by a contrastive learning manner in a GAN framework, named as CDR-GAN.
Our method performs favorably against existing unpaired deraining approaches on both synthetic and real-world datasets, and even outperforms several fully-supervised or semi-supervised models.
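The "contrastive learning manner" mentioned for CDR-GAN can be sketched with an InfoNCE-style loss: pull a derained anchor toward a clean positive and push it away from rainy negatives. The feature shapes, cosine similarity, and temperature are illustrative assumptions, not CDR-GAN's exact objective:

```python
import numpy as np

def contrastive_loss(anchor, positive, negatives, tau=0.1):
    """Hypothetical InfoNCE-style loss in 'rain space': the anchor
    (derained features) is attracted to the clean positive and repelled
    from rainy negatives. Inputs are 1-D feature vectors; tau is the
    softmax temperature."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    pos = np.exp(cos(anchor, positive) / tau)
    neg = sum(np.exp(cos(anchor, n) / tau) for n in negatives)
    return -np.log(pos / (pos + neg))
```

The loss is near zero when the anchor matches the clean exemplar and the negatives are dissimilar, and grows as rainy negatives become indistinguishable from the positive, which is what drives the unpaired training signal.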
arXiv Detail & Related papers (2021-09-07T10:00:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.