Related papers: Quantitative Comparison of Fine-Tuning Techniques for Pretrained Latent Diffusion Models in the Generation of Unseen SAR Image Concepts

Quantitative Comparison of Fine-Tuning Techniques for Pretrained Latent Diffusion Models in the Generation of Unseen SAR Image Concepts

URL: http://arxiv.org/abs/2506.13307v1
Date: Mon, 16 Jun 2025 09:48:01 GMT
Title: Quantitative Comparison of Fine-Tuning Techniques for Pretrained Latent Diffusion Models in the Generation of Unseen SAR Image Concepts
Authors: Solène Debuysère, Nicolas Trouvé, Nathan Letheule, Olivier Lévêque, Elise Colin,
Abstract summary: This work investigates the adaptation of large pre-trained latent diffusion models to a radically new imaging domain: Synthetic Aperture Radar (SAR)<n>We explore and compare multiple fine-tuning strategies, including full model fine-tuning and parameter-efficient approaches like Low-Rank Adaptation (LoRA)<n>Our results show that a hybrid tuning strategy yields the best performance, while LoRA-based partial tuning of the text encoder, combined with embedding learning of the SAR> token, suffices to preserve prompt alignment.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This work investigates the adaptation of large pre-trained latent diffusion models to a radically new imaging domain: Synthetic Aperture Radar (SAR). While these generative models, originally trained on natural images, demonstrate impressive capabilities in text-to-image synthesis, they are not natively adapted to represent SAR data, which involves different physics, statistical distributions, and visual characteristics. Using a sizeable SAR dataset (on the order of 100,000 to 1 million images), we address the fundamental question of fine-tuning such models for this unseen modality. We explore and compare multiple fine-tuning strategies, including full model fine-tuning and parameter-efficient approaches like Low-Rank Adaptation (LoRA), focusing separately on the UNet diffusion backbone and the text encoder components. To evaluate generative quality, we combine several metrics: statistical distance from real SAR distributions, textural similarity via GLCM descriptors, and semantic alignment assessed with a CLIP model fine-tuned on SAR data. Our results show that a hybrid tuning strategy yields the best performance: full fine-tuning of the UNet is better at capturing low-level SAR-specific patterns, while LoRA-based partial tuning of the text encoder, combined with embedding learning of the <SAR> token, suffices to preserve prompt alignment. This work provides a methodical strategy for adapting foundation models to unconventional imaging modalities beyond natural image domains.

Related papers

Knowledge-guided Complex Diffusion Model for PolSAR Image Classification in Contourlet Domain [58.46450049579116]
We propose a knowledge-guided complex diffusion model for PolSAR image classification in the Contourlet domain.<n> Specifically, the Contourlet transform is first applied to decompose the data into low- and high-frequency subbands.<n>A knowledge-guided complex diffusion network is then designed to model the statistical properties of the low-frequency components.
arXiv Detail & Related papers (2025-07-08T04:50:28Z)
Dataset Distillation with Probabilistic Latent Features [9.318549327568695]
A compact set of synthetic data can effectively replace the original dataset in downstream classification tasks.<n>We propose a novel approach that models the joint distribution of latent features.<n>Our method achieves state-of-the-art cross architecture performance across a range of backbone architectures.
arXiv Detail & Related papers (2025-05-10T13:53:49Z)
PromptMID: Modal Invariant Descriptors Based on Diffusion and Vision Foundation Models for Optical-SAR Image Matching [15.840638449527399]
We propose PromptMID, a novel approach that constructs modality-invariant descriptors using text prompts.<n>PromptMID extracts multi-scale modality-invariant features by leveraging pre-trained diffusion models and visual foundation models.<n>Experiments on optical-SAR image datasets from four diverse regions demonstrate that PromptMID outperforms state-of-the-art matching methods.
arXiv Detail & Related papers (2025-02-25T11:19:26Z)
Conditional Brownian Bridge Diffusion Model for VHR SAR to Optical Image Translation [5.578820789388206]
This letter introduces a conditional image-to-image translation approach based on Brownian Bridge Diffusion Model (BBDM)<n>We conducted comprehensive experiments on the MSAW dataset, a paired SAR and optical images collection of 0.5m Very-High-Resolution (VHR)
arXiv Detail & Related papers (2024-08-15T05:43:46Z)
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image (RRSIS) is a new challenge that combines computer vision and natural language processing. Traditional Referring Image (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery. We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
Diversified in-domain synthesis with efficient fine-tuning for few-shot classification [64.86872227580866]
Few-shot image classification aims to learn an image classifier using only a small set of labeled examples per class. We propose DISEF, a novel approach which addresses the generalization challenge in few-shot learning using synthetic data. We validate our method in ten different benchmarks, consistently outperforming baselines and establishing a new state-of-the-art for few-shot classification.
arXiv Detail & Related papers (2023-12-05T17:18:09Z)
ExposureDiffusion: Learning to Expose for Low-light Image Enhancement [87.08496758469835]
This work addresses the issue by seamlessly integrating a diffusion model with a physics-based exposure model. Our method obtains significantly improved performance and reduced inference time compared with vanilla diffusion models. The proposed framework can work with both real-paired datasets, SOTA noise models, and different backbone networks.
arXiv Detail & Related papers (2023-07-15T04:48:35Z)
Lafite2: Few-shot Text-to-Image Generation [132.14211027057766]
We propose a novel method for pre-training text-to-image generation model on image-only datasets. It considers a retrieval-then-optimization procedure to synthesize pseudo text features. It can be beneficial to a wide range of settings, including the few-shot, semi-supervised and fully-supervised learning.
arXiv Detail & Related papers (2022-10-25T16:22:23Z)
Semantic Image Synthesis via Diffusion Models [174.24523061460704]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.<n>Recent work on semantic image synthesis mainly follows the de facto GAN-based approaches.<n>We propose a novel framework based on DDPM for semantic image synthesis.
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
A Feature Fusion-Net Using Deep Spatial Context Encoder and Nonstationary Joint Statistical Model for High Resolution SAR Image Classification [10.152675581771113]
A novel end-to-end supervised classification method is proposed for HR SAR images. To extract more effective spatial features, a new deep spatial context encoder network (DSCEN) is proposed. To enhance the diversity of statistics, the nonstationary joint statistical model (NS-JSM) is adopted to form the global statistical features.
arXiv Detail & Related papers (2021-05-11T06:20:14Z)
Sparse Signal Models for Data Augmentation in Deep Learning ATR [0.8999056386710496]
We propose a data augmentation approach to incorporate domain knowledge and improve the generalization power of a data-intensive learning algorithm. We exploit the sparsity of the scattering centers in the spatial domain and the smoothly-varying structure of the scattering coefficients in the azimuthal domain to solve the ill-posed problem of over-parametrized model fitting.
arXiv Detail & Related papers (2020-12-16T21:46:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.