AIR: Zero-shot Generative Model Adaptation with Iterative Refinement
- URL: http://arxiv.org/abs/2506.10895v1
- Date: Thu, 12 Jun 2025 17:00:50 GMT
- Title: AIR: Zero-shot Generative Model Adaptation with Iterative Refinement
- Authors: Guimeng Liu, Milad Abdollahzadeh, Ngai-Man Cheung
- Abstract summary: Zero-shot generative model adaptation (ZSGM) aims to adapt a pre-trained generator to a target domain using only text guidance. Central to recent ZSGM approaches is the directional loss, which uses the text guidance by aligning the image offset with the text offset in the embedding space of a vision-language model like CLIP.
- Score: 27.322307161825844
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Zero-shot generative model adaptation (ZSGM) aims to adapt a pre-trained generator to a target domain using only text guidance and without any samples from the target domain. Central to recent ZSGM approaches is the directional loss, which uses the text guidance by aligning the image offset with the text offset in the embedding space of a vision-language model like CLIP. This is similar to analogical reasoning in NLP, where the offset between one pair of words is used to identify a missing element in another pair by aligning the offsets of the two pairs. However, a major limitation of existing ZSGM methods is that the learning objective assumes complete alignment between the image offset and the text offset in the CLIP embedding space, resulting in quality degradation in the generated images. Our work makes two main contributions. Inspired by the offset misalignment studies in NLP, as our first contribution, we perform an empirical study to analyze the misalignment between text offsets and image offsets in the CLIP embedding space for various large publicly available datasets. Our important finding is that offset misalignment in the CLIP embedding space is correlated with concept distance, i.e., closer concepts have less offset misalignment. To address the limitations of current approaches, as our second contribution, we propose Adaptation with Iterative Refinement (AIR), the first ZSGM approach to focus on improving target-domain image quality based on our new insight on offset misalignment. Qualitative, quantitative, and user studies across 26 experimental setups consistently demonstrate that the proposed AIR approach achieves SOTA performance. Additional experiments are in the Supplementary.
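To make the directional-loss idea above concrete, the following is a minimal sketch, assuming PyTorch and OpenAI's clip package, of how the text offset, the image offset, and their misalignment (one minus cosine similarity) could be computed. The prompts, file paths, and helper names are illustrative and are not the paper's implementation; in actual adaptation, the target image would come from the generator being adapted and the loss would be backpropagated through it rather than measured under no_grad.

```python
# Minimal sketch (not the authors' code) of the CLIP directional loss that ZSGM methods
# build on, and of the offset-misalignment measurement described in the abstract.
# Assumes: pip install torch pillow git+https://github.com/openai/CLIP.git
import torch
import torch.nn.functional as F
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

@torch.no_grad()
def embed_text(prompt: str) -> torch.Tensor:
    """Return an L2-normalized CLIP text embedding."""
    tokens = clip.tokenize([prompt]).to(device)
    emb = model.encode_text(tokens).float()
    return F.normalize(emb, dim=-1).squeeze(0)

@torch.no_grad()
def embed_image(img: Image.Image) -> torch.Tensor:
    """Return an L2-normalized CLIP image embedding."""
    batch = preprocess(img).unsqueeze(0).to(device)
    emb = model.encode_image(batch).float()
    return F.normalize(emb, dim=-1).squeeze(0)

# Text offset between source- and target-domain descriptions (illustrative prompts).
delta_text = embed_text("a sketch") - embed_text("a photo")

# Image offset between a source-domain image and a target-domain image (placeholder paths).
delta_image = embed_image(Image.open("target.png")) - embed_image(Image.open("source.png"))

# Directional loss used by ZSGM methods: encourage the image offset to align with the
# text offset. The same quantity, evaluated on real image pairs, is one way to quantify
# the offset misalignment the paper studies.
directional_loss = 1.0 - F.cosine_similarity(delta_image, delta_text, dim=-1)
print(f"1 - cos(delta_image, delta_text) = {directional_loss.item():.3f}")
```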
Related papers
- Post-pre-training for Modality Alignment in Vision-Language Foundation Models [12.110530026601968]
This paper presents CLIP-Refine, a post-pre-training method for CLIP models applied at a phase between pre-training and fine-tuning. It aims to align the feature space with one epoch of training on small image-text datasets without zero-shot performance degradation.
arXiv Detail & Related papers (2025-04-17T07:46:19Z)
- Semi-supervised Domain Adaptive Medical Image Segmentation through Consistency Regularized Disentangled Contrastive Learning [11.049672162852733]
In this work, we investigate relatively less explored semi-supervised domain adaptation (SSDA) for medical image segmentation.
We propose a two-stage training process: first, an encoder is pre-trained in a self-learning paradigm using a novel domain-content disentangled contrastive learning (CL) along with a pixel-level feature consistency constraint.
We experimentally validate that our proposed method can easily be extended to UDA settings, adding to the superiority of the proposed strategy.
arXiv Detail & Related papers (2023-07-06T06:13:22Z)
- Boosting Few-shot Fine-grained Recognition with Background Suppression and Foreground Alignment [53.401889855278704]
Few-shot fine-grained recognition (FS-FGR) aims to recognize novel fine-grained categories with the help of limited available samples.
We propose a two-stage background suppression and foreground alignment framework, which is composed of a background activation suppression (BAS) module, a foreground object alignment (FOA) module, and a local to local (L2L) similarity metric.
Experiments conducted on multiple popular fine-grained benchmarks demonstrate that our method outperforms the existing state-of-the-art by a large margin.
arXiv Detail & Related papers (2022-10-04T07:54:40Z)
- Refign: Align and Refine for Adaptation of Semantic Segmentation to Adverse Conditions [78.71745819446176]
Refign is a generic extension to self-training-based UDA methods which leverages cross-domain correspondences.
Refign consists of two steps: (1) aligning the normal-condition image to the corresponding adverse-condition image using an uncertainty-aware dense matching network, and (2) refining the adverse prediction with the normal prediction using an adaptive label correction mechanism.
The approach introduces no extra training parameters and only minimal computational overhead (during training only), and can be used as a drop-in extension to improve any given self-training-based UDA method.
arXiv Detail & Related papers (2022-07-14T11:30:38Z)
- ContraCLIP: Interpretable GAN generation driven by pairs of contrasting sentences [45.06326873752593]
We find non-linear interpretable paths in the latent space of pre-trained GANs in a model-agnostic manner.
By defining an objective that discovers paths generating changes along the desired directions in the vision-language embedding space, we provide an intuitive way of controlling the underlying generative factors.
arXiv Detail & Related papers (2022-06-05T06:13:42Z)
- BDA-SketRet: Bi-Level Domain Adaptation for Zero-Shot SBIR [52.78253400327191]
BDA-SketRet is a novel framework performing a bi-level domain adaptation for aligning the spatial and semantic features of the visual data pairs.
Experimental results on the extended Sketchy, TU-Berlin, and QuickDraw datasets exhibit sharp improvements over the literature.
arXiv Detail & Related papers (2022-01-17T18:45:55Z)
- Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
arXiv Detail & Related papers (2021-12-12T06:11:16Z)
- HSVA: Hierarchical Semantic-Visual Adaptation for Zero-Shot Learning [74.76431541169342]
Zero-shot learning (ZSL) tackles the unseen class recognition problem, transferring semantic knowledge from seen classes to unseen ones.
We propose a novel hierarchical semantic-visual adaptation (HSVA) framework to align semantic and visual domains.
Experiments on four benchmark datasets demonstrate HSVA achieves superior performance on both conventional and generalized ZSL.
arXiv Detail & Related papers (2021-09-30T14:27:50Z)
- Pixel-Level Cycle Association: A New Perspective for Domain Adaptive Semantic Segmentation [169.82760468633236]
We propose to build the pixel-level cycle association between source and target pixel pairs.
Our method can be trained end-to-end in one stage and introduces no additional parameters.
arXiv Detail & Related papers (2020-10-31T00:11:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.