One-for-All: Towards Universal Domain Translation with a Single StyleGAN
- URL: http://arxiv.org/abs/2310.14222v1
- Date: Sun, 22 Oct 2023 08:02:55 GMT
- Title: One-for-All: Towards Universal Domain Translation with a Single StyleGAN
- Authors: Yong Du, Jiahui Zhan, Shengfeng He, Xinzhe Li, Junyu Dong, Sheng Chen,
and Ming-Hsuan Yang
- Abstract summary: We propose a novel translation model, UniTranslator, for transforming representations between visually distinct domains.
The proposed UniTranslator is versatile and capable of performing various tasks, including style mixing, stylization, and translations.
UniTranslator surpasses the performance of existing general-purpose models and performs well against specialized models in representative tasks.
- Score: 86.33216867136639
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a novel translation model, UniTranslator, for
transforming representations between visually distinct domains under conditions
of limited training data and significant visual differences. The main idea
behind our approach is leveraging the domain-neutral capabilities of CLIP as a
bridging mechanism, while utilizing a separate module to extract abstract,
domain-agnostic semantics from the embeddings of both the source and target
realms. Fusing these abstract semantics with target-specific semantics results
in a transformed embedding within the CLIP space. To bridge the gap between the
disparate worlds of CLIP and StyleGAN, we introduce a new non-linear mapper,
the CLIP2P mapper. Utilizing CLIP embeddings, this module is tailored to
approximate the latent distribution in the P space, effectively acting as a
connector between these two spaces. The proposed UniTranslator is versatile and
capable of performing various tasks, including style mixing, stylization, and
translations, even in visually challenging scenarios across different visual
domains. Notably, UniTranslator generates high-quality translations that
showcase domain relevance, diversity, and improved image quality. UniTranslator
surpasses the performance of existing general-purpose models and performs well
against specialized models in representative tasks. The source code and trained
models will be released to the public.
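As a rough illustration of the pipeline the abstract describes, the following PyTorch-style sketch wires the pieces together: a module that distills domain-agnostic semantics from CLIP embeddings, a simple fusion with target-specific semantics, and a non-linear CLIP2P-style mapper into StyleGAN's P space. All module names, layer widths, dimensions, and the additive fusion rule are assumptions made for illustration; they are not taken from the paper or its released code.

```python
# Minimal PyTorch-style sketch of the translation pipeline described in the
# abstract. Module names, widths, and the additive fusion rule below are
# assumptions for illustration, not the released UniTranslator code.
import torch
import torch.nn as nn

CLIP_DIM = 512  # assumed CLIP embedding size
P_DIM = 512     # assumed StyleGAN P-space size


class AbstractSemanticsExtractor(nn.Module):
    """Distills abstract, domain-agnostic semantics from a CLIP embedding."""

    def __init__(self, dim: int = CLIP_DIM):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim), nn.LeakyReLU(0.2), nn.Linear(dim, dim))

    def forward(self, clip_emb: torch.Tensor) -> torch.Tensor:
        return self.net(clip_emb)


class CLIP2PMapper(nn.Module):
    """Non-linear mapper approximating the StyleGAN P-space distribution
    from CLIP embeddings, acting as the bridge between the two spaces."""

    def __init__(self, in_dim: int = CLIP_DIM, out_dim: int = P_DIM, width: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, width), nn.LeakyReLU(0.2),
            nn.Linear(width, width), nn.LeakyReLU(0.2),
            nn.Linear(width, out_dim))

    def forward(self, clip_emb: torch.Tensor) -> torch.Tensor:
        return self.net(clip_emb)


def translate(source_emb, target_emb, extractor, mapper):
    """Fuse domain-agnostic source semantics with target-specific semantics
    in CLIP space, then map the fused code into P space for StyleGAN."""
    abstract_sem = extractor(source_emb)   # domain-agnostic content
    fused = abstract_sem + target_emb      # fusion rule is an assumption
    return mapper(fused)                   # latent code in P space
```

The resulting P-space code would then drive a pretrained StyleGAN generator to synthesize the translated image; the actual fusion mechanism and training objectives are specified in the paper.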
Related papers
- FUSE-ing Language Models: Zero-Shot Adapter Discovery for Prompt Optimization Across Tokenizers [55.2480439325792]
We propose FUSE, an approach to approximating an adapter layer that maps from one model's textual embedding space to another, even across different tokenizers.
We show the efficacy of our approach via multi-objective optimization over vision-language and causal language models for image captioning and sentiment-based image captioning.
arXiv Detail & Related papers (2024-08-09T02:16:37Z)
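The FUSE entry above hinges on learning an adapter that maps one model's textual embedding space into another's. Below is a generic, minimal sketch of such an adapter, fitted on paired embeddings with an L2 loss; the architecture, objective, and training loop are illustrative assumptions and not the FUSE method itself.

```python
# Generic embedding-space adapter in the spirit of the FUSE summary above:
# map embeddings from model A's textual space into model B's. The MLP
# architecture, L2 objective, and training loop are assumptions, not FUSE.
import torch
import torch.nn as nn
import torch.nn.functional as F


class EmbeddingAdapter(nn.Module):
    def __init__(self, dim_a: int, dim_b: int, hidden: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_a, hidden), nn.GELU(), nn.Linear(hidden, dim_b))

    def forward(self, emb_a: torch.Tensor) -> torch.Tensor:
        return self.net(emb_a)


def fit_adapter(adapter, emb_a, emb_b, steps: int = 1000, lr: float = 1e-3):
    """Fit on paired embeddings (the two models' embeddings of the same
    text) by minimizing an L2 reconstruction loss."""
    opt = torch.optim.Adam(adapter.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        F.mse_loss(adapter(emb_a), emb_b).backward()
        opt.step()
    return adapter
```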
- Semantic-Rearrangement-Based Multi-Level Alignment for Domain Generalized Segmentation [11.105659621713855]
We argue that different local semantic regions exhibit different visual characteristics from the source domain to the target domain.
We propose the Semantic-Rearrangement-based Multi-Level Alignment (SRMA) to overcome this problem.
arXiv Detail & Related papers (2024-04-21T16:05:38Z)
- Unified Language-driven Zero-shot Domain Adaptation [55.64088594551629]
Unified Language-driven Zero-shot Domain Adaptation (ULDA) is a novel task setting.
It enables a single model to adapt to diverse target domains without explicit domain-ID knowledge.
arXiv Detail & Related papers (2024-04-10T16:44:11Z)
- Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation [25.499205902426716]
We introduce a Unified Modality Separation (UniMoS) framework for unsupervised domain adaptation.
We craft a nimble modality separation network that distinctly disentangles CLIP's features into language-associated and vision-associated components.
Our proposed Modality-Ensemble Training (MET) method fosters the exchange of modality-agnostic information.
arXiv Detail & Related papers (2024-03-11T17:33:12Z)
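The UniMoS entry above describes disentangling CLIP features into language-associated and vision-associated components. One minimal way to sketch that idea is two projection heads constrained to jointly reconstruct the input feature, as below; the head design and reconstruction term are assumptions for illustration, not the UniMoS architecture.

```python
# Minimal sketch of separating a CLIP feature into language-associated and
# vision-associated parts with two projection heads, as the UniMoS summary
# above describes at a high level. Head design and the reconstruction
# constraint are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalitySeparator(nn.Module):
    def __init__(self, dim: int = 512):
        super().__init__()
        self.lang_head = nn.Linear(dim, dim)  # language-associated component
        self.vis_head = nn.Linear(dim, dim)   # vision-associated component

    def forward(self, clip_feat: torch.Tensor):
        lang_part = self.lang_head(clip_feat)
        vis_part = self.vis_head(clip_feat)
        # Encourage the two components to jointly reconstruct the input.
        recon_loss = F.mse_loss(lang_part + vis_part, clip_feat)
        return lang_part, vis_part, recon_loss
```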
- VLLaVO: Mitigating Visual Gap through LLMs [7.352822795984628]
Cross-domain learning aims at extracting domain-invariant knowledge to reduce the domain shift between training and testing data.
We propose VLLaVO, combining Vision language models and Large Language models as Visual cross-dOmain learners.
arXiv Detail & Related papers (2024-01-06T16:33:39Z)
- Style-Hallucinated Dual Consistency Learning: A Unified Framework for Visual Domain Generalization [113.03189252044773]
We propose a unified framework, Style-HAllucinated Dual consistEncy learning (SHADE), to handle domain shift in various visual tasks.
Our versatile SHADE can significantly enhance the generalization in various visual recognition tasks, including image classification, semantic segmentation and object detection.
arXiv Detail & Related papers (2022-12-18T11:42:51Z)
- Language-aware Domain Generalization Network for Cross-Scene Hyperspectral Image Classification [15.842081807249416]
It is necessary to explore the effectiveness of the linguistic modality in assisting hyperspectral image classification.
Large-scale pre-trained image-text foundation models have demonstrated great performance in a variety of downstream applications.
A Language-aware Domain Generalization Network (LDGnet) is proposed to learn cross-domain invariant representation.
arXiv Detail & Related papers (2022-09-06T10:06:10Z)
- BURT: BERT-inspired Universal Representation from Learning Meaningful Segment [46.51685959045527]
This work introduces and explores universal representation learning, i.e., embedding different levels of linguistic units in a uniform vector space.
We present a universal representation model, BURT, to encode different levels of linguistic units into the same vector space.
Specifically, we extract and mask meaningful segments based on point-wise mutual information (PMI) to incorporate different granular objectives into the pre-training stage.
arXiv Detail & Related papers (2020-12-28T16:02:28Z)
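The BURT entry above selects meaningful segments with point-wise mutual information (PMI). The snippet below computes the standard PMI score for adjacent-token pairs over a toy corpus; candidate-segment handling and thresholds in BURT itself may differ.

```python
# Standard pointwise mutual information (PMI) for scoring adjacent-token
# pairs, the quantity the BURT summary above uses to pick meaningful
# segments. The toy corpus and usage here are generic illustrations.
import math
from collections import Counter

sentences = [["new", "york", "is", "large"], ["new", "york", "city"]]
unigrams = Counter(w for s in sentences for w in s)
bigrams = Counter(b for s in sentences for b in zip(s, s[1:]))
total_uni = sum(unigrams.values())
total_bi = sum(bigrams.values())


def pmi(pair):
    """PMI(w1, w2) = log( p(w1, w2) / (p(w1) * p(w2)) )."""
    w1, w2 = pair
    p_joint = bigrams[pair] / total_bi
    p_w1 = unigrams[w1] / total_uni
    p_w2 = unigrams[w2] / total_uni
    return math.log(p_joint / (p_w1 * p_w2))


print(pmi(("new", "york")))  # high PMI -> treat "new york" as one segment
```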
- Referring Image Segmentation via Cross-Modal Progressive Comprehension [94.70482302324704]
Referring image segmentation aims at segmenting the foreground masks of the entities that match the description given in the natural language expression.
Previous approaches tackle this problem using implicit feature interaction and fusion between visual and linguistic modalities.
We propose a Cross-Modal Progressive Comprehension (CMPC) module and a Text-Guided Feature Exchange (TGFE) module to effectively address the challenging task.
arXiv Detail & Related papers (2020-10-01T16:02:30Z)
- Universal-RCNN: Universal Object Detector via Transferable Graph R-CNN [117.80737222754306]
We present a novel universal object detector called Universal-RCNN.
We first generate a global semantic pool by integrating the high-level semantic representations of all categories.
An Intra-Domain Reasoning Module learns and propagates the sparse graph representation within one dataset guided by a spatial-aware GCN.
arXiv Detail & Related papers (2020-02-18T07:57:45Z)
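The Universal-RCNN entry above propagates category-level semantics with a spatial-aware GCN. The snippet below shows only the textbook GCN propagation step on a small category graph; the spatial-aware adjacency and inter-domain transfer modules in Universal-RCNN are not reproduced here.

```python
# Textbook GCN propagation step, included only to illustrate the kind of
# graph reasoning the Universal-RCNN summary refers to; the spatial-aware
# adjacency and inter-domain transfer are not reproduced here.
import torch


def gcn_layer(adj: torch.Tensor, feats: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    """One propagation step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + torch.eye(adj.size(0))          # add self-loops
    deg = a_hat.sum(dim=1)
    d_inv_sqrt = torch.diag(deg.pow(-0.5))
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt      # symmetric normalization
    return torch.relu(a_norm @ feats @ weight)


# Example: 4 category nodes with 8-dimensional semantic features.
adj = torch.tensor([[0., 1., 0., 0.],
                    [1., 0., 1., 0.],
                    [0., 1., 0., 1.],
                    [0., 0., 1., 0.]])
feats = torch.randn(4, 8)
weight = torch.randn(8, 8)
print(gcn_layer(adj, feats, weight).shape)  # torch.Size([4, 8])
```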