One-for-All: Towards Universal Domain Translation with a Single StyleGAN
- URL: http://arxiv.org/abs/2310.14222v1
- Date: Sun, 22 Oct 2023 08:02:55 GMT
- Title: One-for-All: Towards Universal Domain Translation with a Single StyleGAN
- Authors: Yong Du, Jiahui Zhan, Shengfeng He, Xinzhe Li, Junyu Dong, Sheng Chen,
and Ming-Hsuan Yang
- Abstract summary: We propose a novel translation model, UniTranslator, for transforming representations between visually distinct domains.
The proposed UniTranslator is versatile and capable of performing various tasks, including style mixing, stylization, and translations.
UniTranslator surpasses the performance of existing general-purpose models and performs well against specialized models in representative tasks.
- Score: 86.33216867136639
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a novel translation model, UniTranslator, for
transforming representations between visually distinct domains under conditions
of limited training data and significant visual differences. The main idea
behind our approach is leveraging the domain-neutral capabilities of CLIP as a
bridging mechanism, while utilizing a separate module to extract abstract,
domain-agnostic semantics from the embeddings of both the source and target
realms. Fusing these abstract semantics with target-specific semantics results
in a transformed embedding within the CLIP space. To bridge the gap between the
disparate worlds of CLIP and StyleGAN, we introduce a new non-linear mapper,
the CLIP2P mapper. Utilizing CLIP embeddings, this module is tailored to
approximate the latent distribution in the P space, effectively acting as a
connector between these two spaces. The proposed UniTranslator is versatile and
capable of performing various tasks, including style mixing, stylization, and
translations, even in visually challenging scenarios across different visual
domains. Notably, UniTranslator generates high-quality translations that
showcase domain relevance, diversity, and improved image quality. UniTranslator
surpasses the performance of existing general-purpose models and performs well
against specialized models in representative tasks. The source code and trained
models will be released to the public.
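The abstract describes fusing domain-agnostic semantics from source and target CLIP embeddings, then mapping the fused embedding into StyleGAN's P space with a non-linear "CLIP2P" mapper. A minimal numpy sketch of that pipeline is below; the dimensions, the two-layer MLP, and the convex-combination fusion are all illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

CLIP_DIM = 512   # CLIP ViT-B/32 image-embedding size
P_DIM = 512      # assumed StyleGAN P-space dimensionality

def leaky_relu(x, slope=0.2):
    # standard leaky ReLU non-linearity
    return np.where(x > 0, x, slope * x)

class CLIP2PMapper:
    """Toy non-linear mapper (two-layer MLP) acting as a connector
    from CLIP space to an assumed P space. The real CLIP2P mapper's
    depth and training objective are not specified here."""
    def __init__(self, in_dim=CLIP_DIM, hidden=1024, out_dim=P_DIM):
        self.w1 = rng.standard_normal((in_dim, hidden)) * 0.02
        self.b1 = np.zeros(hidden)
        self.w2 = rng.standard_normal((hidden, out_dim)) * 0.02
        self.b2 = np.zeros(out_dim)

    def __call__(self, clip_emb):
        h = leaky_relu(clip_emb @ self.w1 + self.b1)
        return h @ self.w2 + self.b2

def fuse(source_emb, target_emb, alpha=0.5):
    """Hypothetical fusion of abstract (source) and target-specific
    semantics: a convex combination in CLIP space, re-normalized
    because CLIP embeddings are unit-norm."""
    mixed = alpha * source_emb + (1.0 - alpha) * target_emb
    return mixed / np.linalg.norm(mixed)

# Stand-in unit-norm "CLIP embeddings" for a source and a target image.
src = rng.standard_normal(CLIP_DIM); src /= np.linalg.norm(src)
tgt = rng.standard_normal(CLIP_DIM); tgt /= np.linalg.norm(tgt)

mapper = CLIP2PMapper()
p_latent = mapper(fuse(src, tgt))
print(p_latent.shape)  # prints (512,)
```

The resulting `p_latent` would, in the full system, be fed to a pretrained StyleGAN generator to synthesize the translated image.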