Image-Based CLIP-Guided Essence Transfer
- URL: http://arxiv.org/abs/2110.12427v2
- Date: Tue, 26 Oct 2021 06:31:25 GMT
- Title: Image-Based CLIP-Guided Essence Transfer
- Authors: Hila Chefer, Sagie Benaim, Roni Paiss, Lior Wolf
- Abstract summary: The conceptual blending of two signals is a semantic task that may underlie both creativity and intelligence.
We propose to perform such blending in a way that incorporates two latent spaces: that of the generator network and that of the semantic network.
- Score: 83.09110547792103
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The conceptual blending of two signals is a semantic task that may underlie
both creativity and intelligence. We propose to perform such blending in a way
that incorporates two latent spaces: that of the generator network and that of
the semantic network. For the first network, we employ the powerful StyleGAN
generator, and for the second, the powerful image-language matching network of
CLIP. The new method creates a blending operator that is optimized to be
simultaneously additive in both latent spaces. Our results demonstrate that
this leads to blending that is much more natural than what can be obtained in
each space separately.
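The dual-space additive objective can be sketched with linear stand-ins for both networks: a toy "generator" G and "semantic encoder" E replace StyleGAN and CLIP (both are illustrative assumptions, not the paper's models), and a single essence direction d is optimized so that adding it to any source latent carries that source toward the target in the embedding space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear stand-ins for the two real networks (illustrative assumptions):
# G plays the role of the StyleGAN generator, E the CLIP image encoder.
G = rng.normal(size=(64, 16))          # latent (16,) -> "image" (64,)
E = rng.normal(size=(8, 64))           # "image" (64,) -> embedding (8,)
M = E @ G                              # combined latent -> embedding map

sources = [rng.normal(size=16) for _ in range(4)]   # source latents
target_img = rng.normal(size=64)                    # target "image"
t = E @ target_img                                  # target embedding

def essence_loss(d):
    """One shared direction d must work additively for every source:
    the blend w + d should land near the target's embedding."""
    return np.mean([np.sum((M @ (w + d) - t) ** 2) for w in sources])

# Optimize the single essence direction d by gradient descent.
lr = 0.4 / np.linalg.norm(M, ord=2) ** 2   # step size safe for this quadratic
d = np.zeros(16)
loss0 = essence_loss(d)
for _ in range(500):
    grad = np.mean([2 * M.T @ (M @ (w + d) - t) for w in sources], axis=0)
    d -= lr * grad
```

In the actual method both maps are nonlinear networks and the objective also preserves source identity; this quadratic toy only shows the shared-direction, dual-space structure of the blending operator.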
Related papers
- Double-Shot 3D Shape Measurement with a Dual-Branch Network [14.749887303860717]
We propose a dual-branch Convolutional Neural Network (CNN)-Transformer network (PDCNet) to process different structured light (SL) modalities.
Within PDCNet, a Transformer branch is used to capture global perception in the fringe images, while a CNN branch is designed to collect local details in the speckle images.
We show that our method can reduce fringe order ambiguity while producing high-accuracy results on a self-made dataset.
arXiv Detail & Related papers (2024-07-19T10:49:26Z)
- Gradient Adjusting Networks for Domain Inversion [82.72289618025084]
StyleGAN2 was demonstrated to be a powerful image generation engine that supports semantic editing.
We present a per-image optimization method that tunes a StyleGAN2 generator by applying a local edit to its weights.
Our experiments show a sizable gap in performance over the current state of the art in this very active domain.
arXiv Detail & Related papers (2023-02-22T14:47:57Z)
- CLIP2GAN: Towards Bridging Text with the Latent Space of GANs [128.47600914674985]
We propose a novel framework, i.e., CLIP2GAN, by leveraging CLIP model and StyleGAN.
The key idea of our CLIP2GAN is to bridge the output feature embedding space of CLIP and the input latent space of StyleGAN.
arXiv Detail & Related papers (2022-11-28T04:07:17Z)
- Local and Global GANs with Semantic-Aware Upsampling for Image Generation [201.39323496042527]
We consider generating images using local context.
We propose a class-specific generative network using semantic maps as guidance.
Lastly, we propose a novel semantic-aware upsampling method.
arXiv Detail & Related papers (2022-02-28T19:24:25Z)
- A Unified Architecture of Semantic Segmentation and Hierarchical Generative Adversarial Networks for Expression Manipulation [52.911307452212256]
We develop a unified architecture of semantic segmentation and hierarchical GANs.
A unique advantage of our framework is that on the forward pass the semantic segmentation network conditions the generative model.
We evaluate our method on two challenging facial expression translation benchmarks, AffectNet and RaFD, and a semantic segmentation benchmark, CelebAMask-HQ.
arXiv Detail & Related papers (2021-12-08T22:06:31Z)
- DiverGAN: An Efficient and Effective Single-Stage Framework for Diverse Text-to-Image Generation [7.781425222538382]
DiverGAN is a framework to generate diverse, plausible and semantically consistent images according to a natural-language description.
DiverGAN adopts two novel word-level attention modules, i.e., a channel-attention module (CAM) and a pixel-attention module (PAM).
Conditional Adaptive Instance-Layer Normalization (CAdaILN) is introduced to enable the linguistic cues from the sentence embedding to flexibly manipulate the amount of change in shape and texture.
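The normalization just described can be sketched as follows. This is a toy single-sample version following the AdaILN design it extends: instance-norm and layer-norm branches are mixed by a scalar rho, and the scale/shift are predicted from the sentence embedding by linear layers (the linear conditioning, shapes, and names here are assumptions, not DiverGAN's exact implementation).

```python
import numpy as np

def cada_iln(x, sent_emb, W_gamma, W_beta, rho, eps=1e-5):
    """Conditional Adaptive Instance-Layer Normalization (sketch).

    x:        feature map of shape (C, H, W) for a single sample.
    sent_emb: sentence embedding; gamma/beta are predicted from it by
              linear maps W_gamma, W_beta (an assumed conditioning scheme).
    rho:      scalar in [0, 1] mixing instance norm and layer norm.
    """
    c, h, w = x.shape
    # Instance norm: per-channel statistics over spatial positions
    mu_in = x.mean(axis=(1, 2), keepdims=True)
    var_in = x.var(axis=(1, 2), keepdims=True)
    x_in = (x - mu_in) / np.sqrt(var_in + eps)
    # Layer norm: statistics over all channels and positions
    x_ln = (x - x.mean()) / np.sqrt(x.var() + eps)
    # Text-conditioned scale and shift, one value per channel
    gamma = (W_gamma @ sent_emb).reshape(c, 1, 1)
    beta = (W_beta @ sent_emb).reshape(c, 1, 1)
    return gamma * (rho * x_in + (1.0 - rho) * x_ln) + beta
```

Because gamma and beta come from the sentence embedding, the text can modulate how strongly each channel (and hence shape/texture cues) is rescaled, which is the flexibility the summary describes.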
arXiv Detail & Related papers (2021-11-17T17:59:56Z)
- SegMix: Co-occurrence Driven Mixup for Semantic Segmentation and Adversarial Robustness [29.133980156068482]
We present a strategy for training convolutional neural networks to effectively resolve interference arising from competing hypotheses.
The premise is based on the notion of feature binding, which is defined as the process by which activations spread across space and layers in the network are successfully integrated to arrive at a correct inference decision.
arXiv Detail & Related papers (2021-08-23T04:35:48Z)
- Linear Semantics in Generative Adversarial Networks [26.123252503846942]
We aim to better understand the semantic representation of GANs, and enable semantic control in GAN's generation process.
We find that a well-trained GAN encodes image semantics in its internal feature maps in a surprisingly simple way.
We propose two few-shot image editing approaches, namely Semantic-Conditional Sampling and Semantic Image Editing.
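The "surprisingly simple" encoding can be illustrated with a toy linear probe. Random per-pixel features stand in for the GAN's internal feature maps, and the labels are constructed to be linearly decodable (both are assumptions for illustration, not the paper's actual data); a single least-squares map then recovers them, mirroring the finding that one linear transform of feature maps predicts semantics.

```python
import numpy as np

rng = np.random.default_rng(0)

C, HW, K = 16, 500, 3                       # feature dim, pixels, classes
true_W = rng.normal(size=(K, C))            # hidden linear semantics
feats = rng.normal(size=(HW, C))            # stand-in feature map, flattened
labels = (feats @ true_W.T).argmax(axis=1)  # per-pixel semantic labels

# Fit ONE linear map from features to one-hot labels via least squares,
# then read off the predicted class as the argmax of the linear scores.
onehot = np.eye(K)[labels]
W, *_ = np.linalg.lstsq(feats, onehot, rcond=None)
pred = (feats @ W).argmax(axis=1)
accuracy = (pred == labels).mean()
```

If a single linear probe reaches high per-pixel accuracy on real generator features, the semantics are indeed encoded "in a surprisingly simple way"; this toy only demonstrates the probing procedure itself.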
arXiv Detail & Related papers (2021-04-01T14:18:48Z)
- Feature Sharing Cooperative Network for Semantic Segmentation [10.305130700118399]
We propose a semantic segmentation method using cooperative learning.
By sharing feature maps, one of two networks can obtain the information that cannot be obtained by a single network.
The proposed method achieved better segmentation accuracy than the conventional single network and ensemble of networks.
arXiv Detail & Related papers (2021-01-20T00:22:00Z) - OneGAN: Simultaneous Unsupervised Learning of Conditional Image
Generation, Foreground Segmentation, and Fine-Grained Clustering [100.32273175423146]
We present a method for simultaneously learning, in an unsupervised manner, a conditional image generator, foreground extraction and segmentation, and object removal and background completion.
The method combines a Generative Adversarial Network and a Variational Auto-Encoder, with multiple encoders, generators and discriminators, and benefits from solving all tasks at once.
arXiv Detail & Related papers (2019-12-31T18:15:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.