Style-Content Disentanglement in Language-Image Pretraining
Representations for Zero-Shot Sketch-to-Image Synthesis
- URL: http://arxiv.org/abs/2206.01661v1
- Date: Fri, 3 Jun 2022 16:14:37 GMT
- Title: Style-Content Disentanglement in Language-Image Pretraining
Representations for Zero-Shot Sketch-to-Image Synthesis
- Authors: Jan Zuiderveld
- Abstract summary: We show that disentangled content and style representations can be used to guide pretrained image generators, employing them as sketch-to-image generators without (re-)training any parameters.
Our approach to disentangling style and content is a simple method based on elementary arithmetic, assuming compositionality of information in the representations of input sketches.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we propose and validate a framework to leverage language-image
pretraining representations for training-free zero-shot sketch-to-image
synthesis. We show that disentangled content and style representations can be
used to guide pretrained image generators, employing them as sketch-to-image
generators without (re-)training any parameters. Our approach to disentangling
style and content is a simple method based on elementary arithmetic, assuming
compositionality of information in the representations of input sketches. Our
results demonstrate that this approach is competitive with state-of-the-art
instance-level open-domain sketch-to-image models, while only depending on
pretrained off-the-shelf models and a fraction of the data.
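As a concrete illustration of the kind of embedding arithmetic the abstract alludes to, the sketch below assumes CLIP (the openai/CLIP package) as the language-image pretraining model and a few hypothetical sketch files. Subtracting an averaged "sketch style" direction and adding a text-derived "photo" direction is one plausible reading of the compositionality assumption, not the paper's exact procedure.

import torch
import clip  # openai/CLIP, assumed here as the language-image model
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def embed_images(paths):
    ims = torch.stack([preprocess(Image.open(p)) for p in paths]).to(device)
    with torch.no_grad():
        e = model.encode_image(ims)
    return e / e.norm(dim=-1, keepdim=True)

def embed_text(prompts):
    with torch.no_grad():
        e = model.encode_text(clip.tokenize(prompts).to(device))
    return e / e.norm(dim=-1, keepdim=True)

# Hypothetical file names, for illustration only.
reference_sketches = ["sketch_01.png", "sketch_02.png", "sketch_03.png"]
query_sketch = ["query_sketch.png"]

style_dir = embed_images(reference_sketches).mean(dim=0, keepdim=True)  # shared "sketchiness"
photo_dir = embed_text(["a photo"])                                     # target style
query_emb = embed_images(query_sketch)

# Compositionality assumption: representation ~ content + style, so elementary
# arithmetic can swap the style component while keeping the content.
alpha, beta = 1.0, 1.0
target = query_emb - alpha * style_dir + beta * photo_dir
target = target / target.norm(dim=-1, keepdim=True)

def guidance_loss(generated_images):
    # Cosine-distance loss that the latents of a frozen, pretrained generator
    # can be optimised against; no generator parameters are (re-)trained.
    # generated_images is assumed to be already resized to CLIP's input size.
    gen = model.encode_image(generated_images)
    gen = gen / gen.norm(dim=-1, keepdim=True)
    return 1.0 - (gen * target).sum(dim=-1).mean()

In practice the scalar weights and the choice of reference sketches would need tuning; the point is only that disentanglement reduces to vector arithmetic on frozen embeddings.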
Related papers
- DiffMorph: Text-less Image Morphing with Diffusion Models [0.0]
DiffMorph synthesizes images that mix concepts without the use of textual prompts.
DiffMorph takes an initial image with conditioning artist-drawn sketches to generate a morphed image.
We employ a pre-trained text-to-image diffusion model and fine-tune it to reconstruct each image faithfully.
arXiv Detail & Related papers (2024-01-01T12:42:32Z)
- Customize StyleGAN with One Hand Sketch [0.0]
We propose a framework to control StyleGAN imagery with a single user sketch.
We learn a conditional distribution in the latent space of a pre-trained StyleGAN model via energy-based learning.
Our model can generate multi-modal images semantically aligned with the input sketch.
arXiv Detail & Related papers (2023-10-29T09:32:33Z)
- DiffSketching: Sketch Control Image Synthesis with Diffusion Models [10.172753521953386]
Deep learning models for sketch-to-image synthesis must cope with distorted input sketches that lack visual detail.
Our model matches sketches through cross-domain constraints and uses a classifier to guide the image synthesis more accurately.
Our model can beat GAN-based methods in terms of generation quality and human evaluation, and does not rely on massive sketch-image datasets.
arXiv Detail & Related papers (2023-05-30T07:59:23Z)
- Text-Guided Scene Sketch-to-Photo Synthesis [5.431298869139175]
We propose a method for scene-level sketch-to-photo synthesis with text guidance.
To train our model, we use self-supervised learning from a set of photographs.
Experiments show that the proposed method translates original sketch images that are not extracted from color images into photos with compelling visual quality.
arXiv Detail & Related papers (2023-02-14T08:13:36Z)
- Sketch-Guided Text-to-Image Diffusion Models [57.12095262189362]
We introduce a universal approach to guide a pretrained text-to-image diffusion model.
Our method does not require training a dedicated model or a specialized encoder for the task.
We take a particular focus on the sketch-to-image translation task, revealing a robust and expressive way to generate images.
arXiv Detail & Related papers (2022-11-24T18:45:32Z)
- AI Illustrator: Translating Raw Descriptions into Images by Prompt-based Cross-Modal Generation [61.77946020543875]
We propose a framework for translating raw descriptions with complex semantics into semantically corresponding images.
Our framework consists of two components: a prompt-based projection module from text embeddings to image embeddings, and an adapted image generation module built on StyleGAN.
Benefiting from the pre-trained models, our method can handle complex descriptions and does not require external paired data for training.
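A minimal sketch of a projection module in the spirit of the entry above: a small network maps text embeddings into the image-embedding space consumed by a frozen StyleGAN-based generator. The random tensors standing in for paired text/image embeddings, and the cosine objective, are illustrative assumptions rather than the cited paper's setup.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TextToImageProjection(nn.Module):
    # Maps a text embedding to a unit-norm vector in the image-embedding space.
    def __init__(self, dim=512, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(),
                                 nn.Linear(hidden, dim))
    def forward(self, text_emb):
        return F.normalize(self.net(text_emb), dim=-1)

# Stand-ins for paired (text, image) embeddings from a frozen encoder.
text_emb = F.normalize(torch.randn(256, 512), dim=-1)
image_emb = F.normalize(torch.randn(256, 512), dim=-1)

proj = TextToImageProjection()
opt = torch.optim.Adam(proj.parameters(), lr=1e-4)
for step in range(100):
    # Pull projected text embeddings toward their paired image embeddings.
    loss = 1.0 - (proj(text_emb) * image_emb).sum(dim=-1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()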
arXiv Detail & Related papers (2022-09-07T13:53:54Z)
- More Control for Free! Image Synthesis with Semantic Diffusion Guidance [79.88929906247695]
Controllable image synthesis models allow creation of diverse images based on text instructions or guidance from an example image.
We introduce a novel unified framework for semantic diffusion guidance, which allows either language or image guidance, or both.
We conduct experiments on FFHQ and LSUN datasets, and show results on fine-grained text-guided image synthesis.
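The sketch below illustrates the general idea of steering a diffusion sampler with the gradient of a semantic similarity, which is how guidance of this kind is commonly implemented; the untrained ToyDenoiser and ToyEncoder modules and the noise schedule are illustrative stand-ins, not components of the cited work.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyDenoiser(nn.Module):
    # Stand-in for a noise-prediction UNet: predicts eps from (x_t, t).
    def __init__(self, ch=3):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(ch + 1, 32, 3, padding=1), nn.SiLU(),
                                 nn.Conv2d(32, ch, 3, padding=1))
    def forward(self, x, t):
        t_map = t.view(-1, 1, 1, 1).expand(-1, 1, *x.shape[2:])
        return self.net(torch.cat([x, t_map], dim=1))

class ToyEncoder(nn.Module):
    # Stand-in for a CLIP-style image encoder producing a unit-norm embedding.
    def __init__(self, ch=3, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(ch, 16, 3, stride=2, padding=1), nn.SiLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, dim))
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

def guided_sample(denoiser, encoder, guide_emb, steps=50,
                  size=(1, 3, 32, 32), scale=5.0):
    # DDPM-like loop; each step nudges the noise estimate with the gradient of
    # the cosine similarity between the current sample and a guidance embedding
    # (which could come from either a text or an image encoder).
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(size)
    for i in reversed(range(steps)):
        t = torch.full((size[0],), i / steps)
        with torch.no_grad():
            eps = denoiser(x, t)
        with torch.enable_grad():
            x_in = x.detach().requires_grad_(True)
            sim = F.cosine_similarity(encoder(x_in), guide_emb, dim=-1).sum()
            grad = torch.autograd.grad(sim, x_in)[0]
        eps = eps - scale * torch.sqrt(1.0 - alpha_bars[i]) * grad
        x = (x - betas[i] / torch.sqrt(1.0 - alpha_bars[i]) * eps) / torch.sqrt(alphas[i])
        if i > 0:
            x = x + torch.sqrt(betas[i]) * torch.randn_like(x)
    return x

guide = F.normalize(torch.randn(1, 64), dim=-1)  # e.g. a text or image embedding
sample = guided_sample(ToyDenoiser(), ToyEncoder(), guide)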
arXiv Detail & Related papers (2021-12-10T18:55:50Z)
- SketchEdit: Mask-Free Local Image Manipulation with Partial Sketches [95.45728042499836]
We propose a new paradigm of sketch-based image manipulation: mask-free local image manipulation.
Our model automatically predicts the target modification region and encodes it into a structure style vector.
A generator then synthesizes the new image content based on the style vector and sketch.
arXiv Detail & Related papers (2021-11-30T02:42:31Z)
- Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic [72.60554897161948]
Recent text-to-image matching models apply contrastive learning to large corpora of uncurated pairs of images and sentences.
In this work, we repurpose such models to generate a descriptive text given an image at inference time.
The resulting captions are much less restrictive than those obtained by supervised captioning methods.
arXiv Detail & Related papers (2021-11-29T11:01:49Z)
- Zero-Shot Text-to-Image Generation [15.135825501365007]
We describe a transformer that autoregressively models the text and image tokens as a single stream of data.
With sufficient data and scale, our approach is competitive with previous domain-specific models when evaluated in a zero-shot fashion.
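A minimal sketch of the single-stream idea described above: text tokens and discretised image tokens are concatenated into one sequence and modelled autoregressively with a causal transformer. The vocabulary sizes, toy data, and tiny model below are illustrative assumptions, not the cited paper's configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TextImageLM(nn.Module):
    def __init__(self, text_vocab=1000, image_vocab=512, dim=128, max_len=96):
        super().__init__()
        # Offset image token ids so both modalities share one embedding table.
        self.image_offset = text_vocab
        vocab = text_vocab + image_vocab
        self.tok = nn.Embedding(vocab, dim)
        self.pos = nn.Embedding(max_len, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, dim_feedforward=256,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, vocab)

    def forward(self, text_ids, image_ids):
        # One stream: [text tokens ; image tokens].
        seq = torch.cat([text_ids, image_ids + self.image_offset], dim=1)
        pos = torch.arange(seq.size(1), device=seq.device)
        h = self.tok(seq) + self.pos(pos)
        n = seq.size(1)
        causal = torch.triu(torch.full((n, n), float("-inf")), diagonal=1)
        h = self.blocks(h, mask=causal)
        logits = self.head(h)
        # Next-token prediction over the whole stream, so generating image
        # tokens conditioned on text is just continued sampling.
        return F.cross_entropy(logits[:, :-1].reshape(-1, logits.size(-1)),
                               seq[:, 1:].reshape(-1))

model = TextImageLM()
text = torch.randint(0, 1000, (2, 16))   # toy caption tokens
image = torch.randint(0, 512, (2, 64))   # toy image tokens (e.g. from a VQ codebook)
loss = model(text, image)
loss.backward()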
arXiv Detail & Related papers (2021-02-24T06:42:31Z)
- SketchyCOCO: Image Generation from Freehand Scene Sketches [71.85577739612579]
We introduce the first method for automatic image generation from scene-level freehand sketches.
The key contribution is an attribute vector bridged Generative Adversarial Network called EdgeGAN.
We have built a large-scale composite dataset called SketchyCOCO to support and evaluate the solution.
arXiv Detail & Related papers (2020-03-05T14:54:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.