Customize StyleGAN with One Hand Sketch
- URL: http://arxiv.org/abs/2310.18949v1
- Date: Sun, 29 Oct 2023 09:32:33 GMT
- Title: Customize StyleGAN with One Hand Sketch
- Authors: Shaocong Zhang
- Abstract summary: We propose a framework to control StyleGAN imagery with a single user sketch.
We learn a conditional distribution in the latent space of a pre-trained StyleGAN model via energy-based learning.
Our model can generate multi-modal images semantically aligned with the input sketch.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generating images from human sketches typically requires dedicated networks
trained from scratch. In contrast, the emergence of pre-trained
vision-language models (e.g., CLIP) has propelled generative applications based
on controlling the output imagery of existing StyleGAN models with text inputs
or reference images. In parallel, our work proposes a framework to control
StyleGAN imagery with a single user sketch. In particular, we learn a
conditional distribution in the latent space of a pre-trained StyleGAN model
via energy-based learning and propose two novel energy functions leveraging
CLIP for cross-domain semantic supervision. Once trained, our model can
generate multi-modal images semantically aligned with the input sketch.
Quantitative evaluations on synthesized datasets have shown that our approach
improves significantly over previous methods in the one-shot regime. The
superiority of our method is further underscored when experimenting with a wide
range of human sketches of diverse styles and poses. Surprisingly, our models
outperform the previous baseline in both the range of sketch inputs handled and
the quality of the generated images, despite operating under a stricter
setting: no extra training data and only a single input sketch.
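To make the abstract's description more concrete, here is a minimal, illustrative sketch of the general idea: treat W-space latents of a frozen StyleGAN generator as samples from a conditional energy-based model, with CLIP supplying cross-domain semantic supervision between the sketch and the generated image. The paper's two proposed energy functions are not reproduced here; the generator interface `G`, the prior weight `lam`, and the Langevin-style sampler are assumptions for illustration only.

```python
# Minimal illustrative sketch (not the paper's actual energy functions): score a
# W-space latent by how well CLIP thinks the generated image matches the sketch,
# plus a simple Gaussian-style latent prior, and sample latents with Langevin updates.
# Assumptions: a frozen StyleGAN generator `G(w) -> images in [-1, 1]`, and
# OpenAI's `clip` package (https://github.com/openai/CLIP).
import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model = clip_model.float()  # keep full precision so gradients flow cleanly

# CLIP's standard image normalization constants.
CLIP_MEAN = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
CLIP_STD = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

def clip_embed(images):
    """Embed a batch of images in [-1, 1] with CLIP and L2-normalize."""
    images = (images + 1) / 2
    images = F.interpolate(images, size=224, mode="bilinear", align_corners=False)
    images = (images - CLIP_MEAN) / CLIP_STD
    feats = clip_model.encode_image(images)
    return feats / feats.norm(dim=-1, keepdim=True)

def energy(w, G, sketch_feat, w_mean, lam=0.1):
    """Cross-domain semantic energy: low when the generated image's CLIP embedding
    aligns with the sketch's embedding; `lam` weights a simple latent prior."""
    sim = (clip_embed(G(w)) * sketch_feat).sum(dim=-1)  # cosine similarity
    prior = lam * ((w - w_mean) ** 2).mean()
    return -sim.mean() + prior

def langevin_sample(w_init, G, sketch_feat, w_mean, steps=100, step_size=0.01):
    """Draw conditional latents by noisy gradient descent on the energy."""
    w = w_init.clone().requires_grad_(True)
    for _ in range(steps):
        grad = torch.autograd.grad(energy(w, G, sketch_feat, w_mean), w)[0]
        w = (w - 0.5 * step_size * grad
             + step_size ** 0.5 * torch.randn_like(w)).detach().requires_grad_(True)
    return w.detach()
```

Given an encoded sketch (`sketch_feat = clip_embed(sketch_image)`) and the generator's average latent `w_mean`, samples drawn this way are naturally multi-modal, since different noise trajectories settle into different low-energy latents.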
Related papers
- JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation [49.997839600988875]
Existing personalization methods rely on finetuning a text-to-image foundation model on a user's custom dataset.
We propose Joint-Image Diffusion (JeDi), an effective technique for learning a finetuning-free personalization model.
Our model achieves state-of-the-art generation quality, both quantitatively and qualitatively, significantly outperforming both the prior finetuning-based and finetuning-free personalization baselines.
arXiv Detail & Related papers (2024-07-08T17:59:02Z) - Pick-and-Draw: Training-free Semantic Guidance for Text-to-Image Personalization [56.12990759116612]
Pick-and-Draw is a training-free semantic guidance approach to boost identity consistency and generative diversity for personalization methods.
The proposed approach can be applied to any personalized diffusion models and requires as few as a single reference image.
arXiv Detail & Related papers (2024-01-30T05:56:12Z) - DiffSketching: Sketch Control Image Synthesis with Diffusion Models [10.172753521953386]
Deep learning models for sketch-to-image synthesis need to cope with distorted input sketches that lack visual details.
Our model matches sketches to images through cross-domain constraints and uses a classifier to guide the image synthesis more accurately.
Our model beats GAN-based methods in terms of generation quality and human evaluation, and does not rely on massive sketch-image datasets.
arXiv Detail & Related papers (2023-05-30T07:59:23Z) - Reference-based Image Composition with Sketch via Structure-aware Diffusion Model [38.1193912666578]
We introduce a multi-input-conditioned image composition model that incorporates a sketch as a novel modality, alongside a reference image.
Thanks to the edge-level controllability of sketches, our method enables a user to edit or complete an image sub-part.
Our framework fine-tunes a pre-trained diffusion model to complete missing regions using the reference image while maintaining sketch guidance.
arXiv Detail & Related papers (2023-03-31T06:12:58Z) - Sketch-Guided Text-to-Image Diffusion Models [57.12095262189362]
We introduce a universal approach to guide a pretrained text-to-image diffusion model.
Our method does not require training a dedicated model or a specialized encoder for the task.
We take a particular focus on the sketch-to-image translation task, revealing a robust and expressive way to generate images.
arXiv Detail & Related papers (2022-11-24T18:45:32Z) - Style-Content Disentanglement in Language-Image Pretraining Representations for Zero-Shot Sketch-to-Image Synthesis [0.0]
We show that disentangled content and style representations can be used to guide existing image generators to act as sketch-to-image generators without (re-)training any parameters.
Our approach to disentangling style and content is a simple method based on elementary arithmetic, assuming compositionality of information in the representations of input sketches (a minimal arithmetic sketch of this idea follows the related-papers list below).
arXiv Detail & Related papers (2022-06-03T16:14:37Z) - Meta Internal Learning [88.68276505511922]
Internal learning for single-image generation is a framework in which a generator is trained to produce novel images based on a single image.
We propose a meta-learning approach that enables training over a collection of images, in order to model the internal statistics of the sample image more effectively.
Our results show that the models obtained are as suitable as single-image GANs for many common image applications.
arXiv Detail & Related papers (2021-10-06T16:27:38Z) - Sketch Your Own GAN [36.77647431087615]
We present a method, GAN Sketching, for rewriting GANs with one or more sketches.
We encourage the model's output to match the user sketches through a cross-domain adversarial loss.
Experiments have shown that our method can mold GANs to match shapes and poses specified by sketches while maintaining realism and diversity.
arXiv Detail & Related papers (2021-08-05T17:59:42Z) - StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval [119.03470556503942]
The cross-modal matching problem is typically solved by learning a joint embedding space in which the semantic content shared between the photo and sketch modalities is preserved.
An effective model needs to explicitly account for the style diversity of human sketches and, crucially, generalise to unseen user styles.
Our model can not only disentangle the cross-modal shared semantic content, but can adapt the disentanglement to any unseen user style as well, making the model truly agnostic.
arXiv Detail & Related papers (2021-03-29T15:44:19Z) - Deep Plastic Surgery: Robust and Controllable Image Editing with Human-Drawn Sketches [133.01690754567252]
Sketch-based image editing aims to synthesize and modify photos based on the structural information provided by the human-drawn sketches.
Deep Plastic Surgery is a novel, robust and controllable image editing framework that allows users to interactively edit images using hand-drawn sketch inputs.
arXiv Detail & Related papers (2020-01-09T08:57:50Z)
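For the Style-Content Disentanglement entry above, one plausible reading of "elementary arithmetic" on CLIP representations is sketched below. The reference filenames and the choice of subtracting a mean sketch-style direction are illustrative assumptions, not the cited paper's exact recipe.

```python
# Illustrative arithmetic-only disentanglement on CLIP embeddings (an assumption
# about what "elementary arithmetic assuming compositionality" could look like,
# not the cited paper's exact method): treat a sketch embedding as content + style
# and subtract an estimated generic "sketch style" direction to keep the content.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def embed(paths):
    """CLIP-embed a list of image files and L2-normalize the features."""
    imgs = torch.stack([preprocess(Image.open(p)) for p in paths]).to(device)
    with torch.no_grad():
        feats = model.encode_image(imgs).float()
    return feats / feats.norm(dim=-1, keepdim=True)

# Hypothetical reference sketches of arbitrary content, used only to estimate
# the shared "sketchiness" direction.
style_dir = embed(["ref_sketch_1.png", "ref_sketch_2.png", "ref_sketch_3.png"]).mean(0)
style_dir = style_dir / style_dir.norm()

query = embed(["query_sketch.png"])[0]
content = query - (query @ style_dir) * style_dir  # project out the style component
content = content / content.norm()
# `content` can then steer a CLIP-conditioned generator in place of the raw sketch embedding.
```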