One-shot Face Sketch Synthesis in the Wild via Generative Diffusion Prior and Instruction Tuning
- URL: http://arxiv.org/abs/2506.15312v1
- Date: Wed, 18 Jun 2025 09:41:30 GMT
- Title: One-shot Face Sketch Synthesis in the Wild via Generative Diffusion Prior and Instruction Tuning
- Authors: Han Wu, Junyao Li, Kangbo Zhao, Sen Zhang, Yukai Shi, Liang Lin
- Abstract summary: Face sketch synthesis is a technique aimed at converting face photos into sketches. Existing face sketch synthesis research mainly relies on training with numerous photo-sketch sample pairs from existing datasets. We propose a one-shot face sketch synthesis method based on diffusion models.
- Score: 52.0161291920299
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Face sketch synthesis is a technique aimed at converting face photos into sketches. Existing face sketch synthesis research mainly relies on training with numerous photo-sketch sample pairs from existing datasets. However, such large-scale discriminative learning methods face problems such as data scarcity and high human labor costs, and once training data becomes scarce, their generative performance degrades significantly. In this paper, we propose a one-shot face sketch synthesis method based on diffusion models. We optimize text instructions on a diffusion model using face photo-sketch image pairs; the instructions derived through gradient-based optimization are then used for inference. To simulate real-world scenarios more accurately and evaluate method effectiveness more comprehensively, we introduce a new benchmark named the One-shot Face Sketch Dataset (OS-Sketch). The benchmark consists of 400 pairs of face photo-sketch images, including sketches with different styles and photos with different backgrounds, ages, sexes, expressions, illumination, etc. For a solid out-of-distribution evaluation, we select only one pair of images for training each time, with the rest used for inference. Extensive experiments demonstrate that the proposed method can convert various photos into realistic and highly consistent sketches in a one-shot setting. Compared to other methods, our approach offers greater convenience and broader applicability. The dataset will be available at: https://github.com/HanWu3125/OS-Sketch
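The instruction-tuning step in the abstract above can be pictured as optimizing only a text-instruction embedding of a frozen instruction-conditioned diffusion model against the single photo-sketch training pair. The sketch below is a hypothetical reconstruction against the Hugging Face diffusers API, not the paper's released code: the checkpoint name, file paths, loss, learning rate, and step count are all assumptions.

```python
import torch
import torch.nn.functional as F
from PIL import Image
from diffusers import DDPMScheduler, StableDiffusionInstructPix2PixPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float32).to(device)
noise_scheduler = DDPMScheduler.from_pretrained(
    "timbrooks/instruct-pix2pix", subfolder="scheduler")

# Freeze every network; only the instruction embedding receives gradients.
for m in (pipe.vae, pipe.unet, pipe.text_encoder):
    m.requires_grad_(False)

# The single photo-sketch training pair (hypothetical file paths).
photo = Image.open("photo.png").convert("RGB").resize((512, 512))
sketch = Image.open("sketch.png").convert("RGB").resize((512, 512))

# Conditioning latents of the photo (unscaled mode, as in InstructPix2Pix)
# and scaled target latents of the ground-truth sketch.
photo_latents = pipe.vae.encode(
    pipe.image_processor.preprocess(photo).to(device)).latent_dist.mode()
sketch_latents = pipe.vae.encode(
    pipe.image_processor.preprocess(sketch).to(device)
).latent_dist.sample() * pipe.vae.config.scaling_factor

# Initialize the instruction embedding from a plain-language prompt.
ids = pipe.tokenizer("turn the photo into a pencil sketch",
                     padding="max_length", truncation=True,
                     max_length=pipe.tokenizer.model_max_length,
                     return_tensors="pt").input_ids.to(device)
instruction = pipe.text_encoder(ids)[0].detach().clone().requires_grad_(True)
opt = torch.optim.Adam([instruction], lr=1e-3)

for step in range(500):
    noise = torch.randn_like(sketch_latents)
    t = torch.randint(0, noise_scheduler.config.num_train_timesteps,
                      (1,), device=device)
    noisy = noise_scheduler.add_noise(sketch_latents, noise, t)
    # InstructPix2Pix conditions on the source photo by channel concatenation.
    pred = pipe.unet(torch.cat([noisy, photo_latents], dim=1), t,
                     encoder_hidden_states=instruction).sample
    loss = F.mse_loss(pred, noise)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

At inference time, the optimized embedding would be passed to the same pipeline as `prompt_embeds` to convert unseen photos into sketches.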
Related papers
- From Easy to Hard: Building a Shortcut for Differentially Private Image Synthesis [14.93795597224185]
Differentially private (DP) image synthesis aims to generate synthetic images from a sensitive dataset. We propose a two-stage DP image synthesis framework in which diffusion models learn to generate synthetic images from easy to hard. Experiments show that, averaged over four image datasets, the fidelity and utility metrics of our synthetic images are 33.1% and 2.1% better than those of the state-of-the-art method.
arXiv Detail & Related papers (2025-04-02T06:30:55Z)
- Stylized Face Sketch Extraction via Generative Prior with Limited Data [6.727433982111717]
StyleSketch is a method for extracting high-resolution stylized sketches from a face image.
Using the rich semantics of the deep features from a pretrained StyleGAN, we are able to train a sketch generator with only 16 pairs of face images and their corresponding sketches.
arXiv Detail & Related papers (2024-03-17T16:25:25Z)
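The mechanism in the StyleSketch entry above, extracting intermediate ("deep") features from a frozen pretrained StyleGAN and fitting a small sketch decoder on them, can be sketched generically as follows. The generator interface (`synthesis_blocks`, a callable `synthesis`) and the decoder head are hypothetical placeholders, not StyleSketch's actual code.

```python
import torch
import torch.nn as nn

def extract_deep_features(generator, w):
    """Capture per-block activations of a frozen StyleGAN-like generator via
    forward hooks; `generator.synthesis_blocks` is an assumed list of the
    synthesis nn.Modules."""
    feats = []
    hooks = [blk.register_forward_hook(lambda _m, _i, out: feats.append(out))
             for blk in generator.synthesis_blocks]
    with torch.no_grad():
        generator.synthesis(w)          # one forward pass fills `feats`
    for h in hooks:
        h.remove()
    return feats                        # multi-resolution feature maps

class SketchHead(nn.Module):
    """Tiny decoder mapping one feature map to a 1-channel sketch."""
    def __init__(self, in_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 3, padding=1), nn.Tanh())

    def forward(self, f):
        return self.net(f)
```

With only 16 pairs, such a head would be fit with, e.g., an L1 loss between `SketchHead(feats[k])` and the ground-truth sketch downsampled to the matching resolution.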
- Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers [120.49126407479717]
This paper explores text-to-image diffusion models for Zero-Shot Sketch-based Image Retrieval (ZS-SBIR).
We highlight a pivotal discovery: the capacity of text-to-image diffusion models to seamlessly bridge the gap between sketches and photos.
arXiv Detail & Related papers (2024-03-12T00:02:03Z)
- Symmetrical Bidirectional Knowledge Alignment for Zero-Shot Sketch-Based Image Retrieval [69.46139774646308]
This paper studies the problem of zero-shot sketch-based image retrieval (ZS-SBIR).
It aims to use sketches from unseen categories as queries to match images of the same category.
We propose a novel Symmetrical Bidirectional Knowledge Alignment (SBKA) method for zero-shot sketch-based image retrieval.
arXiv Detail & Related papers (2023-12-16T04:50:34Z)
- DiffSketching: Sketch Control Image Synthesis with Diffusion Models [10.172753521953386]
Deep learning models for sketch-to-image synthesis must cope with distorted input sketches that lack visual details.
Our model matches sketches through cross-domain constraints and uses a classifier to guide the image synthesis more accurately.
Our model beats GAN-based methods in generation quality and human evaluation, and does not rely on massive sketch-image datasets.
arXiv Detail & Related papers (2023-05-30T07:59:23Z)
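Classifier guidance, mentioned in the entry above, is a standard diffusion technique (Dhariwal & Nichol, 2021): the gradient of a classifier's log-probability on the noisy sample steers the denoiser's noise prediction. A minimal, generic sketch follows; `unet` and `classifier` are assumed time-conditioned modules, not DiffSketching's actual components.

```python
import torch

def classifier_guided_eps(unet, classifier, x_t, t, y,
                          sqrt_one_minus_alpha_bar_t, scale=1.0):
    """One guided noise prediction:
    eps <- eps - sqrt(1 - a_bar_t) * scale * grad,
    where grad = d/dx log p(y | x_t) from a noise-aware classifier."""
    eps = unet(x_t, t)                        # unconditional noise prediction
    with torch.enable_grad():
        x_in = x_t.detach().requires_grad_(True)
        log_probs = torch.log_softmax(classifier(x_in, t), dim=-1)
        selected = log_probs[torch.arange(len(y)), y].sum()
        grad = torch.autograd.grad(selected, x_in)[0]
    return eps - sqrt_one_minus_alpha_bar_t * scale * grad
```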
- Text-Guided Scene Sketch-to-Photo Synthesis [5.431298869139175]
We propose a method for scene-level sketch-to-photo synthesis with text guidance.
To train our model, we use self-supervised learning from a set of photographs.
Experiments show that the proposed method translates original sketches (i.e., ones not extracted from color images) into photos with compelling visual quality.
arXiv Detail & Related papers (2023-02-14T08:13:36Z)
- Cap2Aug: Caption guided Image to Image data Augmentation [41.53127698828463]
Cap2Aug is an image-to-image diffusion model-based data augmentation strategy using image captions as text prompts.
We generate captions from the limited training images and then use these captions to edit the training images with an image-to-image Stable Diffusion model.
This strategy generates augmented versions of images similar to the training images yet provides semantic diversity across the samples.
arXiv Detail & Related papers (2022-12-11T04:37:43Z)
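The caption-then-edit loop described in the Cap2Aug entry above can be pieced together from off-the-shelf components. The sketch below is an assumed reconstruction, not the authors' code: it pairs a BLIP captioner with a Stable Diffusion image-to-image pipeline, and the checkpoints and `strength` value are illustrative choices.

```python
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration
from diffusers import StableDiffusionImg2ImgPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
captioner = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base").to(device)
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5").to(device)

def augment(image: Image.Image, n: int = 4):
    """Return n caption-guided variants of a single training image."""
    inputs = processor(image, return_tensors="pt").to(device)
    caption = processor.decode(captioner.generate(**inputs)[0],
                               skip_special_tokens=True)
    # Moderate strength keeps each edit close to the source image while the
    # caption injects semantic diversity across the generated samples.
    return [pipe(prompt=caption, image=image, strength=0.4).images[0]
            for _ in range(n)]
```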
- Person Image Synthesis via Denoising Diffusion Model [116.34633988927429]
We show how denoising diffusion models can be applied for high-fidelity person image synthesis.
Our results on two large-scale benchmarks and a user study demonstrate the photorealism of our proposed approach under challenging scenarios.
arXiv Detail & Related papers (2022-11-22T18:59:50Z)
- Face sketch to photo translation using generative adversarial networks [1.0312968200748118]
We use a pre-trained face photo generating model to synthesize high-quality natural face photos.
We train a network to map the facial features extracted from the input sketch to a vector in the latent space of the face generating model.
The proposed model achieved an SSIM of 0.655 and a 97.59% rank-1 face recognition rate.
arXiv Detail & Related papers (2021-10-23T20:01:20Z)
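The pipeline in the entry above, mapping sketch features into the latent space of a frozen pretrained face generator, reduces to training a small encoder against reconstructions. Below is a generic, hypothetical sketch: `G` stands for an assumed frozen generator mapping a 512-dimensional latent to an image, and the plain L1 loss is a simplification of the paper's objective.

```python
import torch
import torch.nn as nn

class SketchEncoder(nn.Module):
    """Map a 1x256x256 sketch to a latent code of the face generator."""
    def __init__(self, latent_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(256, latent_dim))

    def forward(self, sketch):
        return self.net(sketch)

def train_step(E, G, sketch, photo, optimizer):
    """Push G(E(sketch)) toward the paired ground-truth photo; G is an
    assumed frozen pretrained face generator: (B, 512) latent -> image."""
    optimizer.zero_grad()
    fake = G(E(sketch))
    loss = nn.functional.l1_loss(fake, photo)  # pixel loss; the paper likely
    loss.backward()                            # adds perceptual/identity terms
    optimizer.step()
    return loss.item()
```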
- Learned Spatial Representations for Few-shot Talking-Head Synthesis [68.3787368024951]
We propose a novel approach for few-shot talking-head synthesis.
We show that this disentangled representation leads to a significant improvement over previous methods.
arXiv Detail & Related papers (2021-04-29T17:59:42Z)
- DeepFacePencil: Creating Face Images from Freehand Sketches [77.00929179469559]
Existing image-to-image translation methods require a large-scale dataset of paired sketches and images for supervision.
We propose DeepFacePencil, an effective tool that is able to generate photo-realistic face images from hand-drawn sketches.
arXiv Detail & Related papers (2020-08-31T03:35:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.