DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models
- URL: http://arxiv.org/abs/2306.14685v4
- Date: Mon, 15 Jan 2024 15:20:29 GMT
- Title: DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models
- Authors: Ximing Xing, Chuang Wang, Haitao Zhou, Jing Zhang, Qian Yu, Dong Xu
- Abstract summary: DiffSketcher is an innovative algorithm that creates vectorized free-hand sketches using natural language input.
Our experiments show that DiffSketcher achieves greater quality than prior work.
- Score: 33.6615688030998
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Even though trained mainly on images, we discover that pretrained diffusion
models show impressive power in guiding sketch synthesis. In this paper, we
present DiffSketcher, an innovative algorithm that creates vectorized
free-hand sketches using natural language input. DiffSketcher is developed
based on a pre-trained text-to-image diffusion model. It performs the task by
directly optimizing a set of Bézier curves with an extended version of the
score distillation sampling (SDS) loss, which allows us to use a raster-level
diffusion model as a prior for optimizing a parametric vectorized sketch
generator. Furthermore, we explore attention maps embedded in the diffusion
model for effective stroke initialization to speed up the generation process.
The generated sketches demonstrate multiple levels of abstraction while
maintaining recognizability, underlying structure, and essential visual details
of the subject drawn. Our experiments show that DiffSketcher achieves greater
quality than prior work. The code and demo of DiffSketcher can be found at
https://ximinng.github.io/DiffSketcher-project/.
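
For readers who want to see the mechanics, below is a minimal, self-contained sketch of the optimization loop the abstract describes: score distillation sampling (SDS) driving the control points of Bézier strokes through a differentiable renderer. Everything here is an illustrative assumption rather than the paper's code: `StrokeRenderer` is a toy Gaussian-splat rasterizer (DiffSketcher uses a proper differentiable rasterizer and a pretrained latent diffusion model), `denoiser` is a random stand-in for a text-conditioned UNet, and the weighting w(t) = 1 - ᾱ_t is one common convention.

```python
# Hedged sketch of SDS over Bezier stroke parameters; not the paper's code.
import torch

class StrokeRenderer(torch.nn.Module):
    """Toy stand-in for a differentiable rasterizer of cubic Bezier strokes.

    Each stroke is 4 control points; soft Gaussian blobs are splatted at
    points sampled along the curve so gradients flow back to the controls.
    """
    def __init__(self, n_strokes: int, size: int = 64):
        super().__init__()
        self.ctrl = torch.nn.Parameter(torch.rand(n_strokes, 4, 2))  # in [0,1]^2
        self.size = size

    def forward(self) -> torch.Tensor:
        ts = torch.linspace(0, 1, 16).view(-1, 1, 1)
        c0, c1, c2, c3 = self.ctrl.unbind(dim=1)
        # Cubic Bezier: B(t) = (1-t)^3 c0 + 3(1-t)^2 t c1 + 3(1-t) t^2 c2 + t^3 c3
        pts = ((1 - ts) ** 3 * c0 + 3 * (1 - ts) ** 2 * ts * c1
               + 3 * (1 - ts) * ts ** 2 * c2 + ts ** 3 * c3)      # (T, N, 2)
        grid = torch.stack(torch.meshgrid(
            torch.linspace(0, 1, self.size), torch.linspace(0, 1, self.size),
            indexing="xy"), dim=-1)                               # (H, W, 2)
        d2 = ((grid[None, None] - pts[:, :, None, None]) ** 2).sum(-1)
        return 1.0 - torch.exp(-d2 / 1e-4).sum(dim=(0, 1)).clamp(0, 1)

def sds_step(renderer, denoiser, optimizer, alphas_cumprod, t: int):
    """One SDS update: render, add noise, push the render toward the prior."""
    x = renderer()                                # differentiable render
    eps = torch.randn_like(x)
    a = alphas_cumprod[t]
    x_t = a.sqrt() * x + (1 - a).sqrt() * eps     # forward diffusion at step t
    with torch.no_grad():
        eps_hat = denoiser(x_t, t)                # text-conditioned in practice
    grad = (1 - a) * (eps_hat - eps)              # SDS gradient, w(t) = 1 - a_t
    optimizer.zero_grad()
    x.backward(gradient=grad)                     # inject d(loss)/dx := grad
    optimizer.step()

renderer = StrokeRenderer(n_strokes=32)
optimizer = torch.optim.Adam(renderer.parameters(), lr=0.01)
alphas_cumprod = torch.linspace(0.999, 0.01, 1000)
denoiser = lambda x_t, t: torch.randn_like(x_t)   # placeholder for the UNet
for _ in range(100):
    sds_step(renderer, denoiser, optimizer, alphas_cumprod,
             t=int(torch.randint(50, 950, ())))
```

In the actual method, the control points would also be seeded from the diffusion model's attention maps rather than uniformly at random, which the abstract credits with speeding up the generation process.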
Related papers
- Improving GFlowNets for Text-to-Image Diffusion Alignment [48.42367859859971]
We explore techniques that do not directly maximize the reward but rather generate high-reward images with relatively high probability.
Our method could effectively align large-scale text-to-image diffusion models with given reward information.
arXiv Detail & Related papers (2024-06-02T06:36:46Z)
- FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models [56.71672127740099]
We focus on the task of image segmentation, which is traditionally solved by training models on closed-vocabulary datasets.
We leverage different and relatively small-sized, open-source foundation models for zero-shot open-vocabulary segmentation.
Our approach (dubbed FreeSeg-Diff), which does not rely on any training, outperforms many training-based approaches on both Pascal VOC and COCO datasets.
arXiv Detail & Related papers (2024-03-29T10:38:25Z)
- Representative Feature Extraction During Diffusion Process for Sketch Extraction with One Example [6.520083224801834]
We introduce DiffSketch, a method for generating a variety of stylized sketches from images.
Our approach focuses on selecting representative features from the rich semantics of deep features within a pretrained diffusion model.
arXiv Detail & Related papers (2024-01-09T05:22:15Z)
- SODA: Bottleneck Diffusion Models for Representation Learning [75.7331354734152]
We introduce SODA, a self-supervised diffusion model, designed for representation learning.
The model incorporates an image encoder, which distills a source view into a compact representation, that guides the generation of related novel views.
We show that by imposing a tight bottleneck between the encoder and a denoising decoder, we can turn diffusion models into strong representation learners.
arXiv Detail & Related papers (2023-11-29T18:53:34Z)
- Guided Flows for Generative Modeling and Decision Making [55.42634941614435]
We show that Guided Flows significantly improves the sample quality in conditional image generation and zero-shot text-to-speech synthesis.
Notably, we are the first to apply flow models for plan generation in the offline reinforcement learning setting, achieving a significant speedup in computation compared to diffusion models.
arXiv Detail & Related papers (2023-11-22T15:07:59Z)
- Aligning Text-to-Image Diffusion Models with Reward Backpropagation [62.45086888512723]
We propose AlignProp, a method that aligns diffusion models to downstream reward functions using end-to-end backpropagation of the reward gradient.
We show AlignProp achieves higher rewards in fewer training steps than alternatives, while being conceptually simpler.
arXiv Detail & Related papers (2023-10-05T17:59:18Z)
- Sketch-Guided Text-to-Image Diffusion Models [57.12095262189362]
We introduce a universal approach to guide a pretrained text-to-image diffusion model.
Our method does not require training a dedicated model or a specialized encoder for the task.
We take a particular focus on the sketch-to-image translation task, revealing a robust and expressive way to generate images.
arXiv Detail & Related papers (2022-11-24T18:45:32Z)
- BézierSketch: A generative model for scalable vector sketches [132.5223191478268]
We present BézierSketch, a novel generative model for fully vector sketches that are automatically scalable and high-resolution.
We first introduce a novel inverse graphics approach to stroke embedding that trains an encoder to embed each stroke to its best-fit Bézier curve.
This enables us to treat sketches as short sequences of parameterized strokes and thus train a recurrent sketch generator with greater capacity for longer sketches.
arXiv Detail & Related papers (2020-07-04T21:30:52Z)
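
The stroke-embedding step in the BézierSketch entry above has a simple classical analogue worth seeing in code: recovering a best-fit cubic Bézier curve for a stroke's sampled points. The sketch below uses a closed-form linear least-squares fit under a chord-length parameterization; it is an illustrative baseline only, since BézierSketch itself trains an encoder to produce the embedding.

```python
# Hedged illustration: least-squares cubic Bezier fit to one stroke.
import numpy as np

def fit_cubic_bezier(points: np.ndarray) -> np.ndarray:
    """Fit 4 cubic Bezier control points to an (N, 2) polyline.

    Parameterizes samples by normalized arc length, builds the Bernstein
    design matrix, and solves the linear least-squares problem.
    """
    # Chord-length parameterization: t_i in [0, 1]
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    t = np.concatenate([[0.0], np.cumsum(seg)])
    t /= t[-1]
    # Bernstein basis for a cubic: B(t) = sum_k b_k(t) P_k
    B = np.stack([(1 - t) ** 3,
                  3 * (1 - t) ** 2 * t,
                  3 * (1 - t) * t ** 2,
                  t ** 3], axis=1)                     # (N, 4)
    ctrl, *_ = np.linalg.lstsq(B, points, rcond=None)  # (4, 2) control points
    return ctrl

# Usage: fit a noisy quarter-circle stroke and inspect the control points.
theta = np.linspace(0, np.pi / 2, 50)
stroke = np.stack([np.cos(theta), np.sin(theta)], axis=1)
stroke += 0.01 * np.random.default_rng(0).normal(size=stroke.shape)
print(fit_cubic_bezier(stroke))
```

Once strokes are reduced to a few control points this way, a sketch becomes a short sequence of low-dimensional tokens, which is what lets BézierSketch train a recurrent generator with capacity to spare for longer sketches.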