Related papers: SwiftSketch: A Diffusion Model for Image-to-Vector Sketch Generation

SwiftSketch: A Diffusion Model for Image-to-Vector Sketch Generation

URL: http://arxiv.org/abs/2502.08642v1
Date: Wed, 12 Feb 2025 18:57:12 GMT
Title: SwiftSketch: A Diffusion Model for Image-to-Vector Sketch Generation
Authors: Ellie Arar, Yarden Frenkel, Daniel Cohen-Or, Ariel Shamir, Yael Vinker,
Abstract summary: We introduce SwiftSketch, a model for image-conditioned vector sketch generation that can produce high-quality sketches in less than a second.<n>SwiftSketch operates by progressively denoising stroke control points sampled from a Gaussian distribution.<n>ControlSketch is a method that enhances SDS-based techniques by incorporating precise spatial control through a depth-aware ControlNet.
Score: 57.47730473674261
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Recent advancements in large vision-language models have enabled highly expressive and diverse vector sketch generation. However, state-of-the-art methods rely on a time-consuming optimization process involving repeated feedback from a pretrained model to determine stroke placement. Consequently, despite producing impressive sketches, these methods are limited in practical applications. In this work, we introduce SwiftSketch, a diffusion model for image-conditioned vector sketch generation that can produce high-quality sketches in less than a second. SwiftSketch operates by progressively denoising stroke control points sampled from a Gaussian distribution. Its transformer-decoder architecture is designed to effectively handle the discrete nature of vector representation and capture the inherent global dependencies between strokes. To train SwiftSketch, we construct a synthetic dataset of image-sketch pairs, addressing the limitations of existing sketch datasets, which are often created by non-artists and lack professional quality. For generating these synthetic sketches, we introduce ControlSketch, a method that enhances SDS-based techniques by incorporating precise spatial control through a depth-aware ControlNet. We demonstrate that SwiftSketch generalizes across diverse concepts, efficiently producing sketches that combine high fidelity with a natural and visually appealing style.

Related papers

VideoSketcher: Video Models Prior Enable Versatile Sequential Sketch Generation [73.23035143627598]
Most generative models treat sketches as static images, overlooking the temporal structure that underlies creative drawing.<n>We present a data-efficient approach for sequential sketch generation that adapts pretrained text-to-video diffusion models.<n>Our method generates high-quality sketches that closely follow text-specified orderings while exhibiting rich visual detail.
arXiv Detail & Related papers (2026-02-17T18:55:03Z)
StableSketcher: Enhancing Diffusion Model for Pixel-based Sketch Generation via Visual Question Answering Feedback [4.851573895718146]
We propose StableSketcher, a novel framework that empowers diffusion models to generate hand-drawn sketches with high prompt fidelity.<n>We fine-tune the variational autoencoder to optimize latent decoding, enabling it to better capture the characteristics of sketches.<n>In parallel, we integrate a new reward function for reinforcement learning based on visual question answering, which improves text-image alignment and semantic consistency.
arXiv Detail & Related papers (2025-10-23T00:27:32Z)
SketchAgent: Generating Structured Diagrams from Hand-Drawn Sketches [54.06877048295693]
We introduce SketchAgent, a system designed to automate the transformation of hand-drawn sketches into structured diagrams.<n>SketchAgent integrates sketch recognition, symbolic reasoning, and iterative validation to produce semantically coherent and structurally accurate diagrams.<n>By streamlining the diagram generation process, SketchAgent holds great promise for applications in design, education, and engineering.
arXiv Detail & Related papers (2025-08-02T07:22:51Z)
CoProSketch: Controllable and Progressive Sketch Generation with Diffusion Model [18.5540421907361]
Sketches serve as fundamental blueprints in artistic creation because sketch editing is easier and more intuitive than pixel-level RGB image editing for painting artists. We propose a novel framework CoProSketch, providing prominent controllability and details for sketch generation with diffusion models. Experiments demonstrate superior semantic consistency and controllability over baselines, offering a practical solution for integrating user feedback into generative models.
arXiv Detail & Related papers (2025-04-11T05:11:17Z)
StrokeFusion: Vector Sketch Generation via Joint Stroke-UDF Encoding and Latent Sequence Diffusion [13.862427684807486]
StrokeFusion is a two-stage framework for vector sketch generation. It contains a dual-modal sketch feature learning network that maps strokes into a high-quality latent space. It exploits a stroke-level latent diffusion model that simultaneously adjusts stroke position, scale, and trajectory during generation.
arXiv Detail & Related papers (2025-03-31T06:03:03Z)
Sketch3D: Style-Consistent Guidance for Sketch-to-3D Generation [55.73399465968594]
This paper proposes a novel generation paradigm Sketch3D to generate realistic 3D assets with shape aligned with the input sketch and color matching the textual description. Three strategies are designed to optimize 3D Gaussians, i.e., structural optimization via a distribution transfer mechanism, color optimization with a straightforward MSE loss and sketch similarity optimization with a CLIP-based geometric similarity loss.
arXiv Detail & Related papers (2024-04-02T11:03:24Z)
DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models [33.6615688030998]
DiffSketcher is an innovative algorithm that creates textitvectorized free-hand sketches using natural language input. Our experiments show that DiffSketcher achieves greater quality than prior work.
arXiv Detail & Related papers (2023-06-26T13:30:38Z)
Sketch-Guided Text-to-Image Diffusion Models [57.12095262189362]
We introduce a universal approach to guide a pretrained text-to-image diffusion model. Our method does not require to train a dedicated model or a specialized encoder for the task. We take a particular focus on the sketch-to-image translation task, revealing a robust and expressive way to generate images.
arXiv Detail & Related papers (2022-11-24T18:45:32Z)
B\'ezierSketch: A generative model for scalable vector sketches [132.5223191478268]
We present B'ezierSketch, a novel generative model for fully vector sketches that are automatically scalable and high-resolution. We first introduce a novel inverse graphics approach to stroke embedding that trains an encoder to embed each stroke to its best fit B'ezier curve. This enables us to treat sketches as short sequences of paramaterized strokes and thus train a recurrent sketch generator with greater capacity for longer sketches.
arXiv Detail & Related papers (2020-07-04T21:30:52Z)
SketchyCOCO: Image Generation from Freehand Scene Sketches [71.85577739612579]
We introduce the first method for automatic image generation from scene-level freehand sketches. Key contribution is an attribute vector bridged Geneversarative Adrial Network called EdgeGAN. We have built a large-scale composite dataset called SketchyCOCO to support and evaluate the solution.
arXiv Detail & Related papers (2020-03-05T14:54:10Z)
Sketchformer: Transformer-based Representation for Sketched Structure [12.448155157592895]
Sketchformer is a transformer-based representation for encoding free-hand sketches input in a vector form. We report several variants exploring continuous and tokenized input representations, and contrast their performance. Our learned embedding, driven by a dictionary learning tokenization scheme, yields state of the art performance in classification and image retrieval tasks.
arXiv Detail & Related papers (2020-02-24T17:11:53Z)
Deep Plastic Surgery: Robust and Controllable Image Editing with Human-Drawn Sketches [133.01690754567252]
Sketch-based image editing aims to synthesize and modify photos based on the structural information provided by the human-drawn sketches. Deep Plastic Surgery is a novel, robust and controllable image editing framework that allows users to interactively edit images using hand-drawn sketch inputs.
arXiv Detail & Related papers (2020-01-09T08:57:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.