StrokeFusion: Vector Sketch Generation via Joint Stroke-UDF Encoding and Latent Sequence Diffusion
- URL: http://arxiv.org/abs/2503.23752v2
- Date: Thu, 17 Apr 2025 03:17:42 GMT
- Title: StrokeFusion: Vector Sketch Generation via Joint Stroke-UDF Encoding and Latent Sequence Diffusion
- Authors: Jin Zhou, Yi Zhou, Pengfei Xu, Hui Huang
- Abstract summary: StrokeFusion is a two-stage framework for vector sketch generation. It contains a dual-modal sketch feature learning network that maps strokes into a high-quality latent space. It exploits a stroke-level latent diffusion model that simultaneously adjusts stroke position, scale, and trajectory during generation.
- Score: 13.862427684807486
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the field of sketch generation, raster-format trained models often produce non-stroke artifacts, while vector-format trained models typically lack a holistic understanding of sketches, leading to compromised recognizability. Moreover, existing methods struggle to extract common features from similar elements (e.g., eyes of animals) appearing at varying positions across sketches. To address these challenges, we propose StrokeFusion, a two-stage framework for vector sketch generation. It contains a dual-modal sketch feature learning network that maps strokes into a high-quality latent space. This network decomposes sketches into normalized strokes and jointly encodes stroke sequences with Unsigned Distance Function (UDF) maps, representing sketches as sets of stroke feature vectors. Building upon this representation, our framework exploits a stroke-level latent diffusion model that simultaneously adjusts stroke position, scale, and trajectory during generation. This enables high-fidelity sketch generation while supporting stroke interpolation editing. Extensive experiments on the QuickDraw dataset demonstrate that our framework outperforms state-of-the-art techniques, validating its effectiveness in preserving structural integrity and semantic features. Code and models will be made publicly available upon publication.
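As a rough illustration of the UDF half of this dual-modal representation, the following minimal sketch rasterizes an unsigned distance map from polyline strokes. It assumes strokes given as (N, 2) point arrays in the unit square; it is illustrative only, not the paper's implementation.
```python
# Minimal UDF rasterization sketch (assumed conventions, not the paper's code).
import numpy as np

def point_segment_distance(p, a, b):
    """Unsigned distance from points p (M, 2) to the segment a-b."""
    ab = b - a
    t = np.clip(((p - a) @ ab) / (ab @ ab + 1e-12), 0.0, 1.0)
    proj = a + t[:, None] * ab              # closest point on the segment
    return np.linalg.norm(p - proj, axis=1)

def udf_map(strokes, size=64):
    """Distance from every pixel center to the nearest stroke segment."""
    ys, xs = np.mgrid[0:size, 0:size]
    pix = np.stack([(xs + 0.5) / size, (ys + 0.5) / size], -1).reshape(-1, 2)
    dist = np.full(len(pix), np.inf)
    for s in strokes:
        for a, b in zip(s[:-1], s[1:]):     # consecutive points form segments
            dist = np.minimum(dist, point_segment_distance(pix, a, b))
    return dist.reshape(size, size)

# Example: the UDF of a single three-point stroke.
stroke = np.array([[0.1, 0.1], [0.5, 0.4], [0.9, 0.9]])
print(udf_map([stroke]).min())              # ~0 on pixels the stroke crosses
```
Each pixel stores its distance to the nearest stroke segment, so the map is zero on the strokes and grows smoothly away from them, which is what lets a convolutional encoder pick up holistic sketch structure.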
Related papers
- CoProSketch: Controllable and Progressive Sketch Generation with Diffusion Model [18.5540421907361]
Sketches serve as fundamental blueprints in artistic creation because sketch editing is easier and more intuitive for painting artists than pixel-level RGB image editing.
We propose CoProSketch, a novel framework that provides strong controllability and detail for sketch generation with diffusion models.
Experiments demonstrate superior semantic consistency and controllability over baselines, offering a practical solution for integrating user feedback into generative models.
arXiv Detail & Related papers (2025-04-11T05:11:17Z)
- SwiftSketch: A Diffusion Model for Image-to-Vector Sketch Generation [57.47730473674261]
We introduce SwiftSketch, a model for image-conditioned vector sketch generation that can produce high-quality sketches in less than a second. SwiftSketch operates by progressively denoising stroke control points sampled from a Gaussian distribution (a generic version of this loop is sketched below). ControlSketch is a method that enhances SDS-based techniques by incorporating precise spatial control through a depth-aware ControlNet.
arXiv Detail & Related papers (2025-02-12T18:57:12Z)
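To make "progressively denoising stroke control points sampled from a Gaussian distribution" concrete, here is a generic DDPM-style ancestral sampling loop over control points. The denoiser, the linear noise schedule, and the shapes are all assumptions for illustration; this is not SwiftSketch's actual sampler.
```python
# Minimal, generic DDPM sampling sketch (not SwiftSketch's code).
import numpy as np

def sample_strokes(denoiser, n_points=16, steps=100, rng=None):
    """Draw control points by reversing a diffusion process.

    denoiser: hypothetical trained network, denoiser(x, t) -> predicted noise.
    """
    rng = rng or np.random.default_rng(0)
    betas = np.linspace(1e-4, 0.02, steps)   # assumed linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal((n_points, 2))   # start from pure Gaussian noise
    for t in reversed(range(steps)):
        eps = denoiser(x, t)                 # predicted noise at step t
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise # ancestral sampling step
    return x                                 # (n_points, 2) control points

# Sanity run with a dummy denoiser that predicts zero noise.
pts = sample_strokes(lambda x, t: np.zeros_like(x))
```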
- VQ-SGen: A Vector Quantized Stroke Representation for Creative Sketch Generation [12.486307321835909]
VQ-SGen is a novel algorithm for high-quality creative sketch generation. We introduce a vector-quantized (VQ) stroke representation for fine-grained sketch generation (see the lookup sketch below). Our method surpasses existing state-of-the-art techniques on the CreativeSketch dataset.
arXiv Detail & Related papers (2024-11-25T14:51:22Z)
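The heart of a vector-quantized stroke representation is a nearest-neighbour lookup against a learned codebook, which turns continuous stroke embeddings into discrete tokens. The sketch below shows only that step, with a random codebook standing in for a learned one; it is not VQ-SGen's code.
```python
# Minimal VQ lookup sketch (assumed shapes; codebook is random here).
import numpy as np

def quantize(z, codebook):
    """Map stroke embeddings z (N, D) to their nearest codebook entries (K, D)."""
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (N, K) sq. distances
    ids = d.argmin(axis=1)                                     # one token per stroke
    return codebook[ids], ids

rng = np.random.default_rng(0)
codebook = rng.standard_normal((512, 64))  # K=512 codes of dimension 64
z = rng.standard_normal((8, 64))           # 8 stroke embeddings
zq, tokens = quantize(z, codebook)
print(tokens)  # discrete stroke tokens a generator can model autoregressively
```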
- Block and Detail: Scaffolding Sketch-to-Image Generation [65.56590359051634]
We introduce a novel sketch-to-image tool that aligns with the iterative refinement process of artists.
Our tool lets users sketch blocking strokes to coarsely represent the placement and form of objects and detail strokes to refine their shape and silhouettes.
We develop a two-pass algorithm for generating high-fidelity images from such sketches at any point in the iterative process.
arXiv Detail & Related papers (2024-02-28T07:09:31Z)
- Sketch Video Synthesis [52.134906766625164]
We propose a novel framework for sketching videos, represented by frame-wise Bézier curves.
Our method unlocks applications in sketch-based video editing and video doodling, enabled through video composition.
arXiv Detail & Related papers (2023-11-26T14:14:04Z)
- Bridging the Gap: Sketch-Aware Interpolation Network for High-Quality Animation Sketch Inbetweening [58.09847349781176]
We propose a novel deep learning method, the Sketch-Aware Interpolation Network (SAIN).
This approach incorporates multi-level guidance that formulates region-level correspondence, stroke-level correspondence and pixel-level dynamics.
A multi-stream U-Transformer is then devised to characterize sketch inbetweening patterns from these multi-level guides through integrated self- and cross-attention mechanisms (a minimal version of the mechanism is sketched below).
arXiv Detail & Related papers (2023-08-25T09:51:03Z)
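The attention fusion in such a U-Transformer comes down to scaled dot-product attention between token streams. The NumPy sketch below shows the bare mechanism with assumed token shapes; it is illustrative only, not SAIN's architecture.
```python
# Minimal cross-attention sketch (assumed shapes, illustrative only).
import numpy as np

def cross_attention(q, k, v):
    """q: (Nq, D) queries from one stream; k, v: (Nk, D) from another."""
    scores = q @ k.T / np.sqrt(q.shape[-1])            # (Nq, Nk) similarities
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                 # softmax over keys
    return w @ v                                       # (Nq, D) fused features

rng = np.random.default_rng(0)
stroke_tokens = rng.standard_normal((10, 32))  # e.g. stroke-level guidance
region_tokens = rng.standard_normal((4, 32))   # e.g. region-level guidance
fused = cross_attention(stroke_tokens, region_tokens, region_tokens)
```
Self-attention is the same computation with q, k, and v all drawn from one stream.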
- DiffFaceSketch: High-Fidelity Face Image Synthesis with Sketch-Guided Latent Diffusion Model [8.1818090854822]
We introduce a Sketch-Guided Latent Diffusion Model (SGLDM), an LDM-based network architecture trained on a paired sketch-face dataset.
SGLDM can synthesize high-quality face images with different expressions, facial accessories, and hairstyles from various sketches with different abstraction levels.
arXiv Detail & Related papers (2023-02-14T08:51:47Z)
- On Learning Semantic Representations for Million-Scale Free-Hand Sketches [146.52892067335128]
We study learning semantic representations for million-scale free-hand sketches.
We propose a dual-branch CNN-RNN network architecture to represent sketches.
We explore learning sketch-oriented semantic representations for hashing retrieval and zero-shot recognition.
arXiv Detail & Related papers (2020-07-07T15:23:22Z)
- BézierSketch: A generative model for scalable vector sketches [132.5223191478268]
We present BézierSketch, a novel generative model for fully vector sketches that are automatically scalable and high-resolution.
We first introduce a novel inverse graphics approach to stroke embedding that trains an encoder to embed each stroke as its best-fit Bézier curve (a least-squares baseline is sketched below).
This enables us to treat sketches as short sequences of parameterized strokes and thus train a recurrent sketch generator with greater capacity for longer sketches.
arXiv Detail & Related papers (2020-07-04T21:30:52Z)
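A simple non-learned baseline for stroke-to-Bézier embedding is a linear least-squares fit of a cubic Bézier curve under chord-length parameterization. The sketch below shows that baseline; the paper's encoder is learned, so this only approximates the idea.
```python
# Least-squares cubic Bézier fit (a baseline sketch, not the paper's encoder).
import numpy as np

def fit_cubic_bezier(points):
    """points: (N, 2) stroke samples. Returns 4 control points, shape (4, 2)."""
    d = np.linalg.norm(np.diff(points, axis=0), axis=1)
    t = np.concatenate([[0.0], np.cumsum(d)]) / (d.sum() + 1e-12)  # chord length
    B = np.stack([(1 - t) ** 3,              # cubic Bernstein basis, (N, 4)
                  3 * t * (1 - t) ** 2,
                  3 * t ** 2 * (1 - t),
                  t ** 3], axis=1)
    ctrl, *_ = np.linalg.lstsq(B, points, rcond=None)  # solve B @ ctrl ≈ points
    return ctrl

# Example: fit a half-circle stroke.
s = np.linspace(0.0, np.pi, 50)
stroke = np.stack([np.cos(s), np.sin(s)], axis=1)
print(fit_cubic_bezier(stroke).round(2))  # endpoints land near the stroke's ends
```
Once every stroke is four control points, a sketch becomes a short sequence of small parameter vectors, which is what makes a recurrent generator tractable for longer sketches.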
- CoSE: Compositional Stroke Embeddings [52.529172734044664]
We present a generative model for complex free-form structures, such as those arising in stroke-based drawing tasks.
Our approach is suitable for interactive use cases such as auto-completing diagrams.
arXiv Detail & Related papers (2020-06-17T15:22:54Z)
- Sketchformer: Transformer-based Representation for Sketched Structure [12.448155157592895]
Sketchformer is a transformer-based representation for encoding free-hand sketch input in vector form.
We report several variants exploring continuous and tokenized input representations, and contrast their performance.
Our learned embedding, driven by a dictionary-learning tokenization scheme, yields state-of-the-art performance in classification and image retrieval tasks.
arXiv Detail & Related papers (2020-02-24T17:11:53Z)