VQ-SGen: A Vector Quantized Stroke Representation for Sketch Generation
- URL: http://arxiv.org/abs/2411.16446v1
- Date: Mon, 25 Nov 2024 14:51:22 GMT
- Title: VQ-SGen: A Vector Quantized Stroke Representation for Sketch Generation
- Authors: Jiawei Wang, Zhiming Cui, Changjian Li
- Abstract summary: VQ-SGen is a novel algorithm for high-quality sketch generation.
By utilizing tokenized stroke representation, our approach generates strokes with high fidelity.
- Score: 12.486307321835909
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents VQ-SGen, a novel algorithm for high-quality sketch generation. Recent approaches have often framed the task as pixel-based generation either as a whole or part-by-part, neglecting the intrinsic and contextual relationships among individual strokes, such as the shape and spatial positioning of both proximal and distant strokes. To overcome these limitations, we propose treating each stroke within a sketch as an entity and introducing a vector-quantized (VQ) stroke representation for fine-grained sketch generation. Our method follows a two-stage framework - in the first stage, we decouple each stroke's shape and location information to ensure the VQ representation prioritizes stroke shape learning. In the second stage, we feed the precise and compact representation into an auto-decoding Transformer to incorporate stroke semantics, positions, and shapes into the generation process. By utilizing tokenized stroke representation, our approach generates strokes with high fidelity and facilitates novel applications, such as conditional generation and semantic-aware stroke editing. Comprehensive experiments demonstrate our method surpasses existing state-of-the-art techniques, underscoring its effectiveness. The code and model will be made publicly available upon publication.
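To make the first stage of the framework concrete, the sketch below illustrates the core operation the abstract describes: a stroke's location is stripped off, its shape is encoded to a latent vector, and that vector is snapped to the nearest entry of a learned codebook (vector quantization). This is a minimal, illustrative PyTorch sketch; the module names, network sizes, and codebook size are assumptions, not the authors' released implementation, and the second-stage auto-decoding Transformer is omitted.

```python
# Minimal sketch (PyTorch) of the stage-1 idea: encode a stroke's shape
# (with its location removed) and quantize it against a learned codebook.
# All sizes and names here are illustrative assumptions.
import torch
import torch.nn as nn


class StrokeVQ(nn.Module):
    def __init__(self, num_points=64, latent_dim=128, codebook_size=512):
        super().__init__()
        # Shape encoder: consumes a stroke normalized to its centroid,
        # so the code captures shape rather than position.
        self.encoder = nn.Sequential(
            nn.Flatten(),                          # (B, num_points, 2) -> (B, num_points*2)
            nn.Linear(num_points * 2, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        self.codebook = nn.Embedding(codebook_size, latent_dim)

    def forward(self, strokes):
        # Decouple location from shape: subtract the centroid and keep it
        # separately as the stroke's "location" information.
        location = strokes.mean(dim=1, keepdim=True)           # (B, 1, 2)
        shape = strokes - location                              # centered stroke
        z = self.encoder(shape)                                 # (B, latent_dim)

        # Vector quantization: nearest codebook entry by Euclidean distance.
        dists = torch.cdist(z, self.codebook.weight)            # (B, codebook_size)
        token = dists.argmin(dim=-1)                            # discrete stroke token
        z_q = self.codebook(token)

        # Straight-through estimator so gradients pass the quantizer.
        z_q = z + (z_q - z).detach()
        return token, z_q, location.squeeze(1)


if __name__ == "__main__":
    model = StrokeVQ()
    batch = torch.randn(8, 64, 2)               # 8 strokes, 64 (x, y) points each
    token, z_q, loc = model(batch)
    print(token.shape, z_q.shape, loc.shape)
```

The discrete stroke token, together with the separately kept location (and, in the paper, a semantic label), is what a second-stage Transformer would then consume during generation.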
Related papers
- StrokeFusion: Vector Sketch Generation via Joint Stroke-UDF Encoding and Latent Sequence Diffusion [13.862427684807486]
StrokeFusion is a two-stage framework for vector sketch generation.
It contains a dual-modal sketch feature learning network that maps strokes into a high-quality latent space.
It exploits a stroke-level latent diffusion model that simultaneously adjusts stroke position, scale, and trajectory during generation.
arXiv Detail & Related papers (2025-03-31T06:03:03Z) - SwiftSketch: A Diffusion Model for Image-to-Vector Sketch Generation [57.47730473674261]
We introduce SwiftSketch, a model for image-conditioned vector sketch generation that can produce high-quality sketches in less than a second.
SwiftSketch operates by progressively denoising stroke control points sampled from a Gaussian distribution.
ControlSketch is a method that enhances SDS-based techniques by incorporating precise spatial control through a depth-aware ControlNet.
arXiv Detail & Related papers (2025-02-12T18:57:12Z) - SketchYourSeg: Mask-Free Subjective Image Segmentation via Freehand Sketches [116.1810651297801]
SketchYourSeg establishes freehand sketches as a powerful query modality for subjective image segmentation.
Our evaluations demonstrate superior performance over existing approaches across diverse benchmarks.
arXiv Detail & Related papers (2025-01-27T13:07:51Z) - Text-to-Vector Generation with Neural Path Representation [27.949704002538944]
We propose a novel neural path representation that learns the path latent space from both sequence and image modalities.
In the first stage, a pre-trained text-to-image diffusion model guides the initial generation of complex vector graphics.
In the second stage, we refine the graphics using a layer-wise image vectorization strategy to achieve clearer elements and structure.
arXiv Detail & Related papers (2024-05-16T17:59:22Z) - Masked Generative Story Transformer with Character Guidance and Caption Augmentation [2.1392064955842023]
Story visualization is a challenging generative vision task that requires both visual quality and consistency between frames in generated image sequences.
Previous approaches either employ some kind of memory mechanism to maintain context throughout an auto-regressive generation of the image sequence, or model the generation of the characters and their background separately.
We propose a completely parallel transformer-based approach, relying on Cross-Attention with past and future captions to achieve consistency.
arXiv Detail & Related papers (2024-03-13T13:10:20Z) - Sketch Video Synthesis [52.134906766625164]
We propose a novel framework for sketching videos represented by frame-wise Bézier curves.
Our method unlocks applications in sketch-based video editing and video doodling, enabled through video composition.
arXiv Detail & Related papers (2023-11-26T14:14:04Z) - SceneComposer: Any-Level Semantic Image Synthesis [80.55876413285587]
We propose a new framework for conditional image synthesis from semantic layouts of any precision levels.
The framework naturally reduces to text-to-image (T2I) at the lowest level with no shape information, and it becomes segmentation-to-image (S2I) at the highest level.
We introduce several novel techniques to address the challenges coming with this new setup.
arXiv Detail & Related papers (2022-11-21T18:59:05Z) - Boosting Modern and Historical Handwritten Text Recognition with Deformable Convolutions [52.250269529057014]
Handwritten Text Recognition (HTR) in free-layout pages is a challenging image understanding task.
We propose to adopt deformable convolutions, which can deform depending on the input at hand and better adapt to the geometric variations of the text.
arXiv Detail & Related papers (2022-08-17T06:55:54Z) - SSR-GNNs: Stroke-based Sketch Representation with Graph Neural Networks [34.759306840182205]
This paper investigates a graph representation for sketches, where stroke information, i.e., the parts of a sketch, is encoded on vertices and inter-stroke information on edges.
The resulting graph representation facilitates the training of Graph Neural Networks for classification tasks.
The proposed representation enables the generation of novel sketches that are structurally similar to, yet separable from, the existing dataset.
arXiv Detail & Related papers (2022-04-27T19:18:01Z) - Single-Stream Multi-Level Alignment for Vision-Language Pretraining [103.09776737512078]
We propose a single-stream model that aligns the modalities at multiple levels.
We achieve this using two novel tasks: symmetric cross-modality reconstruction and pseudo-labeled keyword prediction.
We demonstrate top performance on a set of Vision-Language downstream tasks such as zero-shot/fine-tuned image/text retrieval, referring expression, and VQA.
arXiv Detail & Related papers (2022-03-27T21:16:10Z) - One Sketch for All: One-Shot Personalized Sketch Segmentation [84.45203849671003]
We present the first one-shot personalized sketch segmentation method.
Given a single exemplar sketch with part annotations, we aim to segment all sketches belonging to the same category.
We preserve the part semantics embedded in the exemplar, and our method is robust to input style and abstraction.
arXiv Detail & Related papers (2021-12-20T20:10:44Z) - ShapeEditer: a StyleGAN Encoder for Face Swapping [6.848723869850855]
We propose a novel encoder, called ShapeEditor, for high-resolution, realistic and high-fidelity face exchange.
Our key idea is to use an advanced pretrained high-quality random face image generator, i.e. StyleGAN, as the backbone.
For learning to map into the latent space of StyleGAN, we propose a set of self-supervised loss functions.
arXiv Detail & Related papers (2021-06-26T09:38:45Z) - R2D2: Relational Text Decoding with Transformers [18.137828323277347]
We propose a novel framework for modeling the interaction between graphical structures and the natural language text associated with their nodes and edges.
Our proposed method utilizes both the graphical structure and the sequential nature of the texts.
While the proposed model has wide applications, we demonstrate its capabilities on data-to-text generation tasks.
arXiv Detail & Related papers (2021-05-10T19:59:11Z) - Towards Open-World Text-Guided Face Image Generation and Manipulation [52.83401421019309]
We propose a unified framework for both face image generation and manipulation.
Our method supports open-world scenarios, including both image and text, without any re-training, fine-tuning, or post-processing.
arXiv Detail & Related papers (2021-04-18T16:56:07Z) - BézierSketch: A generative model for scalable vector sketches [132.5223191478268]
We present BézierSketch, a novel generative model for fully vector sketches that are automatically scalable and high-resolution.
We first introduce a novel inverse graphics approach to stroke embedding that trains an encoder to embed each stroke to its best-fit Bézier curve (a minimal fitting sketch is given after this list).
This enables us to treat sketches as short sequences of parameterized strokes and thus train a recurrent sketch generator with greater capacity for longer sketches.
arXiv Detail & Related papers (2020-07-04T21:30:52Z) - CoSE: Compositional Stroke Embeddings [52.529172734044664]
We present a generative model for complex free-form structures such as stroke-based drawing tasks.
Our approach is suitable for interactive use cases such as auto-completing diagrams.
arXiv Detail & Related papers (2020-06-17T15:22:54Z) - SketchyCOCO: Image Generation from Freehand Scene Sketches [71.85577739612579]
We introduce the first method for automatic image generation from scene-level freehand sketches.
The key contribution is an attribute vector bridged Generative Adversarial Network called EdgeGAN.
We have built a large-scale composite dataset called SketchyCOCO to support and evaluate the solution.
arXiv Detail & Related papers (2020-03-05T14:54:10Z)
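As a concrete illustration of the Bézier stroke embedding mentioned in the BézierSketch entry above, the following NumPy sketch fits a single cubic Bézier curve to a stroke by least squares. The paper itself learns this mapping with an encoder network; the closed-form fit below is only an assumed stand-in that makes the idea of "embedding a stroke as its best-fit Bézier curve" concrete.

```python
# Minimal NumPy illustration: least-squares fit of one cubic Bézier curve
# to a 2D stroke. This is an assumed stand-in for the learned stroke
# embedding described in BézierSketch, not the paper's method.
import numpy as np


def fit_cubic_bezier(points):
    """Fit a cubic Bézier curve to points (N, 2), N >= 4.

    Returns the four control points as a (4, 2) array.
    """
    # Chord-length parametrization of the samples in [0, 1].
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    t = np.concatenate([[0.0], np.cumsum(seg)])
    t /= t[-1]

    # Cubic Bernstein basis evaluated at every t: shape (N, 4).
    B = np.stack([
        (1 - t) ** 3,
        3 * t * (1 - t) ** 2,
        3 * t ** 2 * (1 - t),
        t ** 3,
    ], axis=1)

    # Solve B @ control = points in the least-squares sense.
    control, *_ = np.linalg.lstsq(B, points, rcond=None)
    return control


if __name__ == "__main__":
    # A noisy quarter-circle arc as a toy stroke.
    theta = np.linspace(0, np.pi / 2, 50)
    stroke = np.stack([np.cos(theta), np.sin(theta)], axis=1)
    stroke += np.random.normal(scale=0.01, size=stroke.shape)
    print(fit_cubic_bezier(stroke))   # (4, 2) control points
```

Representing each stroke by a handful of control points is what makes the resulting sketches resolution-independent and keeps the stroke sequences short enough for a sequence generator.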
This list is automatically generated from the titles and abstracts of the papers in this site.