VideoSketcher: Video Models Prior Enable Versatile Sequential Sketch Generation
- URL: http://arxiv.org/abs/2602.15819v1
- Date: Tue, 17 Feb 2026 18:55:03 GMT
- Title: VideoSketcher: Video Models Prior Enable Versatile Sequential Sketch Generation
- Authors: Hui Ren, Yuval Alaluf, Omer Bar Tal, Alexander Schwing, Antonio Torralba, Yael Vinker
- Abstract summary: Most generative models treat sketches as static images, overlooking the temporal structure that underlies creative drawing. We present a data-efficient approach for sequential sketch generation that adapts pretrained text-to-video diffusion models. Our method generates high-quality sketches that closely follow text-specified orderings while exhibiting rich visual detail.
- Score: 73.23035143627598
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sketching is inherently a sequential process, in which strokes are drawn in a meaningful order to explore and refine ideas. However, most generative models treat sketches as static images, overlooking the temporal structure that underlies creative drawing. We present a data-efficient approach for sequential sketch generation that adapts pretrained text-to-video diffusion models to generate sketching processes. Our key insight is that large language models and video diffusion models offer complementary strengths for this task: LLMs provide semantic planning and stroke ordering, while video diffusion models serve as strong renderers that produce high-quality, temporally coherent visuals. We leverage this by representing sketches as short videos in which strokes are progressively drawn on a blank canvas, guided by text-specified ordering instructions. We introduce a two-stage fine-tuning strategy that decouples the learning of stroke ordering from the learning of sketch appearance. Stroke ordering is learned using synthetic shape compositions with controlled temporal structure, while visual appearance is distilled from as few as seven manually authored sketching processes that capture both global drawing order and the continuous formation of individual strokes. Despite the extremely limited amount of human-drawn sketch data, our method generates high-quality sequential sketches that closely follow text-specified orderings while exhibiting rich visual detail. We further demonstrate the flexibility of our approach through extensions such as brush style conditioning and autoregressive sketch generation, enabling additional controllability and interactive, collaborative drawing.
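To make the two-stage recipe above concrete, here is a minimal sketch of what decoupling stroke ordering from sketch appearance could look like in training code. Everything here is an assumption for illustration (the toy denoiser, the simple interpolation noise schedule, and the random tensors standing in for synthetic shape-composition videos and the seven human-drawn processes); it is not the authors' implementation.

```python
# Minimal sketch of the two-stage fine-tuning idea, assuming a toy
# stand-in for the pretrained text-to-video backbone (hypothetical code).
import torch
import torch.nn as nn

class TinyVideoDenoiser(nn.Module):
    """Toy stand-in for a pretrained text-to-video diffusion backbone."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv3d(1, 1, kernel_size=3, padding=1)

    def forward(self, noisy_video, t):
        # Predict the noise that was mixed into the clean video at level t.
        return self.net(noisy_video)

def diffusion_step_loss(model, video):
    """One DDPM-style step: mix in noise, predict it, return MSE loss."""
    t = torch.rand(video.shape[0], 1, 1, 1, 1)   # per-sample noise level
    noise = torch.randn_like(video)
    noisy = (1 - t) * video + t * noise          # simple interpolation schedule
    return nn.functional.mse_loss(model(noisy, t), noise)

model = TinyVideoDenoiser()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Stage 1: learn stroke ORDERING from synthetic shape compositions whose
# temporal structure is fully controlled (random tensors as placeholders).
synthetic = torch.randn(4, 1, 16, 32, 32)        # (batch, C, frames, H, W)
loss = diffusion_step_loss(model, synthetic)
loss.backward(); opt.step(); opt.zero_grad()

# Stage 2: distill sketch APPEARANCE from as few as 7 human-drawn processes.
human = torch.randn(7, 1, 16, 32, 32)
loss = diffusion_step_loss(model, human)
loss.backward(); opt.step(); opt.zero_grad()
```

In practice the two stages would presumably target different adapter weights of a real video diffusion model so that the ordering prior learned in stage 1 is not overwritten in stage 2; the point of the sketch is only the decoupled data, not the architecture.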
Related papers
- Loomis Painter: Reconstructing the Painting Process [56.713812157283805]
Step-by-step painting tutorials are vital for learning artistic techniques, but existing video resources lack interactivity and personalization. We propose a unified framework for multi-media painting process generation with a semantics-driven style control mechanism. We also build a large-scale dataset of real painting processes and evaluate cross-media consistency, temporal coherence, and final-image fidelity.
arXiv Detail & Related papers (2025-11-21T16:06:32Z) - CoProSketch: Controllable and Progressive Sketch Generation with Diffusion Model [18.5540421907361]
Sketches serve as fundamental blueprints in artistic creation because sketch editing is easier and more intuitive for painting artists than pixel-level RGB image editing. We propose CoProSketch, a novel framework providing prominent controllability and detail for sketch generation with diffusion models. Experiments demonstrate superior semantic consistency and controllability over baselines, offering a practical solution for integrating user feedback into generative models.
arXiv Detail & Related papers (2025-04-11T05:11:17Z) - SwiftSketch: A Diffusion Model for Image-to-Vector Sketch Generation [57.47730473674261]
We introduce SwiftSketch, a model for image-conditioned vector sketch generation that can produce high-quality sketches in less than a second. SwiftSketch operates by progressively denoising stroke control points sampled from a Gaussian distribution. ControlSketch is a method that enhances SDS-based techniques by incorporating precise spatial control through a depth-aware ControlNet.
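As a rough, assumed illustration of "progressively denoising stroke control points sampled from a Gaussian distribution," the toy loop below refines a noise vector into stroke control points; the MLP denoiser and fixed step size are placeholders, not SwiftSketch's actual architecture.

```python
# Hypothetical illustration of denoising stroke control points (not the
# paper's code): start from Gaussian noise and iteratively refine it.
import torch
import torch.nn as nn

n_strokes, pts_per_stroke = 32, 4                # cubic curves: 4 control points
dim = n_strokes * pts_per_stroke * 2             # (x, y) per control point
denoiser = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, dim))

x = torch.randn(1, dim)                          # pure Gaussian noise
for _ in range(10):                              # a handful of refinement steps
    with torch.no_grad():
        x = x + 0.1 * (denoiser(x) - x)          # step toward the prediction
strokes = x.view(n_strokes, pts_per_stroke, 2)   # final (x, y) control points
```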
arXiv Detail & Related papers (2025-02-12T18:57:12Z) - VidSketch: Hand-drawn Sketch-Driven Video Generation with Diffusion Control [13.320911720001277]
VidSketch is a method capable of generating high-quality video animations directly from any number of hand-drawn sketches and simple text prompts. Specifically, our method introduces a Level-Based Sketch Control Strategy to automatically adjust the guidance strength of sketches during the generation process. A TempSpatial Attention mechanism is designed to enhance the spatiotemporal consistency of generated video animations.
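The abstract does not spell out how the Level-Based Sketch Control Strategy maps sketch detail to guidance strength; one plausible, purely assumed reading is a monotone schedule in which more detailed sketches receive stronger guidance:

```python
# Assumed reading of a level-based control strategy, not VidSketch's code:
# use ink density as a crude detail proxy and scale guidance by it.
import numpy as np

def sketch_level(sketch: np.ndarray) -> float:
    """Fraction of inked pixels in a binary sketch (crude detail proxy)."""
    return float(sketch.mean())

def guidance_strength(level: float, base: float = 1.0) -> float:
    """Clamp the level to [0, 1] and scale the base guidance by it."""
    return base * min(max(level, 0.0), 1.0)

rough = np.zeros((64, 64)); rough[32, :] = 1.0   # a single horizontal stroke
print(guidance_strength(sketch_level(rough)))    # weak guidance for a rough sketch
```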
arXiv Detail & Related papers (2025-02-03T06:45:00Z) - VIRES: Video Instance Repainting via Sketch and Text Guided Generation [46.4323117976194]
VIRES is a video instance repainting method with sketch and text guidance. We propose a Sequential ControlNet with standardized self-scaling. A sketch-aware encoder ensures that repainted results are aligned with the provided sketch sequence.
arXiv Detail & Related papers (2024-11-25T08:55:41Z) - Sketch Video Synthesis [52.134906766625164]
We propose a novel framework for sketching videos represented by frame-wise Bézier curves.
Our method unlocks applications in sketch-based video editing and video doodling, enabled through video composition.
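Since the representation is built from frame-wise Bézier curves, the underlying primitive is just cubic Bézier evaluation; the minimal sampler below is illustrative only (the paper optimizes such curves per frame rather than hand-specifying them).

```python
# Cubic Bezier sampling, the primitive behind a frame-wise vector sketch
# (illustrative helper, not the paper's implementation).
import numpy as np

def cubic_bezier(p0, p1, p2, p3, n=50):
    """Sample n points along the cubic Bezier curve with controls p0..p3."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

curve = cubic_bezier(np.array([0.0, 0.0]), np.array([0.2, 1.0]),
                     np.array([0.8, 1.0]), np.array([1.0, 0.0]))  # (50, 2) points
```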
arXiv Detail & Related papers (2023-11-26T14:14:04Z) - SketchDreamer: Interactive Text-Augmented Creative Sketch Ideation [111.2195741547517]
We present a method to generate controlled sketches using a text-conditioned diffusion model trained on pixel representations of images.
Our objective is to empower non-professional users to create sketches and, through a series of optimisation processes, transform a narrative into a storyboard.
arXiv Detail & Related papers (2023-08-27T19:44:44Z) - Deep Plastic Surgery: Robust and Controllable Image Editing with Human-Drawn Sketches [133.01690754567252]
Sketch-based image editing aims to synthesize and modify photos based on the structural information provided by the human-drawn sketches.
Deep Plastic Surgery is a novel, robust and controllable image editing framework that allows users to interactively edit images using hand-drawn sketch inputs.
arXiv Detail & Related papers (2020-01-09T08:57:50Z)