VIRES: Video Instance Repainting with Sketch and Text Guidance
- URL: http://arxiv.org/abs/2411.16199v2
- Date: Tue, 26 Nov 2024 11:43:01 GMT
- Title: VIRES: Video Instance Repainting with Sketch and Text Guidance
- Authors: Shuchen Weng, Haojie Zheng, Peixuan Zhan, Yuchen Hong, Han Jiang, Si Li, Boxin Shi
- Abstract summary: We introduce VIRES, a video instance repainting method with sketch and text guidance.
Existing approaches struggle with temporal consistency and accurate alignment with the provided sketch sequence.
We propose the Sequential ControlNet with the standardized self-scaling.
A sketch-aware encoder ensures that repainted results are aligned with the provided sketch sequence.
- Score: 46.24384664227624
- Abstract: We introduce VIRES, a video instance repainting method with sketch and text guidance, enabling video instance repainting, replacement, generation, and removal. Existing approaches struggle with temporal consistency and accurate alignment with the provided sketch sequence. VIRES leverages the generative priors of text-to-video models to maintain temporal consistency and produce visually pleasing results. We propose the Sequential ControlNet with the standardized self-scaling, which effectively extracts structure layouts and adaptively captures high-contrast sketch details. We further augment the diffusion transformer backbone with the sketch attention to interpret and inject fine-grained sketch semantics. A sketch-aware encoder ensures that repainted results are aligned with the provided sketch sequence. Additionally, we contribute the VireSet, a dataset with detailed annotations tailored for training and evaluating video instance editing methods. Experimental results demonstrate the effectiveness of VIRES, which outperforms state-of-the-art methods in visual quality, temporal consistency, condition alignment, and human ratings. Project page: https://suimuc.github.io/suimu.github.io/projects/VIRES/
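The abstract names a "standardized self-scaling" operation inside the Sequential ControlNet but does not spell out its form. Below is a minimal, speculative PyTorch sketch of one plausible reading (standardize features, then rescale them with a gain computed from the features themselves, so high-contrast sketch detail survives normalization); the module name, the sigmoid gain, and all shapes are assumptions, not the paper's code.

```python
# Speculative sketch of "standardized self-scaling"; the paper's exact
# formulation may differ.
import torch
import torch.nn as nn

class StandardizedSelfScaling(nn.Module):
    def __init__(self, channels: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        # Per-channel gain predicted from the features themselves (assumption).
        self.to_gain = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Standardize each channel to zero mean / unit variance per sample.
        mean = x.mean(dim=(2, 3), keepdim=True)
        std = x.std(dim=(2, 3), keepdim=True)
        x_hat = (x - mean) / (std + self.eps)
        # Self-scaling: the gain is a function of the input, not a fixed weight.
        gain = torch.sigmoid(self.to_gain(x_hat))
        return x_hat * gain

feats = torch.randn(1, 64, 32, 32)        # e.g. one frame's sketch features
out = StandardizedSelfScaling(64)(feats)  # same shape, contrast-normalized
```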
Related papers
- SwiftSketch: A Diffusion Model for Image-to-Vector Sketch Generation [57.47730473674261]
We introduce SwiftSketch, a model for image-conditioned vector sketch generation that can produce high-quality sketches in less than a second.
SwiftSketch operates by progressively denoising stroke control points sampled from a Gaussian distribution.
ControlSketch is a method that enhances SDS-based techniques by incorporating precise spatial control through a depth-aware ControlNet.
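As a toy illustration of the denoising-over-control-points idea in this summary (not the authors' code), the loop below runs a few schematic denoising steps on Gaussian-initialized stroke control points; the MLP denoiser is an untrained placeholder for the real image-conditioned network.

```python
# Toy diffusion-style loop over stroke control points; all components are
# stand-ins for the real model.
import torch
import torch.nn as nn

n_strokes, pts_per_stroke = 16, 4            # cubic Bezier: 4 control points
denoiser = nn.Sequential(                    # placeholder for the real network
    nn.Linear(n_strokes * pts_per_stroke * 2 + 1, 256),
    nn.ReLU(),
    nn.Linear(256, n_strokes * pts_per_stroke * 2),
)

x = torch.randn(1, n_strokes * pts_per_stroke * 2)   # noisy control points
for t in reversed(range(10)):                        # few-step sampling
    t_embed = torch.full((1, 1), float(t) / 10)
    pred = denoiser(torch.cat([x, t_embed], dim=-1)) # predict the clean points
    x = x + 0.1 * (pred - x)                         # move toward the estimate

points = x.view(n_strokes, pts_per_stroke, 2)        # (stroke, point, xy)
```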
arXiv Detail & Related papers (2025-02-12T18:57:12Z)
- FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors [64.54220123913154]
We introduce FramePainter as an efficient instantiation of the image-to-video generation problem.
It uses only a lightweight sparse control encoder to inject editing signals.
It dominantly outperforms previous state-of-the-art methods with far less training data.
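A rough sketch of what a "lightweight sparse control encoder" could look like, under the assumption that editing signals are injected additively into backbone features; FramePainter's actual architecture may differ.

```python
# Hypothetical sparse control encoder: a few conv layers map sparse editing
# strokes to features added into a diffusion backbone (injection point assumed).
import torch
import torch.nn as nn

control_encoder = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.SiLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.SiLU(),
    nn.Conv2d(64, 64, 3, padding=1),
)

edit_mask = torch.zeros(1, 1, 64, 64)      # sparse: mostly empty strokes
edit_mask[:, :, 20:24, 10:50] = 1.0
control_feats = control_encoder(edit_mask) # (1, 64, 32, 32)

backbone_feats = torch.randn(1, 64, 32, 32)
injected = backbone_feats + control_feats  # additive injection (assumption)
```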
arXiv Detail & Related papers (2025-01-14T16:09:16Z)
- Enhancing Sketch Animation: Text-to-Video Diffusion Models with Temporal Consistency and Rigidity Constraints [1.1510009152620668]
We propose an approach for animating a given input sketch based on a descriptive text prompt.
We leverage a pre-trained text-to-video diffusion model with SDS loss to guide the motion of the sketch's strokes.
Our method surpasses state-of-the-art performance in both quantitative and qualitative evaluations.
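For readers unfamiliar with SDS, here is a schematic of how score distillation can drive stroke motion: render the animation, noise it, let a (placeholder) frozen diffusion model predict the noise, and backpropagate the residual into the stroke displacements. Every component below is a stand-in, not the paper's pipeline.

```python
# Schematic SDS update on stroke displacement parameters; the renderer and
# noise predictor are placeholders (real methods use e.g. a differentiable
# rasterizer and a frozen text-to-video model).
import torch

displacements = torch.zeros(8, 16, 2, requires_grad=True)  # (frame, stroke, xy)
optimizer = torch.optim.Adam([displacements], lr=1e-2)

def render_video(disp):
    # Placeholder differentiable "rasterizer" that keeps gradients flowing.
    return disp.sum(dim=(1, 2)).view(1, 1, 8, 1, 1).expand(1, 3, 8, 32, 32)

def predict_noise(noisy, t):
    return torch.randn_like(noisy)   # stand-in for the frozen diffusion model

frames = render_video(displacements)
t = torch.randint(0, 1000, (1,))
noise = torch.randn_like(frames)
noisy = frames + noise               # (schematic) forward diffusion step
eps_hat = predict_noise(noisy, t)
# SDS: treat (eps_hat - noise) as a gradient on the rendered frames.
grad = (eps_hat - noise).detach()
loss = (frames * grad).sum()
optimizer.zero_grad(); loss.backward(); optimizer.step()
```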
arXiv Detail & Related papers (2024-11-28T21:15:38Z)
- SketchTriplet: Self-Supervised Scenarized Sketch-Text-Image Triplet Generation [6.39528707908268]
There continues to be a lack of large-scale paired datasets for scene sketches.
We propose a self-supervised method for scene sketch generation that does not rely on any existing scene sketch.
We contribute a large-scale dataset centered around scene sketches, comprising highly semantically consistent "text-sketch-image" triplets.
arXiv Detail & Related papers (2024-05-29T06:43:49Z)
- Sketch Video Synthesis [52.134906766625164]
We propose a novel framework for sketching videos represented by frame-wise Bézier curves.
Our method unlocks applications in sketch-based video editing and video doodling, enabled through video composition.
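The frame-wise curve representation bottoms out in ordinary Bézier evaluation; the helper below is that generic math (not code from the paper): a cubic Bézier sampled into a polyline.

```python
# Generic cubic Bezier evaluation, the building block of a frame-wise
# curve parameterization.
import torch

def cubic_bezier(ctrl: torch.Tensor, n_samples: int = 64) -> torch.Tensor:
    """ctrl: (4, 2) control points -> (n_samples, 2) points on the curve."""
    t = torch.linspace(0.0, 1.0, n_samples).unsqueeze(1)        # (n, 1)
    p0, p1, p2, p3 = ctrl
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

# One stroke on one frame; a video is a (frames, strokes, 4, 2) tensor of these.
stroke = torch.tensor([[0.0, 0.0], [0.3, 1.0], [0.7, 1.0], [1.0, 0.0]])
poly = cubic_bezier(stroke)   # polyline ready for rasterization
```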
arXiv Detail & Related papers (2023-11-26T14:14:04Z)
- Breathing Life Into Sketches Using Text-to-Video Priors [101.8236605955899]
A sketch is one of the most intuitive and versatile tools humans use to convey their ideas visually.
In this work, we present a method that automatically adds motion to a single-subject sketch.
The output is a short animation provided in vector representation, which can be easily edited.
arXiv Detail & Related papers (2023-11-21T18:09:30Z)
- WAIT: Feature Warping for Animation to Illustration video Translation using GANs [12.681919619814419]
We introduce a new problem for video stylization in which an unordered set of images is used.
Most of the video-to-video translation methods are built on an image-to-image translation model.
We propose a new generator network with feature warping layers which overcomes the limitations of the previous methods.
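Feature warping of this kind is commonly implemented with a flow field and grid sampling; the sketch below shows that standard construction, with shapes and the bilinear sampler as assumptions rather than WAIT's exact design.

```python
# Standard flow-based feature warping via grid_sample (assumed design).
import torch
import torch.nn.functional as F

def warp_features(feats: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """feats: (B, C, H, W); flow: (B, 2, H, W) in pixels."""
    b, _, h, w = feats.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0) + flow
    # Normalize to [-1, 1] as grid_sample expects.
    grid[:, 0] = 2.0 * grid[:, 0] / (w - 1) - 1.0
    grid[:, 1] = 2.0 * grid[:, 1] / (h - 1) - 1.0
    return F.grid_sample(feats, grid.permute(0, 2, 3, 1), align_corners=True)

feats = torch.randn(1, 64, 32, 32)
flow = torch.zeros(1, 2, 32, 32)   # zero flow: output equals input
warped = warp_features(feats, flow)
```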
arXiv Detail & Related papers (2023-10-07T19:45:24Z)
- FateZero: Fusing Attentions for Zero-shot Text-based Video Editing [104.27329655124299]
We propose FateZero, a zero-shot text-based editing method on real-world videos without per-prompt training or user-specific masks.
Our method is the first to demonstrate zero-shot text-driven video style and local attribute editing from a trained text-to-image model.
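The attention-fusion idea can be pictured as blending attention maps saved during inversion of the source video with those of the editing pass, gated by an editing mask; the snippet below is schematic, not FateZero's actual code.

```python
# Schematic attention fusion: keep source attention outside the edited region.
import torch

def fuse_attention(src_attn, edit_attn, edit_mask):
    """Keep source attention where edit_mask == 0, new attention where 1."""
    return edit_mask * edit_attn + (1.0 - edit_mask) * src_attn

tokens = 16
src_attn = torch.softmax(torch.randn(1, tokens, tokens), dim=-1)   # inversion
edit_attn = torch.softmax(torch.randn(1, tokens, tokens), dim=-1)  # editing
edit_mask = torch.zeros(1, tokens, 1)
edit_mask[:, 4:8] = 1.0            # only these tokens follow the new prompt
fused = fuse_attention(src_attn, edit_attn, edit_mask)
```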
arXiv Detail & Related papers (2023-03-16T17:51:13Z)
- Sketch Me A Video [32.38205496481408]
We introduce a new video synthesis task that takes only two rough, badly-drawn sketches as input to create a realistic portrait video.
A two-stage Sketch-to-Video model is proposed, built on two key novelties.
arXiv Detail & Related papers (2021-10-10T05:40:11Z)