Drag-A-Video: Non-rigid Video Editing with Point-based Interaction
- URL: http://arxiv.org/abs/2312.02936v1
- Date: Tue, 5 Dec 2023 18:05:59 GMT
- Title: Drag-A-Video: Non-rigid Video Editing with Point-based Interaction
- Authors: Yao Teng, Enze Xie, Yue Wu, Haoyu Han, Zhenguo Li and Xihui Liu
- Abstract summary: We propose a new diffusion-based method for interactive point-based video manipulation, called Drag-A-Video.
Our method allows users to click pairs of handle points and target points as well as masks on the first frame of an input video.
To precisely modify the contents of the video, we employ a new video-level motion supervision to update the features of the video.
- Score: 63.78538355189017
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video editing is a challenging task that requires manipulating videos on both
the spatial and temporal dimensions. Existing methods for video editing mainly
focus on changing the appearance or style of the objects in the video, while
keeping their structures unchanged. However, there is no existing method that
allows users to interactively "drag" any points of instances on the first
frame to precisely reach the target points with other frames consistently
deformed. In this paper, we propose a new diffusion-based method for
interactive point-based video manipulation, called Drag-A-Video. Our method
allows users to click pairs of handle points and target points as well as masks
on the first frame of an input video. Then, our method transforms the inputs
into point sets and propagates these sets across frames. To precisely modify
the contents of the video, we employ a new video-level motion supervision to
update the features of the video and introduce the latent offsets to achieve
this update at multiple denoising timesteps. We propose a temporally consistent
point tracking module to coordinate the movement of the points in the handle
point sets. We demonstrate the effectiveness and flexibility of our method on
various videos. The website of our work is available here:
https://drag-a-video.github.io/.
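The abstract describes a drag-style optimization loop on diffusion features: motion supervision nudges the features so that the content at each handle point moves a small step toward its target, and point tracking then re-localizes the handle by feature matching before the next iteration. Below is a minimal single-frame sketch of that loop in PyTorch, not the authors' implementation; the function names, the fixed step size, and the search window are illustrative assumptions, and the paper's video-level supervision, latent offsets across denoising timesteps, and temporally consistent tracking are omitted.

```python
# Minimal sketch of one drag-editing iteration on a diffusion feature map,
# assuming a DragDiffusion-style setup. Not the authors' code; names and
# hyperparameters are illustrative. Points are length-2 float tensors (x, y).
import torch
import torch.nn.functional as F


def bilinear_sample(feat, pt):
    """Sample a (C, H, W) feature map at a float (x, y) location."""
    _, H, W = feat.shape
    x = 2.0 * pt[0] / (W - 1) - 1.0  # normalize to [-1, 1] for grid_sample
    y = 2.0 * pt[1] / (H - 1) - 1.0
    grid = torch.stack([x, y]).view(1, 1, 1, 2).to(feat.dtype)
    return F.grid_sample(feat.unsqueeze(0), grid, align_corners=True).view(-1)


def motion_supervision_loss(feat, handles, targets, step=2.0):
    """Pull the content at each handle point a small step toward its target."""
    loss = feat.new_zeros(())
    for h, t in zip(handles, targets):
        delta = t - h
        dist = delta.norm()
        d = delta / (dist + 1e-8) * torch.clamp(dist, max=step)
        # Features at the shifted location should match the (detached) features
        # currently at the handle, dragging content along the handle->target line.
        loss = loss + F.l1_loss(bilinear_sample(feat, h + d),
                                bilinear_sample(feat, h).detach())
    return loss


def track_point(feat_ref, feat_cur, handle, radius=3):
    """Re-localize a handle by nearest-neighbor feature matching in a local window."""
    ref = bilinear_sample(feat_ref, handle)
    best_dist, best_pt = None, handle
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            cand = handle + torch.tensor([float(dx), float(dy)])
            dist = (bilinear_sample(feat_cur, cand) - ref).norm()
            if best_dist is None or dist < best_dist:
                best_dist, best_pt = dist, cand
    return best_pt
```

In the paper's setting, such a loop would run on the latents of all frames jointly at several denoising timesteps, with the handle and target point sets propagated from the first frame to the rest of the video so that the deformation stays temporally consistent.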
Related papers
- InVi: Object Insertion In Videos Using Off-the-Shelf Diffusion Models [46.587906540660455]
We introduce InVi, an approach for inserting or replacing objects within videos using off-the-shelf, text-to-image latent diffusion models.
InVi achieves realistic object insertion with consistent blending and coherence across frames, outperforming existing methods.
arXiv Detail & Related papers (2024-07-15T17:55:09Z)
- HOI-Swap: Swapping Objects in Videos with Hand-Object Interaction Awareness [57.18183962641015]
We present HOI-Swap, a video editing framework trained in a self-supervised manner.
The first stage focuses on object swapping in a single frame with HOI awareness.
The second stage extends the single-frame edit across the entire sequence.
arXiv Detail & Related papers (2024-06-11T22:31:29Z)
- Temporally Consistent Object Editing in Videos using Extended Attention [9.605596668263173]
We propose a method to edit videos using a pre-trained inpainting image diffusion model.
We ensure that the edited information will be consistent across all the video frames.
arXiv Detail & Related papers (2024-06-01T02:31:16Z)
- MagicStick: Controllable Video Editing via Control Handle Transformations [109.26314726025097]
MagicStick is a controllable video editing method that edits the video properties by utilizing the transformation on the extracted internal control signals.
We present experiments on numerous examples within our unified framework.
We also compare with shape-aware text-based editing and handcrafted motion video generation, demonstrating superior temporal consistency and editing capability compared to previous works.
arXiv Detail & Related papers (2023-12-05T17:58:06Z)
- VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence [37.85691662157054]
Video editing approaches that rely on dense correspondences are ineffective when the target edit involves a shape change.
We introduce the VideoSwap framework, inspired by our observation that only a small number of semantic points are necessary to align the subject's motion trajectory and modify its shape.
Extensive experiments demonstrate state-of-the-art video subject swapping results across a variety of real-world videos.
arXiv Detail & Related papers (2023-12-04T17:58:06Z)
- Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models [65.268245109828]
Ground-A-Video is a video-to-video translation framework for multi-attribute video editing.
It attains temporally consistent editing of input videos in a training-free manner.
Experiments and applications demonstrate that Ground-A-Video's zero-shot capacity outperforms other baseline methods in terms of edit-accuracy and frame consistency.
arXiv Detail & Related papers (2023-10-02T11:28:37Z)
- MagicProp: Diffusion-based Video Editing via Motion-aware Appearance Propagation [74.32046206403177]
MagicProp disentangles the video editing process into two stages: appearance editing and motion-aware appearance propagation.
In the first stage, MagicProp selects a single frame from the input video and applies image-editing techniques to modify the content and/or style of the frame.
In the second stage, MagicProp employs the edited frame as an appearance reference and generates the remaining frames using an autoregressive rendering approach.
arXiv Detail & Related papers (2023-09-02T11:13:29Z)
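The MagicProp entry above describes an edit-then-propagate pipeline. The following is a structural sketch of that two-stage control flow under assumed interfaces, not MagicProp's actual code: `edit_frame` stands in for any image-editing model and `render_next_frame` for a conditional renderer that takes the previously generated frame and the original frame's structure.

```python
# Structural sketch of a two-stage edit-then-propagate pipeline (illustrative,
# not MagicProp's implementation). `edit_frame` and `render_next_frame` are
# assumed callables supplied by the user.
def propagate_edit(frames, edit_frame, render_next_frame, ref_index=0):
    # Stage 1: appearance editing on a single selected frame.
    reference = edit_frame(frames[ref_index])
    outputs = [reference]
    # Stage 2: autoregressive rendering of the remaining frames, each conditioned
    # on the previous output (appearance) and the original frame (motion/structure).
    for t in range(ref_index + 1, len(frames)):
        outputs.append(render_next_frame(prev=outputs[-1], structure=frames[t]))
    return outputs
```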
- Deformable Sprites for Unsupervised Video Decomposition [66.73136214980309]
We represent each scene element as a Deformable Sprite consisting of three components.
The resulting decomposition allows for applications such as consistent video editing.
arXiv Detail & Related papers (2022-04-14T17:58:02Z)
- Layered Neural Atlases for Consistent Video Editing [37.69447642502351]
We present a method that decomposes, or "unwraps", an input video into a set of layered 2D atlases.
For each pixel in the video, our method estimates its corresponding 2D coordinate in each of the atlases.
We design our atlases to be interpretable and semantic, which facilitates easy and intuitive editing in the atlas domain.
arXiv Detail & Related papers (2021-09-23T14:58:59Z)
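The Layered Neural Atlases entry above attributes consistent editing to the per-pixel atlas coordinates: because every frame samples the same 2D atlas, an edit made once in atlas space reappears wherever that content is visible. A small sketch of that sampling step, assuming the UV maps have already been estimated; the function name and tensor shapes are illustrative, not the paper's code.

```python
# Illustrative sketch of atlas-based rendering: each video pixel carries a
# precomputed UV coordinate into a shared 2D atlas, so editing the atlas once
# edits that content consistently in every frame where it appears.
import torch
import torch.nn.functional as F


def render_from_atlas(edited_atlas, uv):
    """edited_atlas: (C, Ha, Wa); uv: (T, H, W, 2) atlas coordinates in [-1, 1]."""
    T = uv.shape[0]
    atlas = edited_atlas.unsqueeze(0).expand(T, -1, -1, -1)
    # Every frame samples the same edited atlas at its own per-pixel UV map.
    return F.grid_sample(atlas, uv, align_corners=True)  # (T, C, H, W)
```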