Drag-A-Video: Non-rigid Video Editing with Point-based Interaction
- URL: http://arxiv.org/abs/2312.02936v1
- Date: Tue, 5 Dec 2023 18:05:59 GMT
- Title: Drag-A-Video: Non-rigid Video Editing with Point-based Interaction
- Authors: Yao Teng, Enze Xie, Yue Wu, Haoyu Han, Zhenguo Li and Xihui Liu
- Abstract summary: We propose a new diffusion-based method for interactive point-based video manipulation, called Drag-A-Video.
Our method allows users to click pairs of handle points and target points as well as masks on the first frame of an input video.
To precisely modify the contents of the video, we employ a new video-level motion supervision to update the features of the video.
- Score: 63.78538355189017
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video editing is a challenging task that requires manipulating videos on both
the spatial and temporal dimensions. Existing methods for video editing mainly
focus on changing the appearance or style of the objects in the video, while
keeping their structures unchanged. However, there is no existing method that
allows users to interactively "drag" any points of instances on the first
frame to precisely reach the target points with other frames consistently
deformed. In this paper, we propose a new diffusion-based method for
interactive point-based video manipulation, called Drag-A-Video. Our method
allows users to click pairs of handle points and target points as well as masks
on the first frame of an input video. Then, our method transforms the inputs
into point sets and propagates these sets across frames. To precisely modify
the contents of the video, we employ a new video-level motion supervision to
update the features of the video and introduce the latent offsets to achieve
this update at multiple denoising timesteps. We propose a temporally consistent
point tracking module to coordinate the movement of the points in the handle
point sets. We demonstrate the effectiveness and flexibility of our method on
various videos. The website of our work is available here:
https://drag-a-video.github.io/.
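The abstract describes a drag-style optimization loop on diffusion features: motion supervision nudges the features so that the content at each handle point moves a small step toward its target, and point tracking then re-localizes the handle by feature matching before the next iteration. Below is a minimal single-frame sketch of that loop in PyTorch, not the authors' implementation; the function names, the fixed step size, and the search window are illustrative assumptions, and the paper's video-level supervision, latent offsets across denoising timesteps, and temporally consistent tracking are omitted.

```python
# Minimal sketch of one drag-editing iteration on a diffusion feature map,
# assuming a DragDiffusion-style setup. Not the authors' code; names and
# hyperparameters are illustrative. Points are length-2 float tensors (x, y).
import torch
import torch.nn.functional as F


def bilinear_sample(feat, pt):
    """Sample a (C, H, W) feature map at a float (x, y) location."""
    _, H, W = feat.shape
    x = 2.0 * pt[0] / (W - 1) - 1.0  # normalize to [-1, 1] for grid_sample
    y = 2.0 * pt[1] / (H - 1) - 1.0
    grid = torch.stack([x, y]).view(1, 1, 1, 2).to(feat.dtype)
    return F.grid_sample(feat.unsqueeze(0), grid, align_corners=True).view(-1)


def motion_supervision_loss(feat, handles, targets, step=2.0):
    """Pull the content at each handle point a small step toward its target."""
    loss = feat.new_zeros(())
    for h, t in zip(handles, targets):
        delta = t - h
        dist = delta.norm()
        d = delta / (dist + 1e-8) * torch.clamp(dist, max=step)
        # Features at the shifted location should match the (detached) features
        # currently at the handle, dragging content along the handle->target line.
        loss = loss + F.l1_loss(bilinear_sample(feat, h + d),
                                bilinear_sample(feat, h).detach())
    return loss


def track_point(feat_ref, feat_cur, handle, radius=3):
    """Re-localize a handle by nearest-neighbor feature matching in a local window."""
    ref = bilinear_sample(feat_ref, handle)
    best_dist, best_pt = None, handle
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            cand = handle + torch.tensor([float(dx), float(dy)])
            dist = (bilinear_sample(feat_cur, cand) - ref).norm()
            if best_dist is None or dist < best_dist:
                best_dist, best_pt = dist, cand
    return best_pt
```

In the paper's setting, such a loop would run on the latents of all frames jointly at several denoising timesteps, with the handle and target point sets propagated from the first frame to the rest of the video so that the deformation stays temporally consistent.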
Related papers
- InVi: Object Insertion In Videos Using Off-the-Shelf Diffusion Models [46.587906540660455]
We introduce InVi, an approach for inserting or replacing objects within videos using off-the-shelf, text-to-image latent diffusion models.
InVi achieves realistic object insertion with consistent blending and coherence across frames, outperforming existing methods.
arXiv Detail & Related papers (2024-07-15T17:55:09Z)
- HOI-Swap: Swapping Objects in Videos with Hand-Object Interaction Awareness [57.18183962641015]
We present HOI-Swap, a video editing framework trained in a self-supervised manner.
The first stage focuses on object swapping in a single frame with HOI awareness.
The second stage extends the single-frame edit across the entire sequence.
arXiv Detail & Related papers (2024-06-11T22:31:29Z)
- Temporally Consistent Object Editing in Videos using Extended Attention [9.605596668263173]
We propose a method to edit videos using a pre-trained inpainting image diffusion model.
We ensure that the edited information will be consistent across all the video frames.
arXiv Detail & Related papers (2024-06-01T02:31:16Z)
- MagicStick: Controllable Video Editing via Control Handle Transformations [109.26314726025097]
MagicStick is a controllable video editing method that edits the video properties by utilizing the transformation on the extracted internal control signals.
We present experiments on numerous examples within our unified framework.
We also compare with shape-aware text-based editing and handcrafted motion video generation, demonstrating superior temporal consistency and editing capability compared to previous works.
arXiv Detail & Related papers (2023-12-05T17:58:06Z)
- VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence [37.85691662157054]
Video editing approaches that rely on dense correspondences are ineffective when the target edit involves a shape change.
We introduce the VideoSwap framework, inspired by our observation that only a small number of semantic points are necessary to align the subject's motion trajectory and modify its shape.
Extensive experiments demonstrate state-of-the-art video subject swapping results across a variety of real-world videos.
arXiv Detail & Related papers (2023-12-04T17:58:06Z)
- Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models [65.268245109828]
Ground-A-Video is a video-to-video translation framework for multi-attribute video editing.
It attains temporally consistent editing of input videos in a training-free manner.
Experiments and applications demonstrate that Ground-A-Video's zero-shot capacity outperforms other baseline methods in terms of edit-accuracy and frame consistency.
arXiv Detail & Related papers (2023-10-02T11:28:37Z)
- MagicProp: Diffusion-based Video Editing via Motion-aware Appearance Propagation [74.32046206403177]
MagicProp disentangles the video editing process into two stages: appearance editing and motion-aware appearance propagation.
In the first stage, MagicProp selects a single frame from the input video and applies image-editing techniques to modify the content and/or style of the frame.
In the second stage, MagicProp employs the edited frame as an appearance reference and generates the remaining frames using an autoregressive rendering approach.
arXiv Detail & Related papers (2023-09-02T11:13:29Z)
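The MagicProp entry above describes an edit-then-propagate pipeline. The following is a structural sketch of that two-stage control flow under assumed interfaces, not MagicProp's actual code: `edit_frame` stands in for any image-editing model and `render_next_frame` for a conditional renderer that takes the previously generated frame and the original frame's structure.

```python
# Structural sketch of a two-stage edit-then-propagate pipeline (illustrative,
# not MagicProp's implementation). `edit_frame` and `render_next_frame` are
# assumed callables supplied by the user.
def propagate_edit(frames, edit_frame, render_next_frame, ref_index=0):
    # Stage 1: appearance editing on a single selected frame.
    reference = edit_frame(frames[ref_index])
    outputs = [reference]
    # Stage 2: autoregressive rendering of the remaining frames, each conditioned
    # on the previous output (appearance) and the original frame (motion/structure).
    for t in range(ref_index + 1, len(frames)):
        outputs.append(render_next_frame(prev=outputs[-1], structure=frames[t]))
    return outputs
```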
- Deformable Sprites for Unsupervised Video Decomposition [66.73136214980309]
We represent each scene element as a Deformable Sprite consisting of three components.
The resulting decomposition allows for applications such as consistent video editing.
arXiv Detail & Related papers (2022-04-14T17:58:02Z)
- Layered Neural Atlases for Consistent Video Editing [37.69447642502351]
We present a method that decomposes, or "unwraps", an input video into a set of layered 2D atlases.
For each pixel in the video, our method estimates its corresponding 2D coordinate in each of the atlases.
We design our atlases to be interpretable and semantic, which facilitates easy and intuitive editing in the atlas domain.
arXiv Detail & Related papers (2021-09-23T14:58:59Z)
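The Layered Neural Atlases entry above attributes consistent editing to the per-pixel atlas coordinates: because every frame samples the same 2D atlas, an edit made once in atlas space reappears wherever that content is visible. A small sketch of that sampling step, assuming the UV maps have already been estimated; the function name and tensor shapes are illustrative, not the paper's code.

```python
# Illustrative sketch of atlas-based rendering: each video pixel carries a
# precomputed UV coordinate into a shared 2D atlas, so editing the atlas once
# edits that content consistently in every frame where it appears.
import torch
import torch.nn.functional as F


def render_from_atlas(edited_atlas, uv):
    """edited_atlas: (C, Ha, Wa); uv: (T, H, W, 2) atlas coordinates in [-1, 1]."""
    T = uv.shape[0]
    atlas = edited_atlas.unsqueeze(0).expand(T, -1, -1, -1)
    # Every frame samples the same edited atlas at its own per-pixel UV map.
    return F.grid_sample(atlas, uv, align_corners=True)  # (T, C, H, W)
```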