MotionEditor: Editing Video Motion via Content-Aware Diffusion
- URL: http://arxiv.org/abs/2311.18830v1
- Date: Thu, 30 Nov 2023 18:59:33 GMT
- Title: MotionEditor: Editing Video Motion via Content-Aware Diffusion
- Authors: Shuyuan Tu, Qi Dai, Zhi-Qi Cheng, Han Hu, Xintong Han, Zuxuan Wu,
Yu-Gang Jiang
- Abstract summary: MotionEditor is a diffusion model for video motion editing.
It incorporates a novel content-aware motion adapter into ControlNet to capture temporal motion correspondence.
- Score: 96.825431998349
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing diffusion-based video editing models have made gorgeous advances for
editing attributes of a source video over time but struggle to manipulate the
motion information while preserving the original protagonist's appearance and
background. To address this, we propose MotionEditor, a diffusion model for
video motion editing. MotionEditor incorporates a novel content-aware motion
adapter into ControlNet to capture temporal motion correspondence. While
ControlNet enables direct generation based on skeleton poses, it encounters
challenges when modifying the source motion in the inverted noise due to
contradictory signals between the noise (source) and the condition (reference).
Our adapter complements ControlNet by involving source content to transfer
adapted control signals seamlessly. Further, we build up a two-branch
architecture (a reconstruction branch and an editing branch) with a
high-fidelity attention injection mechanism facilitating branch interaction.
This mechanism enables the editing branch to query the key and value from the
reconstruction branch in a decoupled manner, making the editing branch retain
the original background and protagonist appearance. We also propose a skeleton
alignment algorithm to address the discrepancies in pose size and position.
Experiments demonstrate the promising motion editing ability of MotionEditor,
both qualitatively and quantitatively.
Related papers
- HOI-Swap: Swapping Objects in Videos with Hand-Object Interaction Awareness [57.18183962641015]
We present HOI-Swap, a video editing framework trained in a self-supervised manner.
The first stage focuses on object swapping in a single frame with HOI awareness.
The second stage extends the single-frame edit across the entire sequence.
arXiv Detail & Related papers (2024-06-11T22:31:29Z) - Temporally Consistent Object Editing in Videos using Extended Attention [9.605596668263173]
We propose a method to edit videos using a pre-trained inpainting image diffusion model.
We ensure that the edited information will be consistent across all the video frames.
arXiv Detail & Related papers (2024-06-01T02:31:16Z) - MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion [94.66090422753126]
MotionFollower is a lightweight score-guided diffusion model for video motion editing.
It delivers superior motion editing performance and exclusively supports large camera movements and actions.
Compared with MotionEditor, the most advanced motion editing model, MotionFollower achieves an approximately 80% reduction in GPU memory.
arXiv Detail & Related papers (2024-05-30T17:57:30Z) - ReVideo: Remake a Video with Motion and Content Control [67.5923127902463]
We present a novel attempt to Remake a Video (VideoRe) which allows precise video editing in specific areas through the specification of both content and motion.
VideoRe addresses a new task involving the coupling and training imbalance between content and motion control.
Our method can also seamlessly extend these applications to multi-area editing without modifying specific training, demonstrating its flexibility and robustness.
arXiv Detail & Related papers (2024-05-22T17:46:08Z) - UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing [28.140945021777878]
We present UniEdit, a tuning-free framework that supports both video motion and appearance editing.
To realize motion editing while preserving source video content, we introduce auxiliary motion-reference and reconstruction branches.
The obtained features are then injected into the main editing path via temporal and spatial self-attention layers.
arXiv Detail & Related papers (2024-02-20T17:52:12Z) - MagicStick: Controllable Video Editing via Control Handle Transformations [49.29608051543133]
MagicStick is a controllable video editing method that edits the video properties by utilizing the transformation on the extracted internal control signals.
We present experiments on numerous examples within our unified framework.
We also compare with shape-aware text-based editing and handcrafted motion video generation, demonstrating our superior temporal consistency and editing capability than previous works.
arXiv Detail & Related papers (2023-12-05T17:58:06Z) - MagicProp: Diffusion-based Video Editing via Motion-aware Appearance
Propagation [74.32046206403177]
MagicProp disentangles the video editing process into two stages: appearance editing and motion-aware appearance propagation.
In the first stage, MagicProp selects a single frame from the input video and applies image-editing techniques to modify the content and/or style of the frame.
In the second stage, MagicProp employs the edited frame as an appearance reference and generates the remaining frames using an autoregressive rendering approach.
arXiv Detail & Related papers (2023-09-02T11:13:29Z) - InFusion: Inject and Attention Fusion for Multi Concept Zero-Shot
Text-based Video Editing [27.661609140918916]
InFusion is a framework for zero-shot text-based video editing.
It supports editing of multiple concepts with pixel-level control over diverse concepts mentioned in the editing prompt.
Our framework is a low-cost alternative to one-shot tuned models for editing since it does not require training.
arXiv Detail & Related papers (2023-07-22T17:05:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.