INVE: Interactive Neural Video Editing
- URL: http://arxiv.org/abs/2307.07663v1
- Date: Sat, 15 Jul 2023 00:02:41 GMT
- Title: INVE: Interactive Neural Video Editing
- Authors: Jiahui Huang, Leonid Sigal, Kwang Moo Yi, Oliver Wang, Joon-Young Lee
- Abstract summary: Interactive Neural Video Editing (INVE) is a real-time video editing solution that consistently propagates sparse frame edits to the entire video clip.
Our method is inspired by the recent work on Layered Neural Atlas (LNA).
LNA suffers from two major drawbacks: (1) the method is too slow for interactive editing, and (2) it offers insufficient support for some editing use cases.
- Score: 79.48055669064229
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Interactive Neural Video Editing (INVE), a real-time video editing
solution, which can assist the video editing process by consistently
propagating sparse frame edits to the entire video clip. Our method is inspired
by the recent work on Layered Neural Atlas (LNA). LNA, however, suffers from
two major drawbacks: (1) the method is too slow for interactive editing, and
(2) it offers insufficient support for some editing use cases, including direct
frame editing and rigid texture tracking. To address these challenges, we adopt
highly efficient network architectures, powered by hash-grid encoding, to
substantially improve processing speed. In addition, we learn bi-directional
mapping functions between images and the atlas, and introduce vectorized
editing, which together enable a much greater variety of edits in both the
atlas and the frames directly. Compared to LNA, our INVE reduces the learning
and inference time by a factor of 5, and supports various video editing
operations that LNA cannot. We showcase the superiority of INVE over LNA in
interactive video editing through a comprehensive quantitative and qualitative
analysis, highlighting its numerous advantages and improved performance. For
video results, please see https://gabriel-huang.github.io/inve/
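The speed-up claimed above hinges on compact coordinate networks fed by multiresolution hash-grid encodings (in the spirit of Instant NGP). The PyTorch sketch below illustrates that general pattern: a hash-grid encoder feeding a pair of small mapping networks, one per direction of the image-atlas correspondence. Every name and hyperparameter here (HashGrid2D, CoordMapping, n_levels, table_size, and so on) is an illustrative assumption, not the authors' implementation.

```python
# Illustrative sketch only: a toy multiresolution hash-grid encoder feeding
# small coordinate MLPs. All names and hyperparameters are assumptions;
# this is NOT the INVE implementation.
import torch
import torch.nn as nn


class HashGrid2D(nn.Module):
    """Toy multiresolution hash-grid encoding for 2D coords in [0, 1]^2."""

    def __init__(self, n_levels=8, feat_dim=2, table_size=2**14,
                 base_res=16, growth=1.5):
        super().__init__()
        self.res = [int(base_res * growth ** i) for i in range(n_levels)]
        self.tables = nn.ParameterList(
            [nn.Parameter(1e-4 * torch.randn(table_size, feat_dim))
             for _ in range(n_levels)])
        self.table_size = table_size
        self.out_dim = n_levels * feat_dim
        # Large prime for spatial hashing, as in Instant NGP.
        self.register_buffer("primes", torch.tensor([1, 2654435761]))

    def forward(self, xy):                       # xy: (N, 2) in [0, 1]
        feats = []
        for res, table in zip(self.res, self.tables):
            g = xy * (res - 1)                   # scale to this level's grid
            g0 = g.floor().long()                # lower-left cell corner
            w = g - g0                           # bilinear weights, (N, 2)
            f = 0.0
            for dx in (0, 1):                    # visit the 4 cell corners
                for dy in (0, 1):
                    corner = g0 + torch.tensor([dx, dy])
                    idx = (corner * self.primes).sum(-1) % self.table_size
                    wgt = ((w[:, 0] if dx else 1 - w[:, 0])
                           * (w[:, 1] if dy else 1 - w[:, 1]))
                    f = f + wgt.unsqueeze(-1) * table[idx]
            feats.append(f)
        return torch.cat(feats, dim=-1)          # (N, n_levels * feat_dim)


class CoordMapping(nn.Module):
    """Small MLP mapping encoded 2D coords (plus time) to 2D coords."""

    def __init__(self, encoder, hidden=64):
        super().__init__()
        self.encoder = encoder
        self.mlp = nn.Sequential(
            nn.Linear(encoder.out_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2))

    def forward(self, xy, t):                    # t: (N, 1) frame time
        return self.mlp(torch.cat([self.encoder(xy), t], dim=-1))


# Bi-directional mapping: frame (x, y, t) -> atlas (u, v) and back.
frame_to_atlas = CoordMapping(HashGrid2D())
atlas_to_frame = CoordMapping(HashGrid2D())

xy = torch.rand(1024, 2)                         # random frame coordinates
t = torch.full((1024, 1), 0.5)                   # all at frame time t = 0.5
uv = frame_to_atlas(xy, t)                       # look up edits in the atlas
xy_back = atlas_to_frame(uv.clamp(0, 1), t)      # map atlas edits to frames
print(uv.shape, xy_back.shape)                   # both torch.Size([1024, 2])
```

In a real system the two mappings would be trained jointly with reconstruction and consistency losses, so that an edit painted once in the atlas lands on the right pixels in every frame; that training machinery is omitted here.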
Related papers
- I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models [18.36472998650704]
We introduce a novel and generic solution that extends the applicability of image editing tools to videos by propagating edits from a single frame to the entire video using a pre-trained image-to-video model.
Our method, dubbed I2VEdit, adaptively preserves the visual and motion integrity of the source video depending on the extent of the edits.
arXiv Detail & Related papers (2024-05-26T11:47:40Z) - ReVideo: Remake a Video with Motion and Content Control [67.5923127902463]
We present a novel attempt to Remake a Video (ReVideo), which allows precise video editing in specific areas through the specification of both content and motion.
ReVideo addresses a new task involving the coupling and training imbalance between content and motion control.
Our method also extends seamlessly to multi-area editing without task-specific retraining, demonstrating its flexibility and robustness.
arXiv Detail & Related papers (2024-05-22T17:46:08Z) - Edit-Your-Motion: Space-Time Diffusion Decoupling Learning for Video Motion Editing [46.56615725175025]
We propose a one-shot video editing method called Edit-Your-Motion that requires only a single text-video pair for training.
Specifically, we design a Detailed Prompt-Guided Learning Strategy to decouple motion-temporal features in space-time diffusion models.
With Edit-Your-Motion, users can edit the motion of objects in the source video to generate more exciting and diverse videos.
arXiv Detail & Related papers (2024-05-07T17:06:59Z) - EffiVED:Efficient Video Editing via Text-instruction Diffusion Models [9.287394166165424]
We introduce EffiVED, an efficient diffusion-based model that supports instruction-guided video editing.
We transform vast image editing datasets and open-world videos into a high-quality dataset for training EffiVED.
arXiv Detail & Related papers (2024-03-18T08:42:08Z) - Neural Video Fields Editing [56.558490998753456]
NVEdit is a text-driven video editing framework designed to mitigate memory overhead and improve consistency.
We construct a neural video field, powered by tri-plane and sparse-grid representations, to encode long videos with hundreds of frames (a minimal tri-plane sketch appears after this list).
Next, we update the video field through off-the-shelf Text-to-Image (T2I) models to impose text-driven editing effects.
arXiv Detail & Related papers (2023-12-12T14:48:48Z) - VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion
Models [96.55004961251889]
Video Instruction Diffusion (VIDiff) is a unified foundation model designed for a wide range of video tasks.
Our model can edit and translate videos into the desired results within seconds, based on user instructions.
We provide convincing generative results for diverse input videos and written instructions, both qualitatively and quantitatively.
arXiv Detail & Related papers (2023-11-30T18:59:52Z) - LOVECon: Text-driven Training-Free Long Video Editing with ControlNet [9.762680144118061]
This paper aims to bridge the gap, establishing a simple and effective baseline for training-free diffusion model-based long video editing.
We build the pipeline upon ControlNet, which excels at various image editing tasks based on text prompts.
Our method manages to edit videos comprising hundreds of frames according to user requirements.
arXiv Detail & Related papers (2023-10-15T02:39:25Z) - MagicEdit: High-Fidelity and Temporally Coherent Video Editing [70.55750617502696]
We present MagicEdit, a surprisingly simple yet effective solution to the text-guided video editing task.
We found that high-fidelity and temporally coherent video-to-video translation can be achieved by explicitly disentangling the learning of content, structure and motion signals during training.
arXiv Detail & Related papers (2023-08-28T17:56:22Z) - The Anatomy of Video Editing: A Dataset and Benchmark Suite for
AI-Assisted Video Editing [90.59584961661345]
This work introduces the Anatomy of Video Editing, a dataset and benchmark to foster research in AI-assisted video editing.
Our benchmark suite focuses on video editing tasks beyond visual effects, such as automatic footage organization and assisted video assembly.
To enable research on these fronts, we annotate more than 1.5M tags, with concepts relevant to cinematography, from 196,176 shots sampled from movie scenes.
arXiv Detail & Related papers (2022-07-20T10:53:48Z)