INVE: Interactive Neural Video Editing
- URL: http://arxiv.org/abs/2307.07663v1
- Date: Sat, 15 Jul 2023 00:02:41 GMT
- Title: INVE: Interactive Neural Video Editing
- Authors: Jiahui Huang, Leonid Sigal, Kwang Moo Yi, Oliver Wang, Joon-Young Lee
- Abstract summary: Interactive Neural Video Editing (INVE) is a real-time video editing solution that consistently propagates sparse frame edits to the entire video clip.
Our method is inspired by the recent work on Layered Neural Atlas (LNA).
LNA suffers from two major drawbacks: (1) the method is too slow for interactive editing, and (2) it offers insufficient support for some editing use cases.
- Score: 79.48055669064229
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Interactive Neural Video Editing (INVE), a real-time video editing
solution, which can assist the video editing process by consistently
propagating sparse frame edits to the entire video clip. Our method is inspired
by the recent work on Layered Neural Atlas (LNA). LNA, however, suffers from
two major drawbacks: (1) the method is too slow for interactive editing, and
(2) it offers insufficient support for some editing use cases, including direct
frame editing and rigid texture tracking. To address these challenges, we adopt
highly efficient network architectures, powered by hash-grid encoding, to
substantially improve processing speed. In addition, we learn bi-directional
mapping functions between images and the atlas, and introduce vectorized
editing, which together enable a much greater variety of edits in both the
atlas and the frames directly. Compared to LNA, our INVE reduces the learning
and inference time by a factor of 5, and supports various video editing
operations that LNA cannot. We showcase the superiority of INVE over LNA in
interactive video editing through a comprehensive quantitative and qualitative
analysis, highlighting its numerous advantages and improved performance. For
video results, please see https://gabriel-huang.github.io/inve/
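The speed-up claimed above hinges on compact coordinate networks fed by multiresolution hash-grid encodings (in the spirit of Instant NGP). The PyTorch sketch below illustrates that general pattern: a hash-grid encoder feeding a pair of small mapping networks, one per direction of the image-atlas correspondence. Every name and hyperparameter here (HashGrid2D, CoordMapping, n_levels, table_size, and so on) is an illustrative assumption, not the authors' implementation.

```python
# Illustrative sketch only: a toy multiresolution hash-grid encoder feeding
# small coordinate MLPs. All names and hyperparameters are assumptions;
# this is NOT the INVE implementation.
import torch
import torch.nn as nn


class HashGrid2D(nn.Module):
    """Toy multiresolution hash-grid encoding for 2D coords in [0, 1]^2."""

    def __init__(self, n_levels=8, feat_dim=2, table_size=2**14,
                 base_res=16, growth=1.5):
        super().__init__()
        self.res = [int(base_res * growth ** i) for i in range(n_levels)]
        self.tables = nn.ParameterList(
            [nn.Parameter(1e-4 * torch.randn(table_size, feat_dim))
             for _ in range(n_levels)])
        self.table_size = table_size
        self.out_dim = n_levels * feat_dim
        # Large prime for spatial hashing, as in Instant NGP.
        self.register_buffer("primes", torch.tensor([1, 2654435761]))

    def forward(self, xy):                       # xy: (N, 2) in [0, 1]
        feats = []
        for res, table in zip(self.res, self.tables):
            g = xy * (res - 1)                   # scale to this level's grid
            g0 = g.floor().long()                # lower-left cell corner
            w = g - g0                           # bilinear weights, (N, 2)
            f = 0.0
            for dx in (0, 1):                    # visit the 4 cell corners
                for dy in (0, 1):
                    corner = g0 + torch.tensor([dx, dy])
                    idx = (corner * self.primes).sum(-1) % self.table_size
                    wgt = ((w[:, 0] if dx else 1 - w[:, 0])
                           * (w[:, 1] if dy else 1 - w[:, 1]))
                    f = f + wgt.unsqueeze(-1) * table[idx]
            feats.append(f)
        return torch.cat(feats, dim=-1)          # (N, n_levels * feat_dim)


class CoordMapping(nn.Module):
    """Small MLP mapping encoded 2D coords (plus time) to 2D coords."""

    def __init__(self, encoder, hidden=64):
        super().__init__()
        self.encoder = encoder
        self.mlp = nn.Sequential(
            nn.Linear(encoder.out_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2))

    def forward(self, xy, t):                    # t: (N, 1) frame time
        return self.mlp(torch.cat([self.encoder(xy), t], dim=-1))


# Bi-directional mapping: frame (x, y, t) -> atlas (u, v) and back.
frame_to_atlas = CoordMapping(HashGrid2D())
atlas_to_frame = CoordMapping(HashGrid2D())

xy = torch.rand(1024, 2)                         # random frame coordinates
t = torch.full((1024, 1), 0.5)                   # all at frame time t = 0.5
uv = frame_to_atlas(xy, t)                       # look up edits in the atlas
xy_back = atlas_to_frame(uv.clamp(0, 1), t)      # map atlas edits to frames
print(uv.shape, xy_back.shape)                   # both torch.Size([1024, 2])
```

In a real system the two mappings would be trained jointly with reconstruction and consistency losses, so that an edit painted once in the atlas lands on the right pixels in every frame; that training machinery is omitted here.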
Related papers
- I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models [18.36472998650704]
We introduce a novel and generic solution that extends the applicability of image editing tools to videos by propagating edits from a single frame to the entire video using a pre-trained image-to-video model.
Our method, dubbed I2VEdit, adaptively preserves the visual and motion integrity of the source video depending on the extent of the edits.
arXiv Detail & Related papers (2024-05-26T11:47:40Z) - ReVideo: Remake a Video with Motion and Content Control [67.5923127902463]
We present a novel attempt to Remake a Video (ReVideo), which allows precise video editing in specific areas through the specification of both content and motion.
ReVideo addresses a new task involving the coupling and training imbalance between content and motion control.
Our method also extends seamlessly to multi-area editing without task-specific retraining, demonstrating its flexibility and robustness.
arXiv Detail & Related papers (2024-05-22T17:46:08Z) - Edit-Your-Motion: Space-Time Diffusion Decoupling Learning for Video Motion Editing [46.56615725175025]
We propose a one-shot video editing method called Edit-Your-Motion that requires only a single text-video pair for training.
Specifically, we design a Detailed Prompt-Guided Learning Strategy to decouple motion-temporal features in space-time diffusion models.
With Edit-Your-Motion, users can edit the motion of objects in the source video to generate more exciting and diverse videos.
arXiv Detail & Related papers (2024-05-07T17:06:59Z) - EffiVED:Efficient Video Editing via Text-instruction Diffusion Models [9.287394166165424]
We introduce EffiVED, an efficient diffusion-based model that supports instruction-guided video editing.
We transform vast image editing datasets and open-world videos into a high-quality dataset for training EffiVED.
arXiv Detail & Related papers (2024-03-18T08:42:08Z) - Neural Video Fields Editing [56.558490998753456]
NVEdit is a text-driven video editing framework designed to mitigate memory overhead and improve consistency.
We construct a neural video field, powered by tri-plane and sparse-grid representations, to encode long videos with hundreds of frames (a minimal tri-plane sketch appears after this list).
Next, we update the video field through off-the-shelf Text-to-Image (T2I) models to impose text-driven editing effects.
arXiv Detail & Related papers (2023-12-12T14:48:48Z) - VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion
Models [96.55004961251889]
Video Instruction Diffusion (VIDiff) is a unified foundation model designed for a wide range of video tasks.
Our model can edit and translate videos into the desired results within seconds, based on user instructions.
We provide convincing generative results for diverse input videos and written instructions, both qualitatively and quantitatively.
arXiv Detail & Related papers (2023-11-30T18:59:52Z) - LOVECon: Text-driven Training-Free Long Video Editing with ControlNet [9.762680144118061]
This paper aims to bridge the gap, establishing a simple and effective baseline for training-free diffusion model-based long video editing.
We build the pipeline upon ControlNet, which excels at various image editing tasks based on text prompts.
Our method manages to edit videos comprising hundreds of frames according to user requirements.
arXiv Detail & Related papers (2023-10-15T02:39:25Z) - MagicEdit: High-Fidelity and Temporally Coherent Video Editing [70.55750617502696]
We present MagicEdit, a surprisingly simple yet effective solution to the text-guided video editing task.
We found that high-fidelity and temporally coherent video-to-video translation can be achieved by explicitly disentangling the learning of content, structure and motion signals during training.
arXiv Detail & Related papers (2023-08-28T17:56:22Z) - The Anatomy of Video Editing: A Dataset and Benchmark Suite for
AI-Assisted Video Editing [90.59584961661345]
This work introduces the Anatomy of Video Editing, a dataset and benchmark to foster research in AI-assisted video editing.
Our benchmark suite focuses on video editing tasks beyond visual effects, such as automatic footage organization and assisted video assembly.
To enable research on these fronts, we annotate more than 1.5M tags, with concepts relevant to cinematography, from 196,176 shots sampled from movie scenes.
arXiv Detail & Related papers (2022-07-20T10:53:48Z)