Related papers: RNA: Video Editing with ROI-based Neural Atlas

URL: http://arxiv.org/abs/2410.07600v1
Date: Thu, 10 Oct 2024 04:17:19 GMT
Title: RNA: Video Editing with ROI-based Neural Atlas
Authors: Jaekyeong Lee, Geonung Kim, Sunghyun Cho
Abstract summary: We propose a novel region-of-interest (ROI)-based video editing framework: ROI-based Neural Atlas (RNA). Unlike prior work, RNA allows users to specify editing regions, simplifying the editing process by removing the need for foreground separation. We introduce a soft neural atlas model for video reconstruction to ensure high-quality editing results.
Score: 14.848279912686946
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the recent growth of video-based Social Network Service (SNS) platforms, the demand for video editing among common users has increased. However, video editing can be challenging due to the temporally-varying factors such as camera movement and moving objects. While modern atlas-based video editing methods have addressed these issues, they often fail to edit videos including complex motion or multiple moving objects, and demand excessive computational cost, even for very simple edits. In this paper, we propose a novel region-of-interest (ROI)-based video editing framework: ROI-based Neural Atlas (RNA). Unlike prior work, RNA allows users to specify editing regions, simplifying the editing process by removing the need for foreground separation and atlas modeling for foreground objects. However, this simplification presents a unique challenge: acquiring a mask that effectively handles occlusions in the edited area caused by moving objects, without relying on an additional segmentation model. To tackle this, we propose a novel mask refinement approach designed for this specific challenge. Moreover, we introduce a soft neural atlas model for video reconstruction to ensure high-quality editing results. Extensive experiments show that RNA offers a more practical and efficient editing solution, applicable to a wider range of videos with superior quality compared to prior methods.

Related papers

Region-Constraint In-Context Generation for Instructional Video Editing [91.27224696009755]
We present ReCo, a new instructional video editing paradigm that delves into constraint modeling between editing and non-editing regions during in-context generation. We propose a large-scale, high-quality video editing dataset, i.e., ReCo-Data, comprising 500K instruction-video pairs to benefit model training.
arXiv Detail & Related papers (2025-12-19T14:49:30Z)
EasyV2V: A High-quality Instruction-based Video Editing Framework [108.78294392167017]
EasyV2V is a framework for instruction-based video editing. EasyV2V works with flexible inputs, e.g., video+text, video+mask+reference+, and achieves state-of-the-art video editing results.
arXiv Detail & Related papers (2025-12-18T18:59:57Z)
Taming Flow-based I2V Models for Creative Video Editing [64.67801702413122]
Video editing, which aims to manipulate videos according to user intent, remains an emerging challenge. Most existing image-conditioned video editing methods require inversion with model-specific design or need extensive optimization. We propose IF-V2V, an Inversion-Free method that can adapt off-the-shelf flow-matching-based I2V models for video editing without significant computational overhead.
arXiv Detail & Related papers (2025-09-26T05:57:04Z)
O-DisCo-Edit: Object Distortion Control for Unified Realistic Video Editing [88.93410369258203]
O-DisCo-Edit is a unified framework that incorporates a novel object distortion control (O-DisCo). This signal, based on random and adaptive noise, flexibly encapsulates a wide range of editing cues within a single representation. O-DisCo-Edit enables efficient, high-fidelity editing through an effective training paradigm.
arXiv Detail & Related papers (2025-09-01T16:29:39Z)
TV-LiVE: Training-Free, Text-Guided Video Editing via Layer Informed Vitality Exploitation [36.81368812919819]
We present TV-LiVE, a Training-free and text-guided Video editing framework via Layer-informed Vitality Exploitation. We empirically identify vital layers within the video generation model that significantly influence the quality of generated outputs. For object addition, we identify prominent layers to extract the mask regions corresponding to the newly added target prompt.
arXiv Detail & Related papers (2025-06-08T16:12:13Z)
VEGGIE: Instructional Editing and Reasoning of Video Concepts with Grounded Generation [67.31149310468801]
We introduce VEGGIE, a simple end-to-end framework that unifies video concept editing, grounding, and reasoning based on diverse user instructions. VEGGIE shows strong performance in instructional video editing with different editing skills, outperforming the best instructional baseline as a versatile model.
arXiv Detail & Related papers (2025-03-18T15:31:12Z)
A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model [10.736207095604414]
We propose a two-stage scheme for general editing. Firstly, unlike previous works that extract scene-specific features, we leverage the pre-trained Vision-Language Model (VLM). We also propose a Reinforcement Learning (RL)-based editing framework to formulate the editing problem and train the virtual editor to make better sequential editing decisions.
arXiv Detail & Related papers (2024-11-07T18:20:28Z)
Temporally Consistent Object Editing in Videos using Extended Attention [9.605596668263173]
We propose a method to edit videos using a pre-trained inpainting image diffusion model. We ensure that the edited information will be consistent across all the video frames.
arXiv Detail & Related papers (2024-06-01T02:31:16Z)
ReVideo: Remake a Video with Motion and Content Control [67.5923127902463]
We present a novel attempt to Remake a Video (ReVideo) which allows precise video editing in specific areas through the specification of both content and motion. ReVideo addresses a new task involving the coupling and training imbalance between content and motion control. Our method can also seamlessly extend these applications to multi-area editing without modifying specific training, demonstrating its flexibility and robustness.
arXiv Detail & Related papers (2024-05-22T17:46:08Z)
Action Reimagined: Text-to-Pose Video Editing for Dynamic Human Actions [49.14827857853878]
ReimaginedAct comprises video understanding, reasoning, and editing modules. Our method can accept not only direct instructional text prompts but also 'what if' questions to predict possible action changes.
arXiv Detail & Related papers (2024-03-11T22:46:46Z)
Neural Video Fields Editing [56.558490998753456]
NVEdit is a text-driven video editing framework designed to mitigate memory overhead and improve consistency. We construct a neural video field, powered by tri-plane and sparse grid, to enable encoding long videos with hundreds of frames. Next, we update the video field through off-the-shelf Text-to-Image (T2I) models to impart text-driven editing effects.
arXiv Detail & Related papers (2023-12-12T14:48:48Z)
Editing 3D Scenes via Text Prompts without Retraining [80.57814031701744]
DN2N is a text-driven editing method that allows for the direct acquisition of a NeRF model with universal editing capabilities. Our method employs off-the-shelf text-based editing models of 2D images to modify the 3D scene images. Our method achieves multiple editing types, including but not limited to appearance editing, weather transition, material changing, and style transfer.
arXiv Detail & Related papers (2023-09-10T02:31:50Z)
Seal-3D: Interactive Pixel-Level Editing for Neural Radiance Fields [14.803266838721864]
Seal-3D allows users to edit NeRF models in a pixel-level and free manner with a wide range of NeRF-like backbone and preview the editing effects instantly. A NeRF editing system is built to showcase various editing types.
arXiv Detail & Related papers (2023-07-27T18:08:19Z)
INVE: Interactive Neural Video Editing [79.48055669064229]
Interactive Neural Video Editing (INVE) is a real-time video editing solution that consistently propagates sparse frame edits to the entire video clip. Our method is inspired by the recent work on Layered Neural Atlas (LNA). LNA suffers from two major drawbacks: (1) the method is too slow for interactive editing, and (2) it offers insufficient support for some editing use cases.
arXiv Detail & Related papers (2023-07-15T00:02:41Z)
Diffusion Video Autoencoders: Toward Temporally Consistent Face Video Editing via Disentangled Video Encoding [35.18070525015657]
We propose a novel face video editing framework based on diffusion autoencoders. Our model is based on diffusion models and can satisfy both reconstruction and edit capabilities at the same time.
arXiv Detail & Related papers (2022-12-06T07:41:51Z)
The Anatomy of Video Editing: A Dataset and Benchmark Suite for AI-Assisted Video Editing [90.59584961661345]
This work introduces the Anatomy of Video Editing, a dataset, and benchmark, to foster research in AI-assisted video editing. Our benchmark suite focuses on video editing tasks, beyond visual effects, such as automatic footage organization and assisted video assembling. To enable research on these fronts, we annotate more than 1.5M tags, with relevant concepts to cinematography, from 196176 shots sampled from movie scenes.
arXiv Detail & Related papers (2022-07-20T10:53:48Z)

This list is automatically generated from the titles and abstracts of the papers on this site.