RNA: Video Editing with ROI-based Neural Atlas
- URL: http://arxiv.org/abs/2410.07600v1
- Date: Thu, 10 Oct 2024 04:17:19 GMT
- Title: RNA: Video Editing with ROI-based Neural Atlas
- Authors: Jaekyeong Lee, Geonung Kim, Sunghyun Cho,
- Abstract summary: We propose a novel region-of-interest (ROI)-based video editing framework: ROI-based Neural Atlas (RNA)
Unlike prior work, RNA allows users to specify editing regions, simplifying the editing process by removing the need for foreground separation.
We introduce a soft neural atlas model for video reconstruction to ensure high-quality editing results.
- Score: 14.848279912686946
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the recent growth of video-based Social Network Service (SNS) platforms, the demand for video editing among common users has increased. However, video editing can be challenging due to the temporally-varying factors such as camera movement and moving objects. While modern atlas-based video editing methods have addressed these issues, they often fail to edit videos including complex motion or multiple moving objects, and demand excessive computational cost, even for very simple edits. In this paper, we propose a novel region-of-interest (ROI)-based video editing framework: ROI-based Neural Atlas (RNA). Unlike prior work, RNA allows users to specify editing regions, simplifying the editing process by removing the need for foreground separation and atlas modeling for foreground objects. However, this simplification presents a unique challenge: acquiring a mask that effectively handles occlusions in the edited area caused by moving objects, without relying on an additional segmentation model. To tackle this, we propose a novel mask refinement approach designed for this specific challenge. Moreover, we introduce a soft neural atlas model for video reconstruction to ensure high-quality editing results. Extensive experiments show that RNA offers a more practical and efficient editing solution, applicable to a wider range of videos with superior quality compared to prior methods.
Related papers
- A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model [10.736207095604414]
We propose a two-stage scheme for general editing. Firstly, unlike previous works that extract scene-specific features, we leverage the pre-trained Vision-Language Model (VLM)
We also propose a Reinforcement Learning (RL)-based editing framework to formulate the editing problem and train the virtual editor to make better sequential editing decisions.
arXiv Detail & Related papers (2024-11-07T18:20:28Z) - Temporally Consistent Object Editing in Videos using Extended Attention [9.605596668263173]
We propose a method to edit videos using a pre-trained inpainting image diffusion model.
We ensure that the edited information will be consistent across all the video frames.
arXiv Detail & Related papers (2024-06-01T02:31:16Z) - ReVideo: Remake a Video with Motion and Content Control [67.5923127902463]
We present a novel attempt to Remake a Video (VideoRe) which allows precise video editing in specific areas through the specification of both content and motion.
VideoRe addresses a new task involving the coupling and training imbalance between content and motion control.
Our method can also seamlessly extend these applications to multi-area editing without modifying specific training, demonstrating its flexibility and robustness.
arXiv Detail & Related papers (2024-05-22T17:46:08Z) - Action Reimagined: Text-to-Pose Video Editing for Dynamic Human Actions [49.14827857853878]
ReimaginedAct comprises video understanding, reasoning, and editing modules.
Our method can accept not only direct instructional text prompts but also what if' questions to predict possible action changes.
arXiv Detail & Related papers (2024-03-11T22:46:46Z) - Neural Video Fields Editing [56.558490998753456]
NVEdit is a text-driven video editing framework designed to mitigate memory overhead and improve consistency.
We construct a neural video field, powered by tri-plane and sparse grid, to enable encoding long videos with hundreds of frames.
Next, we update the video field through off-the-shelf Text-to-Image (T2I) models to text-driven editing effects.
arXiv Detail & Related papers (2023-12-12T14:48:48Z) - Editing 3D Scenes via Text Prompts without Retraining [80.57814031701744]
DN2N is a text-driven editing method that allows for the direct acquisition of a NeRF model with universal editing capabilities.
Our method employs off-the-shelf text-based editing models of 2D images to modify the 3D scene images.
Our method achieves multiple editing types, including but not limited to appearance editing, weather transition, material changing, and style transfer.
arXiv Detail & Related papers (2023-09-10T02:31:50Z) - Seal-3D: Interactive Pixel-Level Editing for Neural Radiance Fields [14.803266838721864]
Seal-3D allows users to edit NeRF models in a pixel-level and free manner with a wide range of NeRF-like backbone and preview the editing effects instantly.
A NeRF editing system is built to showcase various editing types.
arXiv Detail & Related papers (2023-07-27T18:08:19Z) - INVE: Interactive Neural Video Editing [79.48055669064229]
Interactive Neural Video Editing (INVE) is a real-time video editing solution that consistently propagates sparse frame edits to the entire video clip.
Our method is inspired by the recent work on Layered Neural Atlas (LNA)
LNA suffers from two major drawbacks: (1) the method is too slow for interactive editing, and (2) it offers insufficient support for some editing use cases.
arXiv Detail & Related papers (2023-07-15T00:02:41Z) - Diffusion Video Autoencoders: Toward Temporally Consistent Face Video
Editing via Disentangled Video Encoding [35.18070525015657]
We propose a novel face video editing framework based on diffusion autoencoders.
Our model is based on diffusion models and can satisfy both reconstruction and edit capabilities at the same time.
arXiv Detail & Related papers (2022-12-06T07:41:51Z) - The Anatomy of Video Editing: A Dataset and Benchmark Suite for
AI-Assisted Video Editing [90.59584961661345]
This work introduces the Anatomy of Video Editing, a dataset, and benchmark, to foster research in AI-assisted video editing.
Our benchmark suite focuses on video editing tasks, beyond visual effects, such as automatic footage organization and assisted video assembling.
To enable research on these fronts, we annotate more than 1.5M tags, with relevant concepts to cinematography, from 196176 shots sampled from movie scenes.
arXiv Detail & Related papers (2022-07-20T10:53:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.