Hashing Neural Video Decomposition with Multiplicative Residuals in
Space-Time
- URL: http://arxiv.org/abs/2309.14022v1
- Date: Mon, 25 Sep 2023 10:36:14 GMT
- Title: Hashing Neural Video Decomposition with Multiplicative Residuals in
Space-Time
- Authors: Cheng-Hung Chan, Cheng-Yang Yuan, Cheng Sun, and Hwann-Tzong Chen
- Abstract summary: We present a video decomposition method that facilitates layer-based editing of videos with temporally varying lighting effects.
Our method efficiently learns layer-based neural representations of a 1080p video in 25s per frame via coordinate hashing.
We propose to adopt evaluation metrics for objectively assessing the consistency of video editing.
- Score: 14.015909536844337
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a video decomposition method that facilitates layer-based editing
of videos with spatiotemporally varying lighting and motion effects. Our neural
model decomposes an input video into multiple layered representations, each
comprising a 2D texture map, a mask for the original video, and a
multiplicative residual characterizing the spatiotemporal variations in
lighting conditions. A single edit on the texture maps can be propagated to the
corresponding locations in the entire video frames while preserving other
contents' consistencies. Our method efficiently learns the layer-based neural
representations of a 1080p video in 25s per frame via coordinate hashing and
allows real-time rendering of the edited result at 71 fps on a single GPU.
Qualitatively, we run our method on various videos to show its effectiveness in
generating high-quality editing effects. Quantitatively, we propose to adopt
feature-tracking evaluation metrics for objectively assessing the consistency
of video editing. Project page: https://lightbulb12294.github.io/hashing-nvd/
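The model described above represents each layer with a 2D texture map, a per-frame mask, and a multiplicative residual that captures space-time lighting variation. The following is a minimal sketch of how such layers could be composited into a frame; the array names, shapes, and nearest-neighbour texture lookup are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def composite_frame(textures, uv_coords, masks, residuals):
    """Composite one frame from layered representations (illustrative sketch).

    textures:  list of L texture maps, each (Ht, Wt, 3) with values in [0, 1]
    uv_coords: list of L per-pixel texture coordinates for this frame, each (H, W, 2) in [0, 1]
    masks:     list of L per-pixel layer opacities for this frame, each (H, W)
    residuals: list of L multiplicative lighting residuals for this frame, each (H, W, 3)
    """
    H, W = masks[0].shape
    frame = np.zeros((H, W, 3))
    for tex, uv, mask, res in zip(textures, uv_coords, masks, residuals):
        # Nearest-neighbour lookup of the texture at the predicted UV coordinates.
        ys = (uv[..., 1] * (tex.shape[0] - 1)).round().astype(int)
        xs = (uv[..., 0] * (tex.shape[1] - 1)).round().astype(int)
        sampled = tex[ys, xs]                       # (H, W, 3)
        # The multiplicative residual reapplies the original space-time shading,
        # so an edit made once on the texture map keeps consistent lighting.
        frame += mask[..., None] * sampled * res
    return np.clip(frame, 0.0, 1.0)
```

Under this reading, editing amounts to changing a texture map once; the per-frame UV coordinates, masks, and residuals are untouched, so the edit propagates to every frame with the original lighting preserved.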
Related papers
- Portrait Video Editing Empowered by Multimodal Generative Priors [39.747581584889495]
We introduce PortraitGen, a powerful portrait video editing method that achieves consistent and expressive stylization with multimodal prompts.
Our approach incorporates multimodal inputs through knowledge distilled from large-scale 2D generative models.
Our system also incorporates expression similarity guidance and a face-aware portrait editing module, effectively mitigating degradation issues associated with iterative dataset updates.
arXiv Detail & Related papers (2024-09-20T15:45:13Z)
- I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models [18.36472998650704]
We introduce a novel and generic solution that extends the applicability of image editing tools to videos by propagating edits from a single frame to the entire video using a pre-trained image-to-video model.
Our method, dubbed I2VEdit, adaptively preserves the visual and motion integrity of the source video depending on the extent of the edits.
arXiv Detail & Related papers (2024-05-26T11:47:40Z)
- Neural Video Fields Editing [56.558490998753456]
NVEdit is a text-driven video editing framework designed to mitigate memory overhead and improve consistency.
We construct a neural video field, powered by tri-plane and sparse grid, to enable encoding long videos with hundreds of frames.
Next, we update the video field through off-the-shelf Text-to-Image (T2I) models to impose text-driven editing effects.
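As a rough illustration of the tri-plane idea mentioned above (this is a generic tri-plane feature lookup, not NVEdit's implementation; the plane layout and function names are assumptions), a space-time coordinate (x, y, t) can be encoded by bilinearly sampling three 2D feature planes and summing the results:

```python
import torch
import torch.nn.functional as F

def triplane_features(planes, x, y, t):
    """Generic tri-plane lookup for a space-time coordinate (illustrative sketch).

    planes: dict with 2D feature grids 'xy', 'xt', 'yt', each of shape (C, R, R)
    x, y, t: coordinates normalised to [-1, 1], each of shape (N,)
    Returns per-point features of shape (N, C).
    """
    def sample(plane, u, v):
        grid = torch.stack([u, v], dim=-1).view(1, -1, 1, 2)                # (1, N, 1, 2)
        feat = F.grid_sample(plane.unsqueeze(0), grid, align_corners=True)  # (1, C, N, 1)
        return feat.view(plane.shape[0], -1).transpose(0, 1)                # (N, C)

    return sample(planes['xy'], x, y) + sample(planes['xt'], x, t) + sample(planes['yt'], y, t)
```

A decoder MLP would then map these features (optionally combined with a sparse-grid lookup) to per-pixel colour.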
arXiv Detail & Related papers (2023-12-12T14:48:48Z)
- Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models [65.268245109828]
Ground-A-Video is a video-to-video translation framework for multi-attribute video editing.
It attains temporally consistent editing of input videos in a training-free manner.
Experiments and applications demonstrate that Ground-A-Video's zero-shot capacity outperforms other baseline methods in terms of edit-accuracy and frame consistency.
arXiv Detail & Related papers (2023-10-02T11:28:37Z)
- VidEdit: Zero-Shot and Spatially Aware Text-Driven Video Editing [18.24307442582304]
We introduce VidEdit, a novel method for zero-shot text-based video editing.
Our experiments show that VidEdit outperforms state-of-the-art methods on DAVIS dataset.
arXiv Detail & Related papers (2023-06-14T19:15:49Z)
- Edit-A-Video: Single Video Editing with Object-Aware Consistency [49.43316939996227]
We propose a video editing framework given only a pretrained TTI model and a single <text, video> pair, which we term Edit-A-Video.
The framework consists of two stages: (1) inflating the 2D model into a 3D model by appending temporal modules and tuning on the source video, and (2) inverting the source video into noise and editing it with the target text prompt and attention map injection.
We present extensive experimental results over various types of text and videos, and demonstrate the superiority of the proposed method compared to baselines in terms of background consistency, text alignment, and video editing quality.
arXiv Detail & Related papers (2023-03-14T14:35:59Z)
- Unsupervised Video Interpolation by Learning Multilayered 2.5D Motion Fields [75.81417944207806]
This paper presents a self-supervised approach to video frame interpolation that requires only a single video.
We parameterize the video motions by solving an ordinary differential equation (ODE) defined on a time-varying motion field.
This implicit neural representation learns the video as a space-time continuum, allowing frame interpolation at any temporal resolution.
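Read schematically (the symbols below are introduced here for illustration and are not taken from the paper), a pixel trajectory x(t) follows an ODE driven by the time-varying motion field v, and integrating it to an arbitrary time gives the correspondence needed to synthesise frames at any temporal resolution:

```latex
\frac{\mathrm{d}\mathbf{x}(t)}{\mathrm{d}t} = \mathbf{v}\bigl(\mathbf{x}(t), t\bigr),
\qquad
\mathbf{x}(t_1) = \mathbf{x}(t_0) + \int_{t_0}^{t_1} \mathbf{v}\bigl(\mathbf{x}(t), t\bigr)\,\mathrm{d}t .
```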
arXiv Detail & Related papers (2022-04-21T06:17:05Z)
- Layered Neural Atlases for Consistent Video Editing [37.69447642502351]
We present a method that decomposes, or "unwraps", an input video into a set of layered 2D atlases.
For each pixel in the video, our method estimates its corresponding 2D coordinate in each of the atlases.
We design our atlases to be interpretable and semantic, which facilitates easy and intuitive editing in the atlas domain.
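To make the per-pixel mapping concrete, here is a minimal sketch in the spirit of this layered-atlas formulation (the network interfaces and variable names are hypothetical, not the authors' released code):

```python
import torch

def reconstruct_pixels(mapping_nets, atlas_nets, alpha_net, x, y, t):
    """Reconstruct pixel colours from layered 2D atlases (illustrative sketch).

    mapping_nets: per-layer MLPs mapping (x, y, t) -> 2D atlas coordinate (u, v)
    atlas_nets:   per-layer MLPs mapping (u, v) -> RGB colour of that atlas
    alpha_net:    MLP mapping (x, y, t) -> per-layer blending logits
    x, y, t:      normalised pixel coordinates and frame time, each of shape (N,)
    """
    p = torch.stack([x, y, t], dim=-1)               # (N, 3) space-time query points
    alphas = torch.softmax(alpha_net(p), dim=-1)     # (N, L) layer opacities
    colour = torch.zeros(p.shape[0], 3)
    for i, (map_net, atlas_net) in enumerate(zip(mapping_nets, atlas_nets)):
        uv = map_net(p)                              # (N, 2) coordinate in the 2D atlas
        rgb = atlas_net(uv)                          # (N, 3) colour sampled from the atlas
        colour = colour + alphas[:, i:i + 1] * rgb   # alpha-blend the layers
    return colour
```

Editing a layer then reduces to editing its 2D atlas once; the fixed mappings carry that edit back to the corresponding pixels in every frame.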
arXiv Detail & Related papers (2021-09-23T14:58:59Z)
- Attention-guided Temporal Coherent Video Object Matting [78.82835351423383]
We propose a novel deep learning-based object matting method that can achieve temporally coherent matting results.
Its key component is an attention-based temporal aggregation module that maximizes image matting networks' strength.
We show how to effectively solve the trimap generation problem by fine-tuning a state-of-the-art video object segmentation network.
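The following is a generic sketch of attention-based temporal feature aggregation (the paper's actual module is not specified here; the tensor layout and normalisation are assumptions), attending over reference frames at each spatial location:

```python
import torch

def temporal_attention_aggregate(query_feat, neighbour_feats):
    """Aggregate reference-frame features with dot-product attention (generic sketch).

    query_feat:      (C, H, W) features of the current frame
    neighbour_feats: (T, C, H, W) features of T reference frames
    Returns aggregated features of shape (C, H, W).
    """
    C, H, W = query_feat.shape
    q = query_feat.reshape(C, H * W).transpose(0, 1)               # (HW, C)
    k = neighbour_feats.permute(2, 3, 0, 1).reshape(H * W, -1, C)  # (HW, T, C)
    # Attention weights over the T reference frames at each spatial location.
    attn = torch.softmax(torch.einsum('nc,ntc->nt', q, k) / C ** 0.5, dim=-1)  # (HW, T)
    out = torch.einsum('nt,ntc->nc', attn, k)                      # (HW, C)
    return out.transpose(0, 1).reshape(C, H, W)
```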
arXiv Detail & Related papers (2021-05-24T17:34:57Z)
- Layered Neural Rendering for Retiming People in Video [108.85428504808318]
We present a method for retiming people in an ordinary, natural video.
We can temporally align different motions, change the speed of certain actions, or "erase" selected people from the video altogether.
A key property of our model is that it not only disentangles the direct motions of each person in the input video, but also correlates each person automatically with the scene changes they generate.
arXiv Detail & Related papers (2020-09-16T17:48:26Z)