Layered Neural Atlases for Consistent Video Editing
- URL: http://arxiv.org/abs/2109.11418v1
- Date: Thu, 23 Sep 2021 14:58:59 GMT
- Title: Layered Neural Atlases for Consistent Video Editing
- Authors: Yoni Kasten, Dolev Ofri, Oliver Wang, Tali Dekel
- Abstract summary: We present a method that decomposes, or "unwraps", an input video into a set of layered 2D atlases.
For each pixel in the video, our method estimates its corresponding 2D coordinate in each of the atlases.
We design our atlases to be interpretable and semantic, which facilitates easy and intuitive editing in the atlas domain.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a method that decomposes, or "unwraps", an input video into a set
of layered 2D atlases, each providing a unified representation of the
appearance of an object (or background) over the video. For each pixel in the
video, our method estimates its corresponding 2D coordinate in each of the
atlases, giving us a consistent parameterization of the video, along with an
associated alpha (opacity) value. Importantly, we design our atlases to be
interpretable and semantic, which facilitates easy and intuitive editing in the
atlas domain, with minimal manual work required. Edits applied to a single 2D
atlas (or input video frame) are automatically and consistently mapped back to
the original video frames, while preserving occlusions, deformation, and other
complex scene effects such as shadows and reflections. Our method employs a
coordinate-based Multilayer Perceptron (MLP) representation for mappings,
atlases, and alphas, which are jointly optimized on a per-video basis, using a
combination of video reconstruction and regularization losses. By operating
purely in 2D, our method does not require any prior 3D knowledge about scene
geometry or camera poses, and can handle complex dynamic real world videos. We
demonstrate various video editing applications, including texture mapping,
video style transfer, image-to-video texture transfer, and
segmentation/labeling propagation, all automatically produced by editing a
single 2D atlas image.
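To make the representation concrete, the sketch below shows a minimal PyTorch-style version of the coordinate-based design described in the abstract: two mapping MLPs send each video coordinate (x, y, t) to a 2D atlas coordinate, an alpha MLP predicts the foreground opacity, and two atlas MLPs map atlas coordinates to color, all trained jointly against the video's colors. The network sizes, the two-layer setup, and the plain MSE objective are illustrative assumptions; the paper's actual method also uses positional encodings and several regularization losses that are omitted here.
```python
# Illustrative sketch of a layered-atlas-style decomposition (not the
# authors' released code). Network sizes and the MSE loss are assumptions.
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=256, depth=4):
    """Small coordinate MLP: `depth` hidden layers with ReLU, linear output."""
    layers = []
    for i in range(depth):
        layers += [nn.Linear(in_dim if i == 0 else hidden, hidden), nn.ReLU()]
    layers.append(nn.Linear(hidden, out_dim))
    return nn.Sequential(*layers)

map_fg = mlp(3, 2)     # (x, y, t) -> foreground atlas coordinate (u, v)
map_bg = mlp(3, 2)     # (x, y, t) -> background atlas coordinate (u, v)
alpha_net = mlp(3, 1)  # (x, y, t) -> foreground opacity (pre-sigmoid)
atlas_fg = mlp(2, 3)   # (u, v) -> foreground RGB (pre-sigmoid)
atlas_bg = mlp(2, 3)   # (u, v) -> background RGB (pre-sigmoid)

params = [p for m in (map_fg, map_bg, alpha_net, atlas_fg, atlas_bg)
          for p in m.parameters()]
opt = torch.optim.Adam(params, lr=1e-4)

def reconstruct(pts):
    """Alpha-composite the two layers at video coordinates pts of shape (N, 3)."""
    a = torch.sigmoid(alpha_net(pts))
    fg = torch.sigmoid(atlas_fg(torch.tanh(map_fg(pts))))
    bg = torch.sigmoid(atlas_bg(torch.tanh(map_bg(pts))))
    return a * fg + (1.0 - a) * bg

# One optimization step on a random batch of (coordinate, color) samples;
# in practice pts and colors would be sampled from the input video.
pts = torch.rand(4096, 3) * 2 - 1   # (x, y, t) normalized to [-1, 1]
colors = torch.rand(4096, 3)        # stand-in for the video colors at pts
loss = ((reconstruct(pts) - colors) ** 2).mean()
opt.zero_grad(); loss.backward(); opt.step()
```
Once optimized, the same mappings explain how a single 2D edit propagates back to the frames: paint on a discretized atlas image, sample that image at each pixel's predicted atlas coordinate, and alpha-composite the result into the frame. A minimal sketch, reusing the hypothetical networks above:
```python
# Illustrative edit propagation: sample an edited foreground atlas image
# at each pixel's mapped (u, v) and composite over the background layer.
import torch.nn.functional as F

edited_atlas = torch.rand(1, 3, 1024, 1024)      # stand-in for an edited atlas
uv = torch.tanh(map_fg(pts)).view(1, -1, 1, 2)   # sampling grid in [-1, 1]
sampled = F.grid_sample(edited_atlas, uv, align_corners=True)  # (1, 3, N, 1)
edit_rgb = sampled.squeeze(0).squeeze(-1).t()    # (N, 3)
a = torch.sigmoid(alpha_net(pts))
bg = torch.sigmoid(atlas_bg(torch.tanh(map_bg(pts))))
frame_rgb = a * edit_rgb + (1.0 - a) * bg        # edited pixels, per frame
```
Because the atlas coordinate of a scene point is shared across frames, the edit lands consistently wherever that point appears, which is the consistency property the abstract describes.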
Related papers
- Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting [94.84688557937123]
Video-3DGS is a 3D Gaussian Splatting (3DGS)-based video refiner designed to enhance temporal consistency in zero-shot video editors.
Our approach utilizes a two-stage 3D Gaussian optimizing process tailored for editing dynamic monocular videos.
It improves zero-shot video editing by enforcing temporal consistency, evaluated on 58 dynamic monocular videos.
arXiv Detail & Related papers (2024-06-04T17:57:37Z)
- Ponymation: Learning Articulated 3D Animal Motions from Unlabeled Online Videos [47.97168047776216]
We introduce a new method for learning a generative model of articulated 3D animal motions from raw, unlabeled online videos.
Our model learns purely from a collection of unlabeled web video clips, leveraging semantic correspondences distilled from self-supervised image features.
arXiv Detail & Related papers (2023-12-21T06:44:18Z)
- Drag-A-Video: Non-rigid Video Editing with Point-based Interaction [63.78538355189017]
We propose a new diffusion-based method for interactive point-based video manipulation, called Drag-A-Video.
Our method allows users to click pairs of handle and target points, and to draw masks, on the first frame of an input video.
To precisely modify the contents of the video, we employ a new video-level motion supervision to update the features of the video.
arXiv Detail & Related papers (2023-12-05T18:05:59Z)
- WALDO: Future Video Synthesis using Object Layer Decomposition and Parametric Flow Prediction [82.79642869586587]
WALDO is a novel approach to the prediction of future video frames from past ones.
Individual images are decomposed into multiple layers combining object masks and a small set of control points.
The layer structure is shared across all frames in each video to build dense inter-frame connections.
arXiv Detail & Related papers (2022-11-25T18:59:46Z)
- Unsupervised Video Interpolation by Learning Multilayered 2.5D Motion Fields [75.81417944207806]
This paper presents a self-supervised approach to video frame interpolation that requires only a single video.
We parameterize the video motions by solving an ordinary differential equation (ODE) defined on a time-varying motion field.
This implicit neural representation learns the video as a space-time continuum, allowing frame interpolation at any temporal resolution.
arXiv Detail & Related papers (2022-04-21T06:17:05Z)
- Video Autoencoder: self-supervised disentanglement of static 3D structure and motion [60.58836145375273]
A video autoencoder is proposed for learning disentangled representations of 3D structure and camera pose from videos.
The representation can be applied to a range of tasks, including novel view synthesis, camera pose estimation, and video generation by motion following.
arXiv Detail & Related papers (2021-10-06T17:57:42Z)
- Going beyond Free Viewpoint: Creating Animatable Volumetric Video of Human Performances [7.7824496657259665]
We present an end-to-end pipeline for the creation of high-quality animatable volumetric video content of human performances.
Semantic enrichment and geometric animation ability are achieved by establishing temporal consistency in the 3D data.
For pose editing, we exploit the captured data as much as possible and kinematically deform the captured frames to fit a desired pose.
arXiv Detail & Related papers (2020-09-02T09:46:12Z)