DuctTake: Spatiotemporal Video Compositing
- URL: http://arxiv.org/abs/2101.04772v1
- Date: Tue, 12 Jan 2021 21:58:47 GMT
- Title: DuctTake: Spatiotemporal Video Compositing
- Authors: Jan Rueegg, Oliver Wang, Aljoscha Smolic, Markus Gross
- Abstract summary: Our method instead composites shots together by finding optimal spatiotemporal seams using motion-compensated 3D graph cuts.
We validate our approach by presenting a wide variety of examples and by comparing quality and creation time to composites made by professional artists.
- Score: 28.154654576394112
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: DuctTake is a system designed to enable practical compositing of multiple
takes of a scene into a single video. Current industry solutions are based
around object segmentation, a hard problem that requires extensive manual input
and cleanup, making compositing an expensive part of the film-making process.
Our method instead composites shots together by finding optimal spatiotemporal
seams using motion-compensated 3D graph cuts through the video volume. We
describe in detail the required components, decisions, and new techniques that
together make a usable, interactive tool for compositing HD video, paying
special attention to running time and performance of each section. We validate
our approach by presenting a wide variety of examples and by comparing result
quality and creation time to composites made by professional artists using
current state-of-the-art tools.
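To make the seam-finding idea concrete, here is a minimal sketch of a 3D graph cut through a video volume between two takes. This is an illustration under stated assumptions, not the DuctTake implementation: PyMaxflow is an assumed stand-in solver, fixed border bands stand in for user strokes, and the motion compensation and performance engineering the paper describes are omitted.

```python
# Illustrative spatiotemporal seam via a 3D graph cut (not the DuctTake code).
# PyMaxflow is an assumed solver; motion compensation and user strokes omitted.
import numpy as np
import maxflow  # pip install PyMaxflow

def spatiotemporal_seam(take_a, take_b, smoothness=1.0, seed_width=8):
    """take_a, take_b: float arrays (T, H, W, 3) in [0, 1].
    Returns a bool (T, H, W) mask: True where take_b should be used."""
    t, h, w, _ = take_a.shape

    # Cutting through voxels where the two takes agree is cheap, so the
    # optimal seam gravitates toward regions where they look alike.
    diff = np.linalg.norm(take_a - take_b, axis=-1)

    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes((t, h, w))

    # Smoothness edges along t, y, and x (von Neumann neighborhood by
    # default); edge capacity grows with local color disagreement.
    g.add_grid_edges(nodes, weights=smoothness * diff + 1e-4, symmetric=True)

    # Hard constraints standing in for user strokes: pin a band at the
    # left border to take A and a band at the right border to take B.
    big = 1e9
    src = np.zeros((t, h, w))
    snk = np.zeros((t, h, w))
    src[:, :, :seed_width] = big
    snk[:, :, -seed_width:] = big
    g.add_grid_tedges(nodes, src, snk)

    g.maxflow()
    return g.get_grid_segments(nodes)  # True = sink side = take B
```

A real tool would likely replace the per-voxel color difference with the motion-compensated cost the abstract alludes to, and restrict the solve to a band around an initial seam to reach the interactive rates the paper targets.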
Related papers
- Transformer-Based Inpainting for Real-Time 3D Streaming in Sparse Multi-Camera Setups [5.442308724054687]
High-quality 3D streaming from multiple cameras is crucial for immersive experiences in many AR/VR applications. Existing approaches typically rely on simple textures for hole filling, which can result in inconsistencies or visual artifacts. We propose a novel, application-targeted inpainting method, independent of the underlying representation, as an image-based post-processing step after novel view rendering. We evaluate our approach against state-of-the-art inpainting techniques under the same real-time constraints and demonstrate that our model achieves the best trade-off between quality and speed.
arXiv Detail & Related papers (2026-03-05T18:59:59Z)
- PrevizWhiz: Combining Rough 3D Scenes and 2D Video to Guide Generative Video Previsualization [10.681930120546438]
We present PrevizWhiz, a system that leverages rough 3D scenes in combination with generative image and video models to create stylized video previews. The system integrates frame-level image restyling with adjustable resemblance, time-based editing through motion paths or external video inputs, and refinement into high-fidelity video clips.
arXiv Detail & Related papers (2026-02-03T18:56:40Z)
- GenCompositor: Generative Video Compositing with Diffusion Transformer [68.00271033575736]
Traditional pipelines require intensive labor and expert collaboration, resulting in lengthy production cycles and high manpower costs. This new task strives to adaptively inject identity and motion information from a foreground video into the target video in an interactive manner. Experiments demonstrate that our method effectively realizes generative video compositing, outperforming existing possible solutions in fidelity and consistency.
arXiv Detail & Related papers (2025-09-02T16:10:13Z)
- VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control [47.34885131252508]
Video inpainting aims to restore corrupted video content.
We propose a novel dual-stream paradigm, VideoPainter, to process masked videos.
We also introduce a novel target region ID resampling technique that enables any-length video inpainting.
arXiv Detail & Related papers (2025-03-07T17:59:46Z)
- Video Decomposition Prior: A Methodology to Decompose Videos into Layers [74.36790196133505]
This paper introduces a novel video decomposition prior (VDP) framework that draws inspiration from professional video editing practices.
The VDP framework decomposes a video sequence into a set of RGB layers and associated opacity levels.
We address tasks such as video object segmentation, dehazing, and relighting.
arXiv Detail & Related papers (2024-12-06T10:35:45Z)
- One-Shot Learning Meets Depth Diffusion in Multi-Object Videos [0.0]
This paper introduces a novel depth-conditioning approach that enables the generation of coherent and diverse videos from just a single text-video pair.
Our method fine-tunes the pre-trained model to capture continuous motion by employing custom-designed spatial and temporal attention mechanisms.
During inference, we use DDIM inversion to provide structural guidance for video generation.
arXiv Detail & Related papers (2024-08-29T16:58:10Z)
- MVOC: a training-free multiple video object composition method with diffusion models [10.364986401722625]
We propose a Multiple Video Object Composition (MVOC) method based on diffusion models.
We first perform DDIM inversion on each video object to obtain the corresponding noise features.
Second, we combine and edit the objects with image-editing methods to obtain the first frame of the composited video (a minimal DDIM-inversion sketch appears after this list).
arXiv Detail & Related papers (2024-06-22T12:18:46Z)
- VidToMe: Video Token Merging for Zero-Shot Video Editing [100.79999871424931]
We propose a novel approach to enhance temporal consistency in generated videos by merging self-attention tokens across frames.
Our method improves temporal coherence and reduces memory consumption in self-attention computations.
arXiv Detail & Related papers (2023-12-17T09:05:56Z)
- 3D-Aware Video Generation [149.5230191060692]
We explore 4D generative adversarial networks (GANs) that learn generation of 3D-aware videos.
By combining neural implicit representations with a time-aware discriminator, we develop a GAN framework that synthesizes 3D video supervised only with monocular videos.
arXiv Detail & Related papers (2022-06-29T17:56:03Z)
- Condensing a Sequence to One Informative Frame for Video Recognition [113.3056598548736]
This paper studies a two-step alternative that first condenses the video sequence to an informative "frame".
A valid question is how to define "useful information" and then distill it from a sequence down to one synthetic frame.
IFS consistently yields clear improvements on both image-based 2D networks and clip-based 3D networks.
arXiv Detail & Related papers (2022-01-11T16:13:43Z)
- Video Exploration via Video-Specific Autoencoders [60.256055890647595]
We present video-specific autoencoders that enable human-controllable video exploration.
We observe that a simple autoencoder trained on multiple frames of a specific video enables one to perform a large variety of video processing and editing tasks.
arXiv Detail & Related papers (2021-03-31T17:56:13Z)
- Temporal Context Aggregation for Video Retrieval with Contrastive Learning [81.12514007044456]
We propose TCA, a video representation learning framework that incorporates long-range temporal information across frame-level features (a generic contrastive-training sketch appears after this list).
The proposed method shows a significant performance advantage (17% mAP on FIVR-200K) over state-of-the-art methods with video-level features.
arXiv Detail & Related papers (2020-08-04T05:24:20Z)
- Task-agnostic Temporally Consistent Facial Video Editing [84.62351915301795]
We propose a task-agnostic, temporally consistent facial video editing framework.
Based on a 3D reconstruction model, our framework is designed to handle several editing tasks in a more unified and disentangled manner.
Compared with the state-of-the-art facial image editing methods, our framework generates video portraits that are more photo-realistic and temporally smooth.
arXiv Detail & Related papers (2020-07-03T02:49:20Z)
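Since the MVOC entry above hinges on DDIM inversion, here is a minimal sketch of that step: the deterministic DDIM update run in reverse, mapping a clean latent back to noise. The `eps_model` callable is a hypothetical placeholder for a pretrained denoiser; MVOC's actual model, schedule, and per-object handling may differ.

```python
# Minimal sketch of deterministic DDIM inversion (clean latent -> noise).
# `eps_model` is a stub; a real pipeline would pass a pretrained denoiser.
import torch

def ddim_invert(x0, eps_model, alphas_cumprod, num_steps=50):
    """x0:             (B, C, T, H, W) clean video latent
    eps_model:      callable (x_t, t) -> predicted noise, same shape as x_t
    alphas_cumprod: (num_train_steps,) cumulative alpha-bar schedule
    Returns approximately the x_T whose DDIM trajectory recreates x0."""
    steps = torch.linspace(0, len(alphas_cumprod) - 1, num_steps).long()
    x = x0
    for i in range(len(steps) - 1):
        t_cur, t_next = steps[i], steps[i + 1]
        a_cur, a_next = alphas_cumprod[t_cur], alphas_cumprod[t_next]
        eps = eps_model(x, t_cur)
        # Reconstruct the model's current estimate of the clean latent ...
        x0_pred = (x - (1 - a_cur).sqrt() * eps) / a_cur.sqrt()
        # ... then step *forward* in noise level (inverse of a DDIM step).
        x = a_next.sqrt() * x0_pred + (1 - a_next).sqrt() * eps
    return x
```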
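And for the TCA entry, a generic sketch of contrastive training over temporally aggregated frame features. The self-attention pooling module and InfoNCE loss below are common-practice stand-ins under assumed shapes, not the paper's exact architecture or objective.

```python
# Generic sketch: aggregate frame-level features, train contrastively.
# Stand-in components in the spirit of TCA, not its published design.
import torch
import torch.nn.functional as F

class TemporalAggregator(torch.nn.Module):
    """Self-attention over frame features, pooled to one video vector."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = torch.nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, frames):                   # frames: (B, T, dim)
        ctx, _ = self.attn(frames, frames, frames)
        return F.normalize(ctx.mean(dim=1), dim=-1)  # (B, dim), unit norm

def info_nce(anchors, positives, temperature=0.07):
    """Each anchor's positive is the same-index row of `positives`;
    every other row in the batch serves as a negative."""
    logits = anchors @ positives.t() / temperature   # (B, B) similarities
    targets = torch.arange(anchors.size(0), device=anchors.device)
    return F.cross_entropy(logits, targets)
```

With per-frame CNN features as input, `anchors` and `positives` would be the aggregated vectors of two related videos, e.g. a clip and a transformed copy of it.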
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.