Flow-Guided Video Inpainting with Scene Templates
- URL: http://arxiv.org/abs/2108.12845v1
- Date: Sun, 29 Aug 2021 13:49:13 GMT
- Title: Flow-Guided Video Inpainting with Scene Templates
- Authors: Dong Lao, Peihao Zhu, Peter Wonka, Ganesh Sundaramoorthi
- Abstract summary: We consider the problem of filling in missing spatio-temporal regions of a video.
We introduce a generative model of images in relation to the scene (without missing regions) and mappings from the scene to images.
We use the model to jointly infer the scene template, a 2D representation of the scene, and the mappings.
- Score: 57.12499174362993
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of filling in missing spatio-temporal regions of a
video. We provide a novel flow-based solution by introducing a generative model
of images in relation to the scene (without missing regions) and mappings from
the scene to images. We use the model to jointly infer the scene template, a 2D
representation of the scene, and the mappings. This ensures consistency of the
frame-to-frame flows generated to the underlying scene, reducing geometric
distortions in flow based inpainting. The template is mapped to the missing
regions in the video by a new L2-L1 interpolation scheme, creating crisp
inpaintings and reducing common blur and distortion artifacts. We show on two
benchmark datasets that our approach outperforms the state of the art
quantitatively and in user studies.
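For intuition, here is a minimal Python sketch of the final hole-filling step in a template-based pipeline: given a scene template and a per-frame mapping from frame pixels into the template, missing pixels are filled by sampling the template. This is only a sketch under assumed inputs; the paper's joint inference of template and mappings and its L2-L1 interpolation scheme are not reproduced, and the function name and use of OpenCV's bilinear remap are illustrative choices, not the authors' implementation.

```python
# Hypothetical sketch: fill masked pixels of a frame by warping a scene template.
# Inputs (template, per-pixel mapping) are assumed to have been estimated already.
import numpy as np
import cv2

def fill_from_template(frame, mask, template, mapping):
    """Replace masked pixels of `frame` by sampling the scene template.

    frame:    (H, W, 3) uint8 video frame with missing regions
    mask:     (H, W) bool, True where pixels are missing
    template: (Ht, Wt, 3) uint8 scene template (no missing regions)
    mapping:  (H, W, 2) float32, per-pixel (x, y) coordinates into the template
    """
    # Bilinearly sample the template at the mapped coordinates.
    warped = cv2.remap(
        template,
        mapping[..., 0].astype(np.float32),
        mapping[..., 1].astype(np.float32),
        interpolation=cv2.INTER_LINEAR,
        borderMode=cv2.BORDER_REPLICATE,
    )
    out = frame.copy()
    out[mask] = warped[mask]  # copy template content only into the missing region
    return out
```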
Related papers
- GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping [47.38125925469167]
We propose a semantic-preserving generative warping framework to generate novel views from a single image.
Our approach addresses the limitations of existing methods by conditioning the generative model on source view images.
Our model outperforms existing methods in both in-domain and out-of-domain scenarios.
arXiv Detail & Related papers (2024-05-27T15:07:04Z)
- Semantically Consistent Video Inpainting with Conditional Diffusion Models [16.42354856518832]
We present a framework for solving video inpainting problems with conditional video diffusion models.
We introduce inpainting-specific sampling schemes which capture crucial long-range dependencies in the context.
We devise a novel method for conditioning on the known pixels in incomplete frames.
arXiv Detail & Related papers (2024-04-30T23:49:26Z)
- RecDiffusion: Rectangling for Image Stitching with Diffusion Models [53.824503710254206]
We introduce a novel diffusion-based learning framework, RecDiffusion, for image stitching rectangling.
This framework combines Motion Diffusion Models (MDM) to generate motion fields, effectively transitioning from the stitched image's irregular borders to a geometrically corrected intermediary.
arXiv Detail & Related papers (2024-03-28T06:22:45Z)
- Blocks2World: Controlling Realistic Scenes with Editable Primitives [5.541644538483947]
We present Blocks2World, a novel method for 3D scene rendering and editing.
Our technique begins by extracting 3D parallelepipeds from various objects in a given scene using convex decomposition.
The next stage involves training a conditioned model that learns to generate images from the 2D-rendered convex primitives.
arXiv Detail & Related papers (2023-07-07T21:38:50Z)
- Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation [93.18163456287164]
This paper proposes a novel text-guided video-to-video translation framework to adapt image models to videos.
Our framework achieves global style and local texture temporal consistency at a low cost.
arXiv Detail & Related papers (2023-06-13T17:52:23Z)
- SceneGenie: Scene Graph Guided Diffusion Models for Image Synthesis [38.22195812238951]
We propose a novel guidance approach for the sampling process in the diffusion model.
Our approach guides the model with semantic features from CLIP embeddings and enforces geometric constraints.
Our results demonstrate the effectiveness of incorporating bounding box and segmentation map guidance in the diffusion model sampling process.
arXiv Detail & Related papers (2023-04-28T00:14:28Z)
- Diffusion-Based Scene Graph to Image Generation with Masked Contrastive Pre-Training [112.94542676251133]
We propose to learn scene graph embeddings by directly optimizing their alignment with images.
Specifically, we pre-train an encoder to extract both global and local information from scene graphs.
The resulting method, called SGDiff, allows for the semantic manipulation of generated images by modifying scene graph nodes and connections.
arXiv Detail & Related papers (2022-11-21T01:11:19Z)
- Free View Synthesis [100.86844680362196]
We present a method for novel view synthesis from input images that are freely distributed around a scene.
Our method does not rely on a regular arrangement of input views, can synthesize images for free camera movement through the scene, and works for general scenes with unconstrained geometric layouts.
arXiv Detail & Related papers (2020-08-12T18:16:08Z)
- Learning Joint Spatial-Temporal Transformations for Video Inpainting [58.939131620135235]
We propose to learn a joint Spatial-Temporal Transformer Network (STTN) for video inpainting.
We simultaneously fill missing regions in all input frames by self-attention, and propose to optimize STTN by a spatial-temporal adversarial loss.
arXiv Detail & Related papers (2020-07-20T16:35:48Z)
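To make the idea of joint spatial-temporal attention in the STTN entry above concrete, here is a rough PyTorch sketch in which patch tokens from all frames attend to one another. The layer sizes, patch embedding, and the adversarial loss mentioned in the abstract are omitted, and all names are illustrative assumptions rather than the paper's actual architecture.

```python
# Rough sketch: one joint spatial-temporal self-attention layer over patch
# tokens pooled from several frames (hypothetical sizes, not from the paper).
import torch
import torch.nn as nn

class SpatioTemporalAttention(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens):
        # tokens: (B, T * N, dim) -- patch embeddings from all T frames,
        # so every patch can attend to patches in every other frame.
        attended, _ = self.attn(tokens, tokens, tokens)
        return self.norm(tokens + attended)

# Example: batch of 2 clips, 8 frames, 64 patches per frame, 256-dim tokens.
x = torch.randn(2, 8 * 64, 256)
y = SpatioTemporalAttention()(x)  # shape (2, 512, 256)
```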