DVI: Depth Guided Video Inpainting for Autonomous Driving
- URL: http://arxiv.org/abs/2007.08854v1
- Date: Fri, 17 Jul 2020 09:29:53 GMT
- Title: DVI: Depth Guided Video Inpainting for Autonomous Driving
- Authors: Miao Liao, Feixiang Lu, Dingfu Zhou, Sibo Zhang, Wei Li, Ruigang Yang
- Abstract summary: We present an automatic video inpainting algorithm that can remove traffic agents from videos.
By building a dense 3D map from stitched point clouds, frames within a video are geometrically correlated.
We are the first to fuse multiple videos for video inpainting.
- Score: 35.94330601020169
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To get clear street-view and photo-realistic simulation in autonomous
driving, we present an automatic video inpainting algorithm that can remove
traffic agents from videos and synthesize missing regions with the guidance of
depth/point cloud. By building a dense 3D map from stitched point clouds,
frames within a video are geometrically correlated via this common 3D map. In
order to fill a target inpainting area in a frame, it is straightforward to
transform pixels from other frames into the current one with correct occlusion.
Furthermore, we are able to fuse multiple videos through 3D point cloud
registration, making it possible to inpaint a target video with multiple source
videos. The motivation is to solve the long-time occlusion problem where an
occluded area has never been visible in the entire video. To our knowledge, we
are the first to fuse multiple videos for video inpainting. To verify the
effectiveness of our approach, we build a large inpainting dataset in the real
urban road environment with synchronized images and Lidar data, including many
challenging scenes, e.g., long-time occlusion. The experimental results show that
the proposed approach outperforms state-of-the-art approaches on all criteria;
in particular, the RMSE (Root Mean Squared Error) is reduced by about 13%.
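As a rough illustration of the geometric step described above (not the authors' implementation), the minimal NumPy sketch below back-projects source-frame pixels using their depth, transforms them through the two camera poses, re-projects them into the target frame, and keeps only points that pass a depth-based occlusion test. The camera-to-world pose convention, the `tgt_map_depth` input (standing in for depth obtained from the fused point-cloud map), and all names are assumptions made for this sketch.

```python
# Hedged sketch of depth-guided warping: copy colors from a source frame into
# the masked (to-be-inpainted) region of a target frame, rejecting occluded
# correspondences with a depth test. Not the paper's code; names are illustrative.
import numpy as np

def warp_source_to_target(src_img, src_depth, src_pose, tgt_pose,
                          tgt_map_depth, K, mask, depth_eps=0.10):
    """src_pose / tgt_pose: 4x4 camera-to-world matrices; K: 3x3 intrinsics.
    tgt_map_depth: per-pixel depth of the target view (e.g. rendered from the
    fused point-cloud map), used as a z-buffer; mask: True where inpainting is needed.
    Returns the warped colors and a mask of the pixels that were actually filled."""
    h, w = src_depth.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(np.float64)

    # Back-project source pixels to 3D camera coordinates, then to world coordinates.
    cam = (np.linalg.inv(K) @ pix) * src_depth.reshape(1, -1)
    world = src_pose @ np.vstack([cam, np.ones((1, cam.shape[1]))])

    # Transform into the target camera and project with the intrinsics.
    tcam = np.linalg.inv(tgt_pose) @ world
    z = tcam[2]
    zs = np.where(z > 1e-6, z, 1.0)            # guard against division by zero
    uvw = K @ tcam[:3]
    ut = np.round(uvw[0] / zs).astype(int)
    vt = np.round(uvw[1] / zs).astype(int)

    valid = (z > 1e-6) & (ut >= 0) & (ut < w) & (vt >= 0) & (vt < h)
    # Occlusion test: keep only points whose depth agrees with the target's map depth.
    valid &= np.abs(z - tgt_map_depth[vt.clip(0, h - 1), ut.clip(0, w - 1)]) < depth_eps
    # Only write into the region that needs inpainting.
    valid &= mask[vt.clip(0, h - 1), ut.clip(0, w - 1)]

    filled = np.zeros_like(src_img)
    filled_mask = np.zeros((h, w), dtype=bool)
    src_colors = src_img.reshape(-1, src_img.shape[-1])
    filled[vt[valid], ut[valid]] = src_colors[valid]
    filled_mask[vt[valid], ut[valid]] = True
    return filled, filled_mask
```

A complete pipeline would additionally resolve collisions from this forward splatting, blend contributions from several source frames or videos, and harmonize colors across frames.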
Related papers
- Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models [54.35214051961381]
3D meshes are widely used in computer vision and graphics for their efficiency in animation and minimal memory use in movies, games, AR, and VR.
However, creating temporally consistent and realistic textures for meshes remains labor-intensive for professional artists.
We present Tex4D, which integrates inherent geometry from mesh sequences with video diffusion models to produce consistent textures.
arXiv Detail & Related papers (2024-10-14T17:59:59Z)
- Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting [94.84688557937123]
Video-3DGS is a 3D Gaussian Splatting (3DGS)-based video refiner designed to enhance temporal consistency in zero-shot video editors.
Our approach utilizes a two-stage 3D Gaussian optimizing process tailored for editing dynamic monocular videos.
Experiments on 58 dynamic monocular videos show that it enhances video editing by improving temporal consistency.
arXiv Detail & Related papers (2024-06-04T17:57:37Z)
- Lester: rotoscope animation through video object segmentation and tracking [0.0]
Lester is a novel method to automatically synthesize retro-style 2D animations from videos.
Video frames are processed with the Segment Anything Model (SAM) and the resulting masks are tracked through subsequent frames with DeAOT (a hedged sketch of this segment-then-track step follows this entry).
Results show that the method exhibits excellent temporal consistency and can correctly process videos with different poses and appearances.
arXiv Detail & Related papers (2024-02-15T11:15:54Z)
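As a loose, hedged sketch of the segment-then-track step summarized in the entry above (not the Lester implementation): the first frame is segmented with the real `segment_anything` API from a single foreground click, and a placeholder function stands in for the DeAOT tracker, whose actual interface is not reproduced here. The checkpoint name, click input, and all helper names are assumptions.

```python
# Sketch of a segment-then-track pipeline: SAM masks the subject in frame 0,
# then a tracker propagates the mask through the remaining frames.
# Only the segment_anything calls are real API; propagate_with_deaot is a
# hypothetical placeholder for an actual DeAOT tracker.
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

def first_frame_mask(frame_rgb, click_xy):
    """Segment the subject in the first frame from a single foreground click."""
    sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
    predictor = SamPredictor(sam)
    predictor.set_image(frame_rgb)                       # HxWx3 uint8 RGB frame
    masks, scores, _ = predictor.predict(
        point_coords=np.array([click_xy], dtype=np.float32),
        point_labels=np.array([1]),                      # 1 = foreground point
        multimask_output=True,
    )
    return masks[int(np.argmax(scores))]                 # best-scoring boolean mask

def propagate_with_deaot(frames, init_mask):
    """Hypothetical wrapper: a real DeAOT tracker would take the reference mask
    and return one mask per remaining frame; left unimplemented in this sketch."""
    raise NotImplementedError("plug in an actual DeAOT-style tracker here")

def rotoscope_masks(frames, click_xy):
    """Per-frame subject masks: SAM on frame 0, mask tracking for the rest."""
    init = first_frame_mask(frames[0], click_xy)
    return [init] + propagate_with_deaot(frames[1:], init)
```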
- Hierarchical Masked 3D Diffusion Model for Video Outpainting [20.738731220322176]
We introduce a masked 3D diffusion model for video outpainting.
This allows us to use multiple guide frames to connect the results of multiple video clip inferences.
We also introduce a hybrid coarse-to-fine inference pipeline to alleviate the artifact accumulation problem.
arXiv Detail & Related papers (2023-09-05T10:52:21Z)
- Make-It-4D: Synthesizing a Consistent Long-Term Dynamic Scene Video from a Single Image [59.18564636990079]
We study the problem of synthesizing a long-term dynamic video from only a single image.
Existing methods either hallucinate inconsistent perpetual views or struggle with long camera trajectories.
We present Make-It-4D, a novel method that can generate a consistent long-term dynamic video from a single image.
arXiv Detail & Related papers (2023-08-20T12:53:50Z)
- Video2StyleGAN: Encoding Video in Latent Space for Manipulation [63.03250800510085]
We propose a novel network to encode face videos into the latent space of StyleGAN for semantic face video manipulation.
Our approach can significantly outperform existing single image methods, while achieving real-time (66 fps) speed.
arXiv Detail & Related papers (2022-06-27T06:48:15Z)
- Real-time dense 3D Reconstruction from monocular video data captured by low-cost UAVs [0.3867363075280543]
Real-time 3D reconstruction enables fast dense mapping of the environment which benefits numerous applications, such as navigation or live evaluation of an emergency.
In contrast to most real-time capable approaches, our approach does not need an explicit depth sensor.
By exploiting the self-motion of the unmanned aerial vehicle (UAV) flying with oblique view around buildings, we estimate both camera trajectory and depth for selected images with enough novel content.
arXiv Detail & Related papers (2021-04-21T13:12:17Z)
- Learning Joint Spatial-Temporal Transformations for Video Inpainting [58.939131620135235]
We propose to learn a joint Spatial-Temporal Transformer Network (STTN) for video inpainting.
We simultaneously fill missing regions in all input frames by self-attention, and propose to optimize STTN by a spatial-temporal adversarial loss (a loose sketch of spatial-temporal attention follows this entry).
arXiv Detail & Related papers (2020-07-20T16:35:48Z)
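The STTN entry above describes filling missing regions in all frames at once with self-attention. As a loose illustration of that idea (not the actual STTN architecture), the sketch below flattens patches from every frame of a clip into one token sequence and lets `torch.nn.MultiheadAttention` mix information across space and time, so a masked patch can borrow content from any frame and any location. The patch size, channel count, and all names are assumptions.

```python
# Loose sketch of joint spatial-temporal self-attention over video feature maps.
# Not the STTN model: a single attention block operating on per-patch tokens.
import torch
import torch.nn as nn

class SpatialTemporalAttentionBlock(nn.Module):
    def __init__(self, channels=64, patch=8, heads=4):
        super().__init__()
        self.patch = patch
        dim = channels * patch * patch                 # one token per p x p patch
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feats):
        """feats: (B, T, C, H, W) feature maps for T frames of a clip."""
        b, t, c, h, w = feats.shape
        p = self.patch
        # Split each frame into non-overlapping p x p patches and flatten every
        # patch into a token, giving T * (H/p) * (W/p) tokens per clip.
        tokens = (feats.reshape(b, t, c, h // p, p, w // p, p)
                       .permute(0, 1, 3, 5, 2, 4, 6)
                       .reshape(b, t * (h // p) * (w // p), c * p * p))
        q = self.norm(tokens)
        out, _ = self.attn(q, q, q)                    # joint space-time attention
        # Fold tokens back into (B, T, C, H, W) feature maps and add a residual.
        out = (out.reshape(b, t, h // p, w // p, c, p, p)
                  .permute(0, 1, 4, 2, 5, 3, 6)
                  .reshape(b, t, c, h, w))
        return feats + out

# Example: a batch of 2 clips, 5 frames each, 64-channel 32x32 feature maps.
x = torch.randn(2, 5, 64, 32, 32)
y = SpatialTemporalAttentionBlock()(x)
print(y.shape)   # torch.Size([2, 5, 64, 32, 32])
```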
This list is automatically generated from the titles and abstracts of the papers on this site.