GeoFill: Reference-Based Image Inpainting of Scenes with Complex
Geometry
- URL: http://arxiv.org/abs/2201.08131v1
- Date: Thu, 20 Jan 2022 12:17:13 GMT
- Title: GeoFill: Reference-Based Image Inpainting of Scenes with Complex
Geometry
- Authors: Yunhan Zhao, Connelly Barnes, Yuqian Zhou, Eli Shechtman, Sohrab
Amirghodsi, Charless Fowlkes
- Abstract summary: Reference-guided image inpainting restores image pixels by leveraging the content from another reference image.
We leverage a monocular depth estimate and predict relative pose between cameras, then align the reference image to the target by a differentiable 3D reprojection.
Our approach achieves state-of-the-art performance on both RealEstate10K and MannequinChallenge dataset with large baselines, complex geometry and extreme camera motions.
- Score: 40.68659515139644
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reference-guided image inpainting restores image pixels by leveraging the
content from another reference image. The previous state-of-the-art, TransFill,
warps the source image with multiple homographies and fuses the warped results for
hole filling. Inspired by structure-from-motion pipelines and recent progress
in monocular depth estimation, we propose a more principled approach that does
not require heuristic planar assumptions. We leverage a monocular depth
estimate and predict relative pose between cameras, then align the reference
image to the target by a differentiable 3D reprojection and a joint
optimization of relative pose and depth map scale and offset. Our approach
achieves state-of-the-art performance on both the RealEstate10K and
MannequinChallenge datasets with large baselines, complex geometry, and extreme
camera motions. We experimentally verify that our approach is also better at
handling large holes.
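The alignment step described in the abstract, jointly optimizing a monocular depth map's scale/offset and the relative camera pose through a differentiable 3D reprojection, can be sketched in a few lines of PyTorch. The snippet below is a minimal illustration under assumptions of my own, not the authors' implementation: the photometric L1 loss, the axis-angle pose parameterization, the Adam settings, and all function and tensor names (`align_reference`, `ref_img`/`tgt_img` as (3, H, W) tensors, `tgt_depth` as an (H, W) monocular depth estimate) are placeholders.
```python
import torch
import torch.nn.functional as F


def axis_angle_to_R(w):
    """Rodrigues' formula: a differentiable 3x3 rotation from an axis-angle 3-vector."""
    theta = torch.sqrt((w * w).sum() + 1e-12)
    k = w / theta
    Kx = torch.zeros(3, 3)
    Kx[0, 1], Kx[0, 2] = -k[2], k[1]
    Kx[1, 0], Kx[1, 2] = k[2], -k[0]
    Kx[2, 0], Kx[2, 1] = -k[1], k[0]
    return torch.eye(3) + torch.sin(theta) * Kx + (1 - torch.cos(theta)) * (Kx @ Kx)


def unproject(depth, K_inv):
    """Lift every target pixel to a 3D point using its depth and the inverse intrinsics."""
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).float()   # (H, W, 3) homogeneous pixels
    return (pix @ K_inv.T) * depth.unsqueeze(-1)                    # (H, W, 3) camera-space points


def reproject(points, R, t, K):
    """Transform target-frame points by (R, t) and project them into the reference view."""
    cam = points @ R.T + t
    z = cam[..., 2:3].clamp(min=1e-6)                               # avoid division by zero
    return (cam / z) @ K.T


def align_reference(ref_img, tgt_img, tgt_depth, K, iters=500, lr=1e-2):
    """Jointly optimize depth scale/offset and relative pose so that the reference image,
    warped through the 3D reprojection, photometrically matches the target."""
    K_inv = torch.inverse(K)
    H, W = tgt_depth.shape
    scale = torch.tensor(1.0, requires_grad=True)    # global scale on the monocular depth
    offset = torch.tensor(0.0, requires_grad=True)   # global offset on the monocular depth
    w = torch.zeros(3, requires_grad=True)           # rotation, axis-angle
    t = torch.zeros(3, requires_grad=True)           # translation
    opt = torch.optim.Adam([scale, offset, w, t], lr=lr)

    for _ in range(iters):
        opt.zero_grad()
        depth = scale * tgt_depth + offset
        pix = reproject(unproject(depth, K_inv), axis_angle_to_R(w), t, K)
        # Normalize projected pixel coordinates to [-1, 1] and bilinearly sample the reference.
        grid = torch.stack([2 * pix[..., 0] / (W - 1) - 1,
                            2 * pix[..., 1] / (H - 1) - 1], dim=-1)
        warped = F.grid_sample(ref_img[None], grid[None], align_corners=True)
        loss = (warped[0] - tgt_img).abs().mean()    # simple photometric L1 error
        loss.backward()
        opt.step()

    return scale.detach(), offset.detach(), axis_angle_to_R(w).detach(), t.detach()
```
A real pipeline would additionally mask pixels that reproject outside the reference or are occluded, and GeoFill fuses the aligned reference with a learned inpainting network; the sketch covers only the alignment idea stated in the abstract.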
Related papers
- Refinement of Monocular Depth Maps via Multi-View Differentiable Rendering [4.717325308876748]
We present a novel approach to generate view consistent and detailed depth maps from a number of posed images.
We leverage advances in monocular depth estimation, which generate topologically complete, but metrically inaccurate depth maps.
Our method is able to generate dense, detailed, high-quality depth maps, also in challenging indoor scenarios, and outperforms state-of-the-art depth reconstruction approaches.
arXiv Detail & Related papers (2024-10-04T18:50:28Z)
- DoubleTake: Geometry Guided Depth Estimation [17.464549832122714]
Estimating depth from a sequence of posed RGB images is a fundamental computer vision task.
We introduce a reconstruction which combines volume features with a hint of the prior geometry, rendered as a depth map from the current camera location.
We demonstrate that our method can run at interactive speeds while producing state-of-the-art estimates of depth and 3D scene reconstruction in both offline and incremental evaluation scenarios.
arXiv Detail & Related papers (2024-06-26T14:29:05Z)
- SuperPrimitive: Scene Reconstruction at a Primitive Level [23.934492494774116]
Joint camera pose and dense geometry estimation from a set of images or a monocular video remains a challenging problem.
Most dense incremental reconstruction systems operate directly on image pixels and solve for their 3D positions using multi-view geometry cues.
We address this issue with a new image representation which we call a SuperPrimitive.
arXiv Detail & Related papers (2023-12-10T13:44:03Z)
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
- Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
arXiv Detail & Related papers (2021-04-06T03:49:35Z)
- Deep Two-View Structure-from-Motion Revisited [83.93809929963969]
Two-view structure-from-motion (SfM) is the cornerstone of 3D reconstruction and visual SLAM.
We propose to revisit the problem of deep two-view SfM by leveraging the well-posedness of the classic pipeline.
Our method consists of 1) an optical flow estimation network that predicts dense correspondences between two frames; 2) a normalized pose estimation module that computes relative camera poses from the 2D optical flow correspondences; and 3) a scale-invariant depth estimation network that leverages epipolar geometry to reduce the search space, refine the dense correspondences, and estimate relative depth maps (a minimal classical sketch of the pose-from-correspondences step is given after this list).
arXiv Detail & Related papers (2021-04-01T15:31:20Z)
- Deep 3D Capture: Geometry and Reflectance from Sparse Multi-View Images [59.906948203578544]
We introduce a novel learning-based method to reconstruct the high-quality geometry and complex, spatially-varying BRDF of an arbitrary object.
We first estimate per-view depth maps using a deep multi-view stereo network.
These depth maps are used to coarsely align the different views.
We propose a novel multi-view reflectance estimation network architecture.
arXiv Detail & Related papers (2020-03-27T21:28:54Z)
- Video Depth Estimation by Fusing Flow-to-Depth Proposals [65.24533384679657]
We present an approach with a differentiable flow-to-depth layer for video depth estimation.
The model consists of a flow-to-depth layer, a camera pose refinement module, and a depth fusion network.
Our approach outperforms state-of-the-art depth estimation methods and has reasonable cross-dataset generalization capability.
arXiv Detail & Related papers (2019-12-30T10:45:57Z)
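The Deep Two-View Structure-from-Motion entry above enumerates a flow-to-pose-to-depth pipeline. As a point of reference, the classical counterpart of its pose stage can be sketched with OpenCV's five-point solver and RANSAC. Everything below is an illustrative assumption rather than the paper's method: the function name, the toy intrinsics, and the synthetic point cloud are placeholders, and in the actual pipeline the correspondences would come from the learned optical flow network.
```python
import cv2
import numpy as np


def relative_pose_from_matches(pts_a, pts_b, K):
    """Recover the relative pose (R, t) of view B w.r.t. view A from matched pixels.

    pts_a, pts_b: (N, 2) arrays of corresponding pixel coordinates.
    K: (3, 3) camera intrinsics. The translation is recovered only up to scale.
    """
    E, inliers = cv2.findEssentialMat(pts_a, pts_b, K,
                                      method=cv2.RANSAC, prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts_a, pts_b, K, mask=inliers)
    return R, t.ravel(), inliers.ravel().astype(bool)


if __name__ == "__main__":
    # Toy check: project a random point cloud into two views related by a known
    # sideways translation, then recover that motion from the correspondences.
    rng = np.random.default_rng(0)
    K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
    X = rng.uniform([-2.0, -2.0, 4.0], [2.0, 2.0, 8.0], size=(200, 3))  # 3D points in view A
    t_true = np.array([0.3, 0.0, 0.0])                                  # known relative motion

    def project(P):                       # pinhole projection with intrinsics K
        p = P @ K.T
        return p[:, :2] / p[:, 2:3]

    pts_a, pts_b = project(X), project(X + t_true)
    R, t, inliers = relative_pose_from_matches(pts_a, pts_b, K)
    print(np.round(t, 2), inliers.sum())  # expect t close to +/-[1, 0, 0]
```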