Learning Geocentric Object Pose in Oblique Monocular Images
- URL: http://arxiv.org/abs/2007.00729v1
- Date: Wed, 1 Jul 2020 20:06:19 GMT
- Title: Learning Geocentric Object Pose in Oblique Monocular Images
- Authors: Gordon Christie, Rodrigo Rene Rai Munoz Abujder, Kevin Foster, Shea
Hagstrom, Gregory D. Hager, Myron Z. Brown
- Abstract summary: An object's geocentric pose, defined as the height above ground and orientation with respect to gravity, is a powerful representation of real-world structure for object detection, segmentation, and localization tasks using RGBD images.
We develop an encoding of geocentric pose to address this challenge and train a deep network to compute the representation densely, supervised by publicly available airborne lidar.
We exploit these attributes to rectify oblique images and remove observed object parallax to dramatically improve the accuracy of localization and to enable accurate alignment of multiple images taken from very different oblique viewpoints.
- Score: 18.15647135620892
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An object's geocentric pose, defined as the height above ground and
orientation with respect to gravity, is a powerful representation of real-world
structure for object detection, segmentation, and localization tasks using RGBD
images. For close-range vision tasks, height and orientation have been derived
directly from stereo-computed depth and more recently from monocular depth
predicted by deep networks. For long-range vision tasks such as Earth
observation, depth cannot be reliably estimated with monocular images. Inspired
by recent work in monocular height above ground prediction and optical flow
prediction from static images, we develop an encoding of geocentric pose to
address this challenge and train a deep network to compute the representation
densely, supervised by publicly available airborne lidar. We exploit these
attributes to rectify oblique images and remove observed object parallax to
dramatically improve the accuracy of localization and to enable accurate
alignment of multiple images taken from very different oblique viewpoints. We
demonstrate the value of our approach by extending two large-scale public
datasets for semantic segmentation in oblique satellite images. All of our data
and code are publicly available.
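To make the core idea concrete, here is a minimal Python sketch of the parallax-removal step. It assumes the network predicts a dense height map together with a single image-level orientation angle and a pixels-per-meter scale, so that each pixel's parallax flow is scale × height along the projected gravity direction; rectification then warps pixels against that flow. The function and variable names are illustrative, and the inverse warp is an approximation rather than the authors' released implementation.
```python
# Minimal sketch of parallax removal from dense geocentric pose.
# Assumptions (not from the released code): the model outputs a per-pixel
# height map `height` (meters), one orientation angle `theta` (radians,
# image-plane direction of the gravity projection) and a scale `magnitude`
# (pixels per meter).
import numpy as np
import cv2

def rectify(image: np.ndarray, height: np.ndarray,
            theta: float, magnitude: float) -> np.ndarray:
    """Warp an oblique image toward a nadir-like view by shifting each
    pixel against its predicted parallax flow = magnitude * height * d."""
    h, w = height.shape
    # Unit direction of the observed parallax in the image plane.
    dx, dy = np.cos(theta), np.sin(theta)
    # Per-pixel flow vectors, in pixels.
    flow_x = magnitude * height * dx
    flow_y = magnitude * height * dy
    # Inverse mapping: output pixel q samples the input at q + flow(q).
    # This is approximate (the flow is defined at the observed pixel, not
    # the rectified one) but reasonable where the flow varies smoothly.
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    map_x = xs + flow_x.astype(np.float32)
    map_y = ys + flow_y.astype(np.float32)
    return cv2.remap(image, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```
Because the orientation and scale are shared across the whole image, the warp is driven entirely by the dense height prediction, which is what makes alignment of images from very different oblique viewpoints tractable.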
Related papers
- Depth-guided Texture Diffusion for Image Semantic Segmentation [47.46257473475867]
We introduce a Depth-guided Texture Diffusion approach that tackles the mismatch between depth maps and RGB images in semantic segmentation.
Our method extracts low-level features from edges and textures to create a texture image, which is diffused into the depth map to enhance its structural detail.
By integrating this enriched depth map with the original RGB image into a joint feature embedding, our method effectively bridges the disparity between the depth map and the image.
arXiv Detail & Related papers (2024-08-17T04:55:03Z)
- GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision [49.839374549646884]
This paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception.
Our approach achieves state-of-the-art performance on the Occ3D-nuScenes dataset while requiring the lowest image resolution and the most lightweight image backbone.
arXiv Detail & Related papers (2024-05-17T07:31:20Z)
- ${S}^{2}$Net: Accurate Panorama Depth Estimation on Spherical Surface [4.649656275858966]
We propose an end-to-end deep network for monocular panorama depth estimation on a unit spherical surface.
Specifically, we project the feature maps extracted from equirectangular images onto a unit spherical surface sampled by uniformly distributed grids.
We propose a global cross-attention-based fusion module to fuse the feature maps from skip connections and enhance the ability to obtain global context.
arXiv Detail & Related papers (2023-01-14T07:39:15Z)
- Visual Attention-based Self-supervised Absolute Depth Estimation using Geometric Priors in Autonomous Driving [8.045833295463094]
We introduce a fully Visual Attention-based Depth (VADepth) network, where spatial attention and channel attention are applied to all stages.
By continuously extracting long-range dependencies of features along the spatial and channel dimensions, the VADepth network can effectively preserve important details.
Experimental results on the KITTI dataset show that this architecture achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-05-18T08:01:38Z)
- Accurate 3-DoF Camera Geo-Localization via Ground-to-Satellite Image Matching [102.39635336450262]
We address the problem of ground-to-satellite image geo-localization by matching a query image captured at the ground level against a large-scale database with geotagged satellite images.
Our new method achieves fine-grained localization of a query image, down to the pixel size of the satellite image.
arXiv Detail & Related papers (2022-03-26T20:10:38Z)
- GeoFill: Reference-Based Image Inpainting of Scenes with Complex Geometry [40.68659515139644]
Reference-guided image inpainting restores image pixels by leveraging the content from another reference image.
We leverage a monocular depth estimate and predict the relative pose between the cameras, then align the reference image to the target by a differentiable 3D reprojection (a generic sketch of such a reprojection appears after this list).
Our approach achieves state-of-the-art performance on both the RealEstate10K and MannequinChallenge datasets with large baselines, complex geometry, and extreme camera motions.
arXiv Detail & Related papers (2022-01-20T12:17:13Z)
- Single View Geocentric Pose in the Wild [18.08385304935249]
We present a model for learning to regress geocentric pose, trained with airborne lidar.
We also address practical issues required to deploy this method in the wild for real-world applications.
arXiv Detail & Related papers (2021-05-18T01:55:15Z)
- S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation [63.58891781246175]
Humans can infer the 3D geometry of a scene from a sketch instead of a realistic image, which indicates that spatial structure plays a fundamental role in understanding the depth of scenes.
We are the first to explore the learning of a depth-specific structural representation, which captures the essential feature for depth estimation and ignores irrelevant style information.
Our S2R-DepthNet generalizes well to unseen real-world data even though it is trained only on synthetic data.
arXiv Detail & Related papers (2021-04-02T03:55:41Z)
- Deep 3D Capture: Geometry and Reflectance from Sparse Multi-View Images [59.906948203578544]
We introduce a novel learning-based method to reconstruct the high-quality geometry and complex, spatially-varying BRDF of an arbitrary object.
We first estimate per-view depth maps using a deep multi-view stereo network.
These depth maps are used to coarsely align the different views.
We propose a novel multi-view reflectance estimation network architecture.
arXiv Detail & Related papers (2020-03-27T21:28:54Z)
- Learning Depth With Very Sparse Supervision [57.911425589947314]
This paper explores the idea that perception gets coupled to 3D properties of the world via interaction with the environment.
We train a specialized global-local network architecture with what would be available to a robot interacting with the environment.
Experiments on several datasets show that, when ground truth is available even for just one of the image pixels, the proposed network can learn monocular dense depth estimation up to 22.5% more accurately than state-of-the-art approaches.
arXiv Detail & Related papers (2020-03-02T10:44:13Z)
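As a side note on the GeoFill entry above: its alignment step relies on differentiable 3D reprojection. The sketch below shows the standard inverse-warp formulation from the monocular-depth literature, not the authors' exact pipeline; all names, shapes, and conventions are illustrative assumptions.
```python
# Generic differentiable 3D reprojection: sample a reference image at the
# locations where target pixels land after unprojection with the target
# depth and the relative pose (R, t). Illustrative, not GeoFill's code.
import torch
import torch.nn.functional as F

def reproject(ref_img, tgt_depth, K, R, t):
    """ref_img: (1,3,H,W), tgt_depth: (1,1,H,W), K: (3,3), R: (3,3), t: (3,1)."""
    _, _, H, W = ref_img.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32),
                            indexing="ij")
    ones = torch.ones_like(xs)
    pix = torch.stack([xs, ys, ones], dim=0).reshape(3, -1)      # (3, H*W)
    # Unproject target pixels to 3D and move them into the reference frame.
    cam = (torch.linalg.inv(K) @ pix) * tgt_depth.reshape(1, -1)
    cam = R @ cam + t                                            # (3, H*W)
    # Project into the reference view; normalize to [-1, 1] for sampling.
    proj = K @ cam
    uv = proj[:2] / proj[2:].clamp(min=1e-6)
    grid = torch.stack([2 * uv[0] / (W - 1) - 1,
                        2 * uv[1] / (H - 1) - 1], dim=-1).reshape(1, H, W, 2)
    return F.grid_sample(ref_img, grid, align_corners=True)
```
Because every step is differentiable, gradients flow back into the depth and pose predictions, which is what lets such an alignment be trained end to end.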
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.