Estimating Depth of Monocular Panoramic Image with Teacher-Student Model Fusing Equirectangular and Spherical Representations
- URL: http://arxiv.org/abs/2405.16858v1
- Date: Mon, 27 May 2024 06:11:16 GMT
- Title: Estimating Depth of Monocular Panoramic Image with Teacher-Student Model Fusing Equirectangular and Spherical Representations
- Authors: Jingguo Liu, Yijun Xu, Shigang Li, Jianfeng Li,
- Abstract summary: We propose a method of estimating the depth of monocular panoramic image with a teacher-student model fusing equirectangular and spherical representations.
In experiments, the proposed method is tested on several well-known 360 monocular depth estimation benchmark datasets.
- Score: 3.8240176158734194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Disconnectivity and distortion are the two problems which must be coped with when processing 360 degrees equirectangular images. In this paper, we propose a method of estimating the depth of monocular panoramic image with a teacher-student model fusing equirectangular and spherical representations. In contrast with the existing methods fusing an equirectangular representation with a cube map representation or tangent representation, a spherical representation is a better choice because a sampling on a sphere is more uniform and can also cope with distortion more effectively. In this processing, a novel spherical convolution kernel computing with sampling points on a sphere is developed to extract features from the spherical representation, and then, a Segmentation Feature Fusion(SFF) methodology is utilized to combine the features with ones extracted from the equirectangular representation. In contrast with the existing methods using a teacher-student model to obtain a lighter model of depth estimation, we use a teacher-student model to learn the latent features of depth images. This results in a trained model which estimates the depth map of an equirectangular image using not only the feature maps extracted from an input equirectangular image but also the distilled knowledge learnt from the ground truth of depth map of a training set. In experiments, the proposed method is tested on several well-known 360 monocular depth estimation benchmark datasets, and outperforms the existing methods for the most evaluation indexes.
Related papers
- Diff-DOPE: Differentiable Deep Object Pose Estimation [29.703385848843414]
We introduce Diff-DOPE, a 6-DoF pose refiner that takes as input an image, a 3D textured model of an object, and an initial pose of the object.
The method uses differentiable rendering to update the object pose to minimize the visual error between the image and the projection of the model.
We show that this simple, yet effective, idea is able to achieve state-of-the-art results on pose estimation datasets.
arXiv Detail & Related papers (2023-09-30T18:52:57Z) - Parametric Depth Based Feature Representation Learning for Object
Detection and Segmentation in Bird's Eye View [44.78243406441798]
This paper focuses on leveraging geometry information, such as depth, to model such feature transformation.
We first lift the 2D image features to the 3D space defined for the ego vehicle via a predicted parametric depth distribution for each pixel in each view.
We then aggregate the 3D feature volume based on the 3D space occupancy derived from depth to the BEV frame.
arXiv Detail & Related papers (2023-07-09T06:07:22Z) - Multi-View Reconstruction using Signed Ray Distance Functions (SRDF) [22.75986869918975]
We investigate a new computational approach that builds on a novel shape representation that is volumetric.
The shape energy associated to this representation evaluates 3D geometry given color images and does not need appearance prediction.
In practice we propose an implicit shape representation, the SRDF, based on signed distances which we parameterize by depths along camera rays.
arXiv Detail & Related papers (2022-08-31T19:32:17Z) - Semantic keypoint-based pose estimation from single RGB frames [64.80395521735463]
We present an approach to estimating the continuous 6-DoF pose of an object from a single RGB image.
The approach combines semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model.
We show that our approach can accurately recover the 6-DoF object pose for both instance- and class-based scenarios.
arXiv Detail & Related papers (2022-04-12T15:03:51Z) - Dense Depth Estimation from Multiple 360-degree Images Using Virtual
Depth [4.984601297028257]
The proposed pipeline leverages a spherical camera model that compensates for radial distortion in 360degree: images.
We propose an effective dense depth estimation method by setting virtual depth and minimizing photonic reprojection error.
The experimental results verify that the proposed pipeline improves estimation accuracy compared to the current state-of-art dense depth estimation methods.
arXiv Detail & Related papers (2021-12-30T05:27:28Z) - Neural Radiance Fields Approach to Deep Multi-View Photometric Stereo [103.08512487830669]
We present a modern solution to the multi-view photometric stereo problem (MVPS)
We procure the surface orientation using a photometric stereo (PS) image formation model and blend it with a multi-view neural radiance field representation to recover the object's surface geometry.
Our method performs neural rendering of multi-view images while utilizing surface normals estimated by a deep photometric stereo network.
arXiv Detail & Related papers (2021-10-11T20:20:03Z) - Aug3D-RPN: Improving Monocular 3D Object Detection by Synthetic Images
with Virtual Depth [64.29043589521308]
We propose a rendering module to augment the training data by synthesizing images with virtual-depths.
The rendering module takes as input the RGB image and its corresponding sparse depth image, outputs a variety of photo-realistic synthetic images.
Besides, we introduce an auxiliary module to improve the detection model by jointly optimizing it through a depth estimation task.
arXiv Detail & Related papers (2021-07-28T11:00:47Z) - DONet: Learning Category-Level 6D Object Pose and Size Estimation from
Depth Observation [53.55300278592281]
We propose a method of Category-level 6D Object Pose and Size Estimation (COPSE) from a single depth image.
Our framework makes inferences based on the rich geometric information of the object in the depth channel alone.
Our framework competes with state-of-the-art approaches that require labeled real-world images.
arXiv Detail & Related papers (2021-06-27T10:41:50Z) - ShaRF: Shape-conditioned Radiance Fields from a Single View [54.39347002226309]
We present a method for estimating neural scenes representations of objects given only a single image.
The core of our method is the estimation of a geometric scaffold for the object.
We demonstrate in several experiments the effectiveness of our approach in both synthetic and real images.
arXiv Detail & Related papers (2021-02-17T16:40:28Z) - Depth Completion using Piecewise Planar Model [94.0808155168311]
A depth map can be represented by a set of learned bases and can be efficiently solved in a closed form solution.
However, one issue with this method is that it may create artifacts when colour boundaries are inconsistent with depth boundaries.
We enforce a more strict model in depth recovery: a piece-wise planar model.
arXiv Detail & Related papers (2020-12-06T07:11:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.