Next-Best-View Prediction for Active Stereo Cameras and Highly
Reflective Objects
- URL: http://arxiv.org/abs/2202.13263v1
- Date: Sun, 27 Feb 2022 01:48:02 GMT
- Title: Next-Best-View Prediction for Active Stereo Cameras and Highly
Reflective Objects
- Authors: Jun Yang and Steven L. Waslander
- Abstract summary: We propose a next-best-view framework to strategically select camera viewpoints for completing depth data on reflective objects.
We employ an RGB-based pose estimator to obtain current pose predictions from the existing data.
Our active depth acquisition method outperforms two strong baselines for both depth completion and object pose estimation performance.
- Score: 12.21992378133376
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Depth acquisition with an active stereo camera is a challenging task for
highly reflective objects. When the setup permits, multi-view fusion can provide
increased levels of depth completion. However, due to the slow acquisition
speed of high-end active stereo cameras, collecting a large number of
viewpoints for a single scene is generally impractical. In this work, we
propose a next-best-view framework that strategically selects camera viewpoints
to complete depth data on reflective objects. In particular, we explicitly
model the specular reflection of reflective surfaces using the Phong
reflection model and a photometric response function. Given the object CAD
model and a grayscale image, we employ an RGB-based pose estimator to obtain
current pose predictions from the existing data; these predictions are used to
form surface normal and depth hypotheses, allowing us to assess the expected
information gain of a subsequent frame from any candidate viewpoint. Using
this formulation, we implement an active perception pipeline and evaluate it
on a challenging real-world dataset. The evaluation results demonstrate that
our active depth acquisition method outperforms two strong baselines in both
depth completion and object pose estimation performance.
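The core idea of modeling specular reflection to predict where depth will fail can be illustrated with a minimal sketch. This is not the authors' exact formulation: the material constants, the clipping response function, and the saturation threshold below are all illustrative assumptions, and the real pipeline operates per pixel over rendered normal and depth hypotheses.

```python
import numpy as np

def phong_specular(normal, light_dir, view_dir, ks=2.0, shininess=32.0):
    """Specular intensity at a surface point under the Phong model.

    All direction vectors are unit-length and point away from the surface.
    ks and shininess are exaggerated illustrative constants, not values
    from the paper.
    """
    # Mirror reflection of the incoming light about the surface normal.
    r = 2.0 * np.dot(normal, light_dir) * normal - light_dir
    return ks * max(np.dot(r, view_dir), 0.0) ** shininess

def is_saturated(irradiance, threshold=0.98):
    """Crude photometric response: clip to [0, 1] and flag near-saturated
    pixels, where active stereo matching typically fails on shiny surfaces."""
    return min(1.0, irradiance) >= threshold

# A candidate viewpoint is more informative if it avoids saturation on
# pixels that are currently missing depth.
n = np.array([0.0, 0.0, 1.0])
l = np.array([0.0, 0.0, 1.0])          # projector aligned with the normal
v_head_on = np.array([0.0, 0.0, 1.0])  # camera in the mirror direction
v_oblique = np.array([0.6, 0.0, 0.8])  # camera off the mirror direction
print(is_saturated(phong_specular(n, l, v_head_on)))  # True: specular lobe hits the camera
print(is_saturated(phong_specular(n, l, v_oblique)))  # False: lobe misses the camera
```

A next-best-view scorer in this spirit would count, for each candidate pose, how many currently-missing depth pixels are predicted to be non-saturated, and pick the viewpoint maximizing that count.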
Related papers
- Self-supervised Monocular Depth Estimation on Water Scenes via Specular Reflection Prior [3.2120448116996103]
This paper proposes the first self-supervision for deep-learning depth estimation on water scenes via intra-frame priors.
In the first stage, a water segmentation network is performed to separate the reflection components from the entire image.
The photometric re-projection error, incorporating SmoothL1 and a novel photometric adaptive SSIM, is formulated to optimize pose and depth estimation.
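A photometric error mixing a structural term with a robust per-pixel term is common in self-supervised depth pipelines; a minimal sketch follows. This is a generic approximation, not the paper's "photometric adaptive SSIM": the single-window SSIM, the weight `alpha`, and the SmoothL1 `beta` are illustrative assumptions (real implementations compute SSIM over local windows).

```python
import numpy as np

def smooth_l1(x, y, beta=1.0):
    """SmoothL1 (Huber-style) error: quadratic near zero, linear in the tails."""
    d = np.abs(x - y)
    return np.where(d < beta, 0.5 * d**2 / beta, d - 0.5 * beta).mean()

def ssim(x, y, c1=0.01**2, c2=0.03**2):
    """Global (single-window) SSIM; real pipelines use local windows."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))

def photometric_error(src, tgt, alpha=0.85):
    # Weighted mix of structural and per-pixel terms; alpha is illustrative.
    return alpha * 0.5 * (1.0 - ssim(src, tgt)) + (1.0 - alpha) * smooth_l1(src, tgt)
```

Minimizing this error over pose and depth parameters is what drives the self-supervised optimization described above.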
arXiv Detail & Related papers (2024-04-10T17:25:42Z)
- Instance-aware Multi-Camera 3D Object Detection with Structural Priors Mining and Self-Boosting Learning [93.71280187657831]
The camera-based bird's-eye-view (BEV) perception paradigm has made significant progress in the autonomous driving field.
We propose IA-BEV, which integrates image-plane instance awareness into the depth estimation process within a BEV-based detector.
arXiv Detail & Related papers (2023-12-13T09:24:42Z)
- MV-ROPE: Multi-view Constraints for Robust Category-level Object Pose and Size Estimation [23.615122326731115]
We propose a novel solution that makes use of RGB video streams.
Our framework consists of three modules: a scale-aware monocular dense SLAM solution, a lightweight object pose predictor, and an object-level pose graph.
Our experimental results demonstrate that when utilizing public dataset sequences with high-quality depth information, the proposed method exhibits comparable performance to state-of-the-art RGB-D methods.
arXiv Detail & Related papers (2023-08-17T08:29:54Z)
- StereoPose: Category-Level 6D Transparent Object Pose Estimation from Stereo Images via Back-View NOCS [106.62225866064313]
We present StereoPose, a novel stereo image framework for category-level object pose estimation.
For a robust estimation from pure stereo images, we develop a pipeline that decouples category-level pose estimation into object size estimation, initial pose estimation, and pose refinement.
To address the issue of image content aliasing, we define a back-view NOCS map for the transparent object.
The back-view NOCS aims to reduce the network learning ambiguity caused by content aliasing, and leverage informative cues on the back of the transparent object for more accurate pose estimation.
arXiv Detail & Related papers (2022-11-03T08:36:09Z)
- Uncertainty Guided Depth Fusion for Spike Camera [49.41822923588663]
We propose a novel Uncertainty-Guided Depth Fusion (UGDF) framework to fuse predictions of monocular and stereo depth estimation networks for spike camera.
Our framework is motivated by the fact that stereo spike depth estimation achieves better results at close range.
In order to demonstrate the advantage of spike depth estimation over traditional camera depth estimation, we contribute a spike-depth dataset named CitySpike20K.
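The idea of fusing two depth branches by their predicted confidence can be sketched generically. This is not the UGDF architecture itself, only an inverse-uncertainty weighted average; the per-pixel uncertainty maps are assumed to come from the respective networks.

```python
import numpy as np

def fuse_depths(d_mono, d_stereo, u_mono, u_stereo, eps=1e-6):
    """Inverse-uncertainty weighted fusion of two depth maps.

    Each pixel takes more weight from the branch that is more confident
    (lower predicted uncertainty) -- e.g. stereo at close range.
    """
    w_mono = 1.0 / (u_mono + eps)
    w_stereo = 1.0 / (u_stereo + eps)
    return (w_mono * d_mono + w_stereo * d_stereo) / (w_mono + w_stereo)
```

Where the stereo branch reports much lower uncertainty, the fused depth tracks the stereo prediction almost exactly, and vice versa.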
arXiv Detail & Related papers (2022-08-26T13:04:01Z)
- PhoCaL: A Multi-Modal Dataset for Category-Level Object Pose Estimation with Photometrically Challenging Objects [45.31344700263873]
We introduce a multimodal dataset for category-level object pose estimation with photometrically challenging objects termed PhoCaL.
PhoCaL comprises 60 high quality 3D models of household objects over 8 categories including highly reflective, transparent and symmetric objects.
It ensures sub-millimeter pose accuracy for opaque, textured, shiny, and transparent objects, with no motion blur and precise camera synchronisation.
arXiv Detail & Related papers (2022-05-18T09:21:09Z)
- Object Level Depth Reconstruction for Category Level 6D Object Pose Estimation From Monocular RGB Image [12.382992538846896]
We propose a novel approach named Object Level Depth reconstruction Network (OLD-Net) taking only RGB images as input for category-level 6D object pose estimation.
We propose to directly predict object-level depth from a monocular RGB image by deforming the category-level shape prior into object-level depth and the canonical NOCS representation.
Experiments on the challenging CAMERA25 and REAL275 datasets indicate that our model achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-04-04T15:33:28Z)
- DONet: Learning Category-Level 6D Object Pose and Size Estimation from Depth Observation [53.55300278592281]
We propose a method of Category-level 6D Object Pose and Size Estimation (COPSE) from a single depth image.
Our framework makes inferences based on the rich geometric information of the object in the depth channel alone.
Our framework competes with state-of-the-art approaches that require labeled real-world images.
arXiv Detail & Related papers (2021-06-27T10:41:50Z)
- Robust Consistent Video Depth Estimation [65.53308117778361]
We present an algorithm for estimating consistent dense depth maps and camera poses from a monocular video.
Our algorithm combines two complementary techniques: (1) flexible deformation-splines for low-frequency large-scale alignment and (2) geometry-aware depth filtering for high-frequency alignment of fine depth details.
In contrast to prior approaches, our method does not require camera poses as input and achieves robust reconstruction for challenging hand-held cell phone captures containing a significant amount of noise, shake, motion blur, and rolling shutter deformations.
arXiv Detail & Related papers (2020-12-10T18:59:48Z)
- Deep 3D Capture: Geometry and Reflectance from Sparse Multi-View Images [59.906948203578544]
We introduce a novel learning-based method to reconstruct the high-quality geometry and complex, spatially-varying BRDF of an arbitrary object.
We first estimate per-view depth maps using a deep multi-view stereo network.
These depth maps are used to coarsely align the different views.
We propose a novel multi-view reflectance estimation network architecture.
arXiv Detail & Related papers (2020-03-27T21:28:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.