StereoPose: Category-Level 6D Transparent Object Pose Estimation from
Stereo Images via Back-View NOCS
- URL: http://arxiv.org/abs/2211.01644v1
- Date: Thu, 3 Nov 2022 08:36:09 GMT
- Title: StereoPose: Category-Level 6D Transparent Object Pose Estimation from
Stereo Images via Back-View NOCS
- Authors: Kai Chen, Stephen James, Congying Sui, Yun-Hui Liu, Pieter Abbeel, Qi
Dou
- Abstract summary: We present StereoPose, a novel stereo image framework for category-level object pose estimation.
For a robust estimation from pure stereo images, we develop a pipeline that decouples category-level pose estimation into object size estimation, initial pose estimation, and pose refinement.
To address the issue of image content aliasing, we define a back-view NOCS map for the transparent object.
The back-view NOCS aims to reduce the network learning ambiguity caused by content aliasing, and leverage informative cues on the back of the transparent object for more accurate pose estimation.
- Score: 106.62225866064313
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most existing methods for category-level pose estimation rely on object point
clouds. However, when considering transparent objects, depth cameras are
usually not able to capture meaningful data, resulting in point clouds with
severe artifacts. Without a high-quality point cloud, existing methods are not
applicable to challenging transparent objects. To tackle this problem, we
present StereoPose, a novel stereo image framework for category-level object
pose estimation, ideally suited for transparent objects. For a robust
estimation from pure stereo images, we develop a pipeline that decouples
category-level pose estimation into object size estimation, initial pose
estimation, and pose refinement. StereoPose then estimates the object pose based
on its representation in the normalized object coordinate space (NOCS). To address the
issue of image content aliasing, we further define a back-view NOCS map for the
transparent object. The back-view NOCS aims to reduce the network learning
ambiguity caused by content aliasing, and leverage informative cues on the back
of the transparent object for more accurate pose estimation. To further improve
the performance of the stereo framework, StereoPose is equipped with a parallax
attention module for stereo feature fusion and an epipolar loss for improving
the stereo-view consistency of network predictions. Extensive experiments on
the public TOD dataset demonstrate the superiority of the proposed StereoPose
framework for category-level 6D transparent object pose estimation.
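The NOCS-based pose recovery step mentioned in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: assuming the network's per-pixel NOCS predictions have been paired with metric 3D points (e.g. triangulated from the stereo pair), the 7-DoF similarity transform (scale, rotation, translation) is recovered with the standard Umeyama alignment.

```python
# Hedged sketch of standard NOCS-based pose recovery (Umeyama alignment),
# not StereoPose's actual pipeline. Inputs are corresponding point sets:
# predicted NOCS coordinates and their metric 3D counterparts.
import numpy as np

def umeyama_pose(nocs_pts: np.ndarray, metric_pts: np.ndarray):
    """Solve s, R, t minimizing ||s * R @ nocs + t - metric||^2.

    nocs_pts, metric_pts: (N, 3) corresponding point sets.
    """
    mu_n = nocs_pts.mean(axis=0)
    mu_m = metric_pts.mean(axis=0)
    xn = nocs_pts - mu_n
    xm = metric_pts - mu_m
    cov = xm.T @ xn / len(nocs_pts)           # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                        # guard against reflections
    R = U @ S @ Vt
    var_n = (xn ** 2).sum() / len(nocs_pts)   # variance of the NOCS points
    s = np.trace(np.diag(D) @ S) / var_n      # isotropic scale
    t = mu_m - s * R @ mu_n
    return s, R, t
```

The reflection guard matters in practice: without it, near-planar point sets can yield a mirrored "rotation" with determinant -1.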
Related papers
- Extending 6D Object Pose Estimators for Stereo Vision [4.818865062632567]
We create a BOP-compatible stereo version of the YCB-V dataset for 6D object pose estimation.
Our method outperforms state-of-the-art 6D pose estimation algorithms by utilizing stereo vision and can easily be adopted for other dense feature-based algorithms.
arXiv Detail & Related papers (2024-02-08T12:08:52Z)
- LocaliseBot: Multi-view 3D object localisation with differentiable rendering for robot grasping [9.690844449175948]
We focus on object pose estimation.
Our approach relies on three pieces of information: multiple views of the object, the camera's parameters at those viewpoints, and 3D CAD models of objects.
We show that the estimated object pose results in 99.65% grasp accuracy with the ground truth grasp candidates.
arXiv Detail & Related papers (2023-11-14T14:27:53Z)
- MV-ROPE: Multi-view Constraints for Robust Category-level Object Pose and Size Estimation [23.615122326731115]
We propose a novel solution that makes use of RGB video streams.
Our framework consists of three modules: a scale-aware monocular dense SLAM solution, a lightweight object pose predictor, and an object-level pose graph.
Our experimental results demonstrate that when utilizing public dataset sequences with high-quality depth information, the proposed method exhibits comparable performance to state-of-the-art RGB-D methods.
arXiv Detail & Related papers (2023-08-17T08:29:54Z)
- RelPose++: Recovering 6D Poses from Sparse-view Observations [66.6922660401558]
We address the task of estimating 6D camera poses from sparse-view image sets (2-8 images).
We build on the recent RelPose framework which learns a network that infers distributions over relative rotations over image pairs.
Our final system results in large improvements in 6D pose prediction over prior art on both seen and unseen object categories.
arXiv Detail & Related papers (2023-05-08T17:59:58Z)
- OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for Multi-Camera 3D Object Detection [78.38062015443195]
OA-BEV is a network that can be plugged into the BEV-based 3D object detection framework.
Our method achieves consistent improvements over the BEV-based baselines in terms of both average precision and nuScenes detection score.
arXiv Detail & Related papers (2023-01-13T06:02:31Z)
- PhoCaL: A Multi-Modal Dataset for Category-Level Object Pose Estimation with Photometrically Challenging Objects [45.31344700263873]
We introduce a multimodal dataset for category-level object pose estimation with photometrically challenging objects termed PhoCaL.
PhoCaL comprises 60 high quality 3D models of household objects over 8 categories including highly reflective, transparent and symmetric objects.
It ensures sub-millimeter pose accuracy for opaque textured, shiny, and transparent objects, with no motion blur and perfect camera synchronisation.
arXiv Detail & Related papers (2022-05-18T09:21:09Z)
- Next-Best-View Prediction for Active Stereo Cameras and Highly Reflective Objects [12.21992378133376]
We propose a next-best-view framework to strategically select camera viewpoints for completing depth data on reflective objects.
We employ an RGB-based pose estimator to obtain current pose predictions from the existing data.
Our active depth acquisition method outperforms two strong baselines for both depth completion and object pose estimation performance.
arXiv Detail & Related papers (2022-02-27T01:48:02Z)
- Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
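The plane-sweep idea can be illustrated with a minimal two-view sketch. This is a simplified, assumption-laden toy version (not the paper's single-shot network): sweep candidate depths along the ray through a reference-view detection, project each depth hypothesis into a second calibrated view, and keep the depth whose projection lands closest to the detection there.

```python
# Minimal two-view plane-sweep sketch (an illustration of the general idea,
# not the paper's method): score each candidate depth by reprojection error.
import numpy as np

def plane_sweep_depth(x_ref, x_src, K_ref, K_src, R, t, depths):
    """Pick the depth hypothesis along the reference ray whose projection
    into the source view lands closest to the source-view detection.

    x_ref, x_src: (2,) pixel coordinates of the same joint in each view.
    K_ref, K_src: (3, 3) intrinsics; R, t: source pose relative to reference.
    depths: iterable of candidate depths (the "sweep").
    """
    ray = np.linalg.inv(K_ref) @ np.array([x_ref[0], x_ref[1], 1.0])
    best_depth, best_err = None, np.inf
    for d in depths:
        X = d * ray                          # 3D hypothesis in reference frame
        x = K_src @ (R @ X + t)              # project into source view
        u = x[:2] / x[2]
        err = np.linalg.norm(u - np.asarray(x_src))
        if err < best_err:
            best_depth, best_err = d, err
    return best_depth
```

With more views, the per-depth score simply sums the reprojection errors (or feature-matching costs) over all source views, which is the form plane-sweep methods typically use.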
arXiv Detail & Related papers (2021-04-06T03:49:35Z)
- 3D Object Classification on Partial Point Clouds: A Practical Perspective [91.81377258830703]
A point cloud is a popular shape representation adopted in 3D object classification.
This paper introduces a practical setting to classify partial point clouds of object instances under any poses.
This paper proposes a novel algorithm that operates in an alignment-then-classification manner.
arXiv Detail & Related papers (2020-12-18T04:00:56Z)
- Single View Metrology in the Wild [94.7005246862618]
We present a novel approach to single view metrology that can recover the absolute scale of a scene represented by 3D heights of objects or camera height above the ground.
Our method relies on data-driven priors learned by a deep network specifically designed to imbibe weakly supervised constraints from the interplay of the unknown camera with 3D entities such as object heights.
We demonstrate state-of-the-art qualitative and quantitative results on several datasets as well as applications including virtual object insertion.
arXiv Detail & Related papers (2020-07-18T22:31:33Z)
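The scale-recovery relation underlying single view metrology can be written down for the simplest case: an ideal pinhole camera with a horizontal optical axis viewing a vertical object standing on the ground plane. This is a hand-derived illustration of the geometry, not the paper's learned, weakly supervised model.

```python
# Closed-form single-view metrology for the idealized zero-pitch, zero-roll
# pinhole case. Image v grows downward, so ground points satisfy
# v_base > v_horizon.
def object_height(v_base, v_top, v_horizon, camera_height):
    """Height of a vertical object on the ground plane.

    Derivation: v_base - v_horizon = f * H_cam / Z  (base at depth Z) and
                v_base - v_top     = f * h / Z      (object of height h),
    so h = H_cam * (v_base - v_top) / (v_base - v_horizon); the focal
    length f and depth Z cancel out.
    """
    return camera_height * (v_base - v_top) / (v_base - v_horizon)
```

In the general (tilted-camera) setting this becomes a cross-ratio along the vertical vanishing direction, which is why data-driven priors for the horizon and camera height are the hard part in the wild.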
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.