NeRF-Pose: A First-Reconstruct-Then-Regress Approach for
Weakly-supervised 6D Object Pose Estimation
- URL: http://arxiv.org/abs/2203.04802v2
- Date: Sat, 9 Sep 2023 04:49:33 GMT
- Title: NeRF-Pose: A First-Reconstruct-Then-Regress Approach for
Weakly-supervised 6D Object Pose Estimation
- Authors: Fu Li, Hao Yu, Ivan Shugurov, Benjamin Busam, Shaowu Yang, Slobodan
Ilic
- Abstract summary: We present a weakly-supervised reconstruction-based pipeline, named NeRF-Pose, which needs only 2D object segmentation and known relative camera poses during training.
A NeRF-enabled PnP+RANSAC algorithm is used to estimate a stable and accurate pose from the predicted correspondences.
Experiments on LineMod-Occlusion show that the proposed method has state-of-the-art accuracy in comparison to the best 6D pose estimation methods.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pose estimation of 3D objects in monocular images is a fundamental and
long-standing problem in computer vision. Existing deep learning approaches for
6D pose estimation typically rely on the assumption of availability of 3D
object models and 6D pose annotations. However, precise annotation of 6D poses
in real data is intricate, time-consuming and not scalable, while synthetic
data scales well but lacks realism. To avoid these problems, we present a
weakly-supervised reconstruction-based pipeline, named NeRF-Pose, which needs
only 2D object segmentation and known relative camera poses during training.
Following the first-reconstruct-then-regress idea, we first reconstruct the
objects from multiple views in the form of an implicit neural representation.
Then, we train a pose regression network to predict pixel-wise 2D-3D
correspondences between images and the reconstructed model. At inference, the
approach only needs a single image as input. A NeRF-enabled PnP+RANSAC
algorithm is used to estimate a stable and accurate pose from the predicted
correspondences. Experiments on LineMod and LineMod-Occlusion show that the
proposed method has state-of-the-art accuracy in comparison to the best 6D pose
estimation methods in spite of being trained only with weak labels. Besides, we
extend the Homebrewed DB dataset with more real training images to support the
weakly supervised task and achieve compelling results on this dataset. The
extended dataset and code will be released soon.
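
At inference, the per-pixel 3D object coordinates regressed by the network, paired with their pixel locations inside the predicted mask, form 2D-3D correspondences from which the 6D pose is solved. The sketch below illustrates this last step with plain PnP+RANSAC via OpenCV; it is a minimal illustration only, the function and argument names are illustrative, and the paper's NeRF-enabled hypothesis scoring is not reproduced here.

```python
import numpy as np
import cv2

def pose_from_correspondences(obj_pts, img_pts, K):
    """Recover a 6D pose (R, t) from pixel-wise 2D-3D correspondences
    using plain PnP+RANSAC. The paper's NeRF-enabled variant additionally
    validates hypotheses against the reconstructed model; omitted here.

    obj_pts: (N, 3) predicted 3D points in the object frame
    img_pts: (N, 2) corresponding pixel coordinates
    K:       (3, 3) camera intrinsic matrix
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        obj_pts.astype(np.float32),
        img_pts.astype(np.float32),
        K.astype(np.float32),
        distCoeffs=None,
        reprojectionError=3.0,   # inlier threshold in pixels
        iterationsCount=150,
        flags=cv2.SOLVEPNP_EPNP,
    )
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)   # rotation vector -> 3x3 rotation matrix
    return R, tvec.reshape(3), inliers
```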
Related papers
- Learning to Estimate 6DoF Pose from Limited Data: A Few-Shot,
Generalizable Approach using RGB Images [60.0898989456276]
We present a new framework named Cas6D for few-shot 6DoF pose estimation that is generalizable and uses only RGB images.
To address the false positives of target object detection in the extreme few-shot setting, our framework utilizes a self-supervised pre-trained ViT to learn robust feature representations.
Experimental results on the LINEMOD and GenMOP datasets demonstrate that Cas6D outperforms state-of-the-art methods by 9.2% and 3.8% accuracy (Proj-5) under the 32-shot setting.
arXiv Detail & Related papers (2023-06-13T07:45:42Z) - Towards Accurate Reconstruction of 3D Scene Shape from A Single
Monocular Image [91.71077190961688]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then exploit 3D point cloud data to predict the depth shift and the camera's focal length, which allows us to recover 3D scene shape.
We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot evaluation.
arXiv Detail & Related papers (2022-08-28T16:20:14Z) - Unseen Object 6D Pose Estimation: A Benchmark and Baselines [62.8809734237213]
We propose a new task that enables and facilitates algorithms to estimate the 6D pose of novel objects during testing.
We collect a dataset with both real and synthetic images and up to 48 unseen objects in the test set.
By training an end-to-end 3D correspondences network, our method finds corresponding points between an unseen object and a partial view RGBD image accurately and efficiently.
arXiv Detail & Related papers (2022-06-23T16:29:53Z) - FvOR: Robust Joint Shape and Pose Optimization for Few-view Object
Reconstruction [37.81077373162092]
Reconstructing an accurate 3D object model from a few image observations remains a challenging problem in computer vision.
We present FvOR, a learning-based object reconstruction method that predicts accurate 3D models given a few images with noisy input poses.
arXiv Detail & Related papers (2022-05-16T15:39:27Z) - Coupled Iterative Refinement for 6D Multi-Object Pose Estimation [64.7198752089041]
Given a set of known 3D objects and an RGB or RGB-D input image, we detect and estimate the 6D pose of each object.
Our approach iteratively refines both pose and correspondence in a tightly coupled manner, allowing us to dynamically remove outliers to improve accuracy.
arXiv Detail & Related papers (2022-04-26T18:00:08Z) - OSOP: A Multi-Stage One Shot Object Pose Estimation Framework [35.89334617258322]
We present a novel one-shot method for object detection and 6 DoF pose estimation, that does not require training on target objects.
At test time, it takes as input a target image and a textured 3D query model.
We evaluate the method on LineMOD, Occlusion, Homebrewed, YCB-V and TLESS datasets.
arXiv Detail & Related papers (2022-03-29T13:12:00Z) - SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation [98.83762558394345]
SO-Pose is a framework for regressing all 6 degrees-of-freedom (6DoF) for the object pose in a cluttered environment from a single RGB image.
We introduce a novel reasoning about self-occlusion, in order to establish a two-layer representation for 3D objects.
By enforcing cross-layer consistencies that align correspondences, self-occlusion and 6D pose, we further improve accuracy and robustness.
arXiv Detail & Related papers (2021-08-18T19:49:29Z) - Single Shot 6D Object Pose Estimation [11.37625512264302]
We introduce a novel single shot approach for 6D object pose estimation of rigid objects based on depth images.
A fully convolutional neural network is employed, where the 3D input data is spatially discretized and pose estimation is considered as a regression task.
With 65 fps on a GPU, our Object Pose Network (OP-Net) is extremely fast, is optimized end-to-end, and estimates the 6D pose of multiple objects in the image simultaneously.
arXiv Detail & Related papers (2020-04-27T11:59:11Z) - Neural Mesh Refiner for 6-DoF Pose Estimation [10.62836310872743]
Deep learning has shown to be effective for robust and real-time monocular pose estimation.
This paper bridges the gap between 2D mask generation and 3D location prediction via a differentiable neural mesh.
arXiv Detail & Related papers (2020-03-17T07:12:22Z)