DSC-PoseNet: Learning 6DoF Object Pose Estimation via Dual-scale
Consistency
- URL: http://arxiv.org/abs/2104.03658v1
- Date: Thu, 8 Apr 2021 10:19:35 GMT
- Title: DSC-PoseNet: Learning 6DoF Object Pose Estimation via Dual-scale
Consistency
- Authors: Zongxin Yang, Xin Yu, Yi Yang
- Abstract summary: We present a two-step pose estimation framework to attain 6DoF object poses from 2D object bounding-boxes.
In the first step, the framework learns to segment objects from real and synthetic data.
In the second step, we design a dual-scale pose estimation network, namely DSC-PoseNet.
Our method outperforms state-of-the-art models trained on synthetic data by a large margin.
- Score: 43.09728251735362
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compared to 2D object bounding-box labeling, it is very difficult for humans
to annotate 3D object poses, especially when depth images of scenes are
unavailable. This paper investigates whether we can estimate the object poses
effectively when only RGB images and 2D object annotations are given. To this
end, we present a two-step pose estimation framework to attain 6DoF object
poses from 2D object bounding-boxes. In the first step, the framework learns to
segment objects from real and synthetic data in a weakly-supervised fashion,
and the segmentation masks will act as a prior for pose estimation. In the
second step, we design a dual-scale pose estimation network, namely
DSC-PoseNet, to predict object poses by employing a differentiable renderer. Specifically, our DSC-PoseNet first predicts object poses in the original
image scale by comparing the segmentation masks and the rendered visible object
masks. Then, we resize object regions to a fixed scale to estimate poses once
again. In this fashion, we eliminate large scale variations and focus on
rotation estimation, thus facilitating pose estimation. Moreover, we exploit
the initial pose estimation to generate pseudo ground-truth to train our
DSC-PoseNet in a self-supervised manner. The estimation results at these two
scales are ensembled as our final pose estimation. Extensive experiments on
widely-used benchmarks demonstrate that our method outperforms state-of-the-art
models trained on synthetic data by a large margin and is even on par with
several fully-supervised methods.
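The core training signal described above — comparing a segmentation mask against a rendered object mask, and resizing the object region to a fixed scale before a second estimation pass — can be sketched minimally with numpy. The helper names `mask_iou` and `crop_to_fixed_scale` are hypothetical, and the nearest-neighbour resize stands in for the differentiable renderer and proper image resizing used in the paper; this is an illustrative sketch, not the authors' implementation.

```python
import numpy as np

def mask_iou(seg_mask, rendered_mask):
    """Intersection-over-union between two binary masks, a simple proxy
    for the segmentation/render consistency signal used to supervise pose."""
    inter = np.logical_and(seg_mask, rendered_mask).sum()
    union = np.logical_or(seg_mask, rendered_mask).sum()
    return inter / union if union else 0.0

def crop_to_fixed_scale(mask, box, size=8):
    """Crop the object region given by box = (y0, y1, x0, x1) and resample
    it to a fixed size x size grid (nearest neighbour), removing scale
    variation before the second, rotation-focused estimation pass."""
    y0, y1, x0, x1 = box
    crop = mask[y0:y1, x0:x1]
    ys = np.linspace(0, crop.shape[0] - 1, size).astype(int)
    xs = np.linspace(0, crop.shape[1] - 1, size).astype(int)
    return crop[np.ix_(ys, xs)]
```

In the paper, the poses estimated at the original scale and at the fixed scale are ensembled into the final prediction; here only the two per-scale ingredients are shown.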
Related papers
- DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses [59.51874686414509]
Current approaches approximate the continuous pose representation with a large number of discrete pose hypotheses.
We present a Deep Voxel Matching Network (DVMNet) that eliminates the need for pose hypotheses and computes the relative object pose in a single pass.
Our method delivers more accurate relative pose estimates for novel objects at a lower computational cost compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-03-20T15:41:32Z)
- Extreme Two-View Geometry From Object Poses with Diffusion Models [21.16779160086591]
We harness the power of object priors to accurately determine two-view geometry in the face of extreme viewpoint changes.
In experiments, our method has demonstrated extraordinary robustness and resilience to large viewpoint changes.
arXiv Detail & Related papers (2024-02-05T08:18:47Z)
- LocaliseBot: Multi-view 3D object localisation with differentiable rendering for robot grasping [9.690844449175948]
We focus on object pose estimation.
Our approach relies on three pieces of information: multiple views of the object, the camera's parameters at those viewpoints, and 3D CAD models of objects.
We show that the estimated object pose results in 99.65% grasp accuracy with the ground truth grasp candidates.
arXiv Detail & Related papers (2023-11-14T14:27:53Z)
- ZS6D: Zero-shot 6D Object Pose Estimation using Vision Transformers [9.899633398596672]
We introduce ZS6D for zero-shot novel-object 6D pose estimation.
Visual descriptors, extracted using pre-trained Vision Transformers (ViT), are used for matching rendered templates.
Experiments are performed on LMO, YCBV, and TLESS datasets.
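The template-matching step this entry describes — retrieving the rendered template whose descriptor best matches the query — reduces to a nearest-neighbour search under cosine similarity. The function name `match_template` and the use of plain numpy vectors in place of real ViT descriptors are illustrative assumptions, not the ZS6D implementation.

```python
import numpy as np

def match_template(query_desc, template_descs):
    """Return the index of the template whose descriptor is most
    cosine-similar to the query descriptor.

    query_desc:     (d,) vector, e.g. a pooled ViT feature of the query crop.
    template_descs: (n, d) matrix of descriptors for n rendered templates.
    """
    q = query_desc / np.linalg.norm(query_desc)
    t = template_descs / np.linalg.norm(template_descs, axis=1, keepdims=True)
    return int(np.argmax(t @ q))
```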
arXiv Detail & Related papers (2023-09-21T11:53:01Z)
- Rigidity-Aware Detection for 6D Object Pose Estimation [60.88857851869196]
Most recent 6D object pose estimation methods first use object detection to obtain 2D bounding boxes before actually regressing the pose.
We propose a rigidity-aware detection method exploiting the fact that, in 6D pose estimation, the target objects are rigid.
Key to the success of our approach is a visibility map, which we propose to build using a minimum barrier distance between every pixel in the bounding box and the box boundary.
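The minimum barrier distance mentioned here has a compact definition: the barrier of a path is max(intensity) minus min(intensity) along it, and a pixel's distance is the smallest barrier over all paths from that pixel to the box boundary. A simple Dijkstra-style expansion approximates it; this sketch operates on a whole image with its border as the boundary, which is an assumption for illustration rather than the paper's exact formulation.

```python
import heapq
import numpy as np

def minimum_barrier_distance(img):
    """Approximate, for every pixel, the minimum barrier distance to the
    image border via a Dijkstra-style expansion seeded at border pixels."""
    h, w = img.shape
    dist = np.full((h, w), np.inf)
    heap = []  # entries: (barrier, y, x, path_max, path_min)
    for y in range(h):
        for x in range(w):
            if y in (0, h - 1) or x in (0, w - 1):
                dist[y, x] = 0.0
                heapq.heappush(heap, (0.0, y, x, img[y, x], img[y, x]))
    while heap:
        d, y, x, pmax, pmin = heapq.heappop(heap)
        if d > dist[y, x]:
            continue  # stale entry
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                nmax = max(pmax, img[ny, nx])
                nmin = min(pmin, img[ny, nx])
                nd = nmax - nmin  # barrier of the extended path
                if nd < dist[ny, nx]:
                    dist[ny, nx] = nd
                    heapq.heappush(heap, (nd, ny, nx, nmax, nmin))
    return dist
```

Pixels that can reach the boundary through uniform intensity get a distance near zero, while pixels inside a region whose intensity differs from its surroundings get a large distance — the basis of the visibility map.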
arXiv Detail & Related papers (2023-03-22T09:02:54Z)
- Unseen Object 6D Pose Estimation: A Benchmark and Baselines [62.8809734237213]
We propose a new task that enables and facilitates algorithms to estimate the 6D pose of novel objects during testing.
We collect a dataset with both real and synthetic images and up to 48 unseen objects in the test set.
By training an end-to-end 3D correspondences network, our method finds corresponding points between an unseen object and a partial view RGBD image accurately and efficiently.
arXiv Detail & Related papers (2022-06-23T16:29:53Z)
- ObPose: Leveraging Pose for Object-Centric Scene Inference and Generation in 3D [21.700203922407496]
ObPose is an unsupervised object-centric inference and generation model.
It learns 3D-structured latent representations from RGB-D scenes.
ObPose is evaluated quantitatively on the YCB, MultiShapeNet, and CLEVR datasets.
arXiv Detail & Related papers (2022-06-07T21:15:18Z)
- ZebraPose: Coarse to Fine Surface Encoding for 6DoF Object Pose Estimation [76.31125154523056]
We present a discrete descriptor, which can represent the object surface densely.
We also propose a coarse to fine training strategy, which enables fine-grained correspondence prediction.
arXiv Detail & Related papers (2022-03-17T16:16:24Z)
- Sparse Pose Trajectory Completion [87.31270669154452]
We propose a method to complete pose trajectories even from datasets where objects appear only in sparsely sampled views.
This is achieved with a cross-modal pose trajectory transfer mechanism.
Our method is evaluated on the Pix3D and ShapeNet datasets.
arXiv Detail & Related papers (2021-05-01T00:07:21Z)
- CosyPose: Consistent multi-view multi-object 6D pose estimation [48.097599674329004]
We present a single-view single-object 6D pose estimation method, which we use to generate 6D object pose hypotheses.
Second, we develop a robust method for matching individual 6D object pose hypotheses across different input images.
Third, we develop a method for global scene refinement given multiple object hypotheses and their correspondences across views.
arXiv Detail & Related papers (2020-08-19T14:11:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.