Depth-based 6DoF Object Pose Estimation using Swin Transformer
- URL: http://arxiv.org/abs/2303.02133v2
- Date: Thu, 27 Apr 2023 18:07:40 GMT
- Title: Depth-based 6DoF Object Pose Estimation using Swin Transformer
- Authors: Zhujun Li and Ioannis Stamos
- Abstract summary: Accurately estimating the 6D pose of objects is crucial for many applications, such as robotic grasping, autonomous driving, and augmented reality.
We propose a novel framework, SwinDePose, which uses only geometric information from depth images to achieve accurate 6D pose estimation.
In experiments on the LineMod and Occlusion LineMod datasets, SwinDePose outperforms existing state-of-the-art methods for 6D object pose estimation using depth images.
- Score: 1.14219428942199
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurately estimating the 6D pose of objects is crucial for many
applications, such as robotic grasping, autonomous driving, and augmented
reality. However, this task becomes more challenging in poor lighting
conditions or when dealing with textureless objects. To address this issue,
depth images are becoming an increasingly popular choice due to their
invariance to a scene's appearance and the implicit incorporation of essential
geometric characteristics. However, fully leveraging depth information to
improve the performance of pose estimation remains a difficult and
under-investigated problem. To tackle this challenge, we propose a novel
framework called SwinDePose, which uses only geometric information from depth
images to achieve accurate 6D pose estimation. SwinDePose first calculates the
angles between each normal vector defined in a depth image and the three
coordinate axes in the camera coordinate system. The resulting angles are then
formed into an image, which is encoded using Swin Transformer. Additionally, we
apply RandLA-Net to learn the representations from point clouds. The resulting
image and point cloud embeddings are concatenated and fed into a semantic
segmentation module and a 3D keypoints localization module. Finally, we
estimate 6D poses using a least-square fitting approach based on the target
object's predicted semantic mask and 3D keypoints. In experiments on the
LineMod and Occlusion LineMod datasets, SwinDePose outperforms existing
state-of-the-art methods for 6D object pose estimation using depth images. This
demonstrates the effectiveness of our approach and highlights its potential for
improving performance in real-world scenarios. Our code is at
https://github.com/zhujunli1993/SwinDePose.
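To make the pipeline concrete, the angle-image construction described above can be sketched as follows. This is a minimal NumPy sketch, not the paper's code: the function name, the assumption that per-pixel surface normals have already been estimated from the depth image, and the choice of radians for the output are ours.
```python
import numpy as np

def normals_to_angle_image(normals):
    """Turn a per-pixel normal map (H, W, 3) into a 3-channel angle image.

    Each channel holds the angle between a pixel's surface normal and one
    of the camera coordinate axes (x, y, z), in radians.
    """
    # Normalize, guarding against zero-length normals.
    norms = np.linalg.norm(normals, axis=-1, keepdims=True)
    unit = normals / np.clip(norms, 1e-8, None)

    # Dot products with the standard basis axes are just the components
    # themselves; arccos of each component gives the angle to that axis.
    angles = np.arccos(np.clip(unit, -1.0, 1.0))
    return angles.astype(np.float32)

# Toy example: normals pointing along +z everywhere.
normals = np.zeros((4, 4, 3), dtype=np.float32)
normals[..., 2] = 1.0
angle_img = normals_to_angle_image(normals)
# z-channel angles are 0, x- and y-channel angles are pi/2.
print(angle_img.shape)  # (4, 4, 3)
```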
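The fusion and prediction stage, where image and point cloud embeddings are concatenated and passed to a semantic segmentation head and a 3D keypoint localization head, might be organized along these lines. This is a hypothetical PyTorch sketch: the class name, feature dimensions, head structure, and the per-point alignment of image features are our assumptions, not the released implementation.
```python
import torch
import torch.nn as nn

class FusionHeads(nn.Module):
    """Concatenate per-point image and point-cloud features, then predict
    per-point segmentation logits and per-point keypoint offsets."""

    def __init__(self, img_dim=128, pcd_dim=128, num_classes=2, num_kps=8):
        super().__init__()
        fused_dim = img_dim + pcd_dim
        self.seg_head = nn.Sequential(
            nn.Conv1d(fused_dim, 128, 1), nn.ReLU(),
            nn.Conv1d(128, num_classes, 1))
        self.kp_head = nn.Sequential(
            nn.Conv1d(fused_dim, 128, 1), nn.ReLU(),
            nn.Conv1d(128, num_kps * 3, 1))

    def forward(self, img_feats, pcd_feats):
        # Both inputs: (B, C, N), already aligned so that feature i of
        # each tensor describes the same 3D point.
        fused = torch.cat([img_feats, pcd_feats], dim=1)
        return self.seg_head(fused), self.kp_head(fused)

heads = FusionHeads()
img_f, pcd_f = torch.randn(1, 128, 1024), torch.randn(1, 128, 1024)
seg_logits, kp_offsets = heads(img_f, pcd_f)
print(seg_logits.shape, kp_offsets.shape)  # (1, 2, 1024) (1, 24, 1024)
```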
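Finally, the least-squares fitting step that maps predicted 3D keypoints to a 6D pose is typically an orthogonal Procrustes (Kabsch) alignment between the object's model-frame keypoints and their predicted camera-frame locations. Below is a minimal sketch under that assumption; the function name and the synthetic example are ours, and the paper's fitting step may differ, e.g. by weighting keypoints by confidence.
```python
import numpy as np

def fit_pose_least_squares(model_kps, scene_kps):
    """Rigid transform (R, t) minimizing sum ||scene - (R @ model + t)||^2.

    model_kps, scene_kps: (N, 3) arrays of corresponding 3D keypoints.
    """
    mu_m, mu_s = model_kps.mean(axis=0), scene_kps.mean(axis=0)
    Pm, Ps = model_kps - mu_m, scene_kps - mu_s

    # SVD of the cross-covariance matrix (Kabsch algorithm).
    U, _, Vt = np.linalg.svd(Pm.T @ Ps)
    # Correct for reflections so that det(R) = +1.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_s - R @ mu_m
    return R, t

# Sanity check: recover a known pose from noiseless keypoints.
rng = np.random.default_rng(0)
model = rng.standard_normal((8, 3))
a = np.pi / 6
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
t_true = np.array([0.1, -0.2, 0.5])
scene = model @ R_true.T + t_true
R_est, t_est = fit_pose_least_squares(model, scene)
print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))  # True True
```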
Related papers
- Pseudo Flow Consistency for Self-Supervised 6D Object Pose Estimation [14.469317161361202]
We propose a 6D object pose estimation method that can be trained with pure RGB images without any auxiliary information.
We evaluate our method on three challenging datasets and demonstrate that it outperforms state-of-the-art self-supervised methods significantly.
arXiv Detail & Related papers (2023-08-19T13:52:18Z)
- Neural Correspondence Field for Object Pose Estimation [67.96767010122633]
We propose a method for estimating the 6DoF pose of a rigid object with an available 3D model from a single RGB image.
Unlike classical correspondence-based methods which predict 3D object coordinates at pixels of the input image, the proposed method predicts 3D object coordinates at 3D query points sampled in the camera frustum.
arXiv Detail & Related papers (2022-07-30T01:48:23Z)
- Unseen Object 6D Pose Estimation: A Benchmark and Baselines [62.8809734237213]
We propose a new task that enables and facilitates algorithms to estimate the 6D pose of novel objects during testing.
We collect a dataset with both real and synthetic images and up to 48 unseen objects in the test set.
By training an end-to-end 3D correspondences network, our method finds corresponding points between an unseen object and a partial view RGBD image accurately and efficiently.
arXiv Detail & Related papers (2022-06-23T16:29:53Z)
- Coupled Iterative Refinement for 6D Multi-Object Pose Estimation [64.7198752089041]
Given a set of known 3D objects and an RGB or RGB-D input image, we detect and estimate the 6D pose of each object.
Our approach iteratively refines both pose and correspondence in a tightly coupled manner, allowing us to dynamically remove outliers to improve accuracy.
arXiv Detail & Related papers (2022-04-26T18:00:08Z)
- ZebraPose: Coarse to Fine Surface Encoding for 6DoF Object Pose Estimation [76.31125154523056]
We present a discrete descriptor, which can represent the object surface densely.
We also propose a coarse to fine training strategy, which enables fine-grained correspondence prediction.
arXiv Detail & Related papers (2022-03-17T16:16:24Z)
- NeRF-Pose: A First-Reconstruct-Then-Regress Approach for Weakly-supervised 6D Object Pose Estimation [44.42449011619408]
We present a weakly-supervised reconstruction-based pipeline, named NeRF-Pose, which needs only 2D object segmentation and known relative camera poses during training.
A NeRF-enabled PnP+RANSAC algorithm is used to estimate a stable and accurate pose from the predicted correspondences.
Experiments on LineMod-Occlusion show that the proposed method has state-of-the-art accuracy in comparison to the best 6D pose estimation methods.
arXiv Detail & Related papers (2022-03-09T15:28:02Z)
- Weakly Supervised Learning of Keypoints for 6D Object Pose Estimation [73.40404343241782]
We propose a weakly supervised 6D object pose estimation approach based on 2D keypoint detection.
Our approach achieves comparable performance with state-of-the-art fully supervised approaches.
arXiv Detail & Related papers (2022-03-07T16:23:47Z)
- CAPTRA: CAtegory-level Pose Tracking for Rigid and Articulated Objects from Point Clouds [97.63549045541296]
We propose a unified framework that can handle 9DoF pose tracking for novel rigid object instances and per-part pose tracking for articulated objects.
Our method achieves new state-of-the-art performance on category-level rigid object pose (NOCS-REAL275) and articulated object pose benchmarks (SAPIEN, BMVC), while running fastest at 12 FPS.
arXiv Detail & Related papers (2021-04-08T00:14:58Z)
- Single Shot 6D Object Pose Estimation [11.37625512264302]
We introduce a novel single shot approach for 6D object pose estimation of rigid objects based on depth images.
A fully convolutional neural network is employed, where the 3D input data is spatially discretized and pose estimation is considered as a regression task.
With 65 fps on a GPU, our Object Pose Network (OP-Net) is extremely fast, is optimized end-to-end, and estimates the 6D pose of multiple objects in the image simultaneously.
arXiv Detail & Related papers (2020-04-27T11:59:11Z)
- L6DNet: Light 6 DoF Network for Robust and Precise Object Pose Estimation with Small Datasets [0.0]
We propose a novel approach to perform 6 DoF object pose estimation from a single RGB-D image.
We adopt a hybrid pipeline in two stages: data-driven and geometric.
Our approach is more robust and accurate than state-of-the-art methods.
arXiv Detail & Related papers (2020-02-03T17:41:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.