TransPose: A Transformer-based 6D Object Pose Estimation Network with
Depth Refinement
- URL: http://arxiv.org/abs/2307.05561v1
- Date: Sun, 9 Jul 2023 17:33:13 GMT
- Title: TransPose: A Transformer-based 6D Object Pose Estimation Network with
Depth Refinement
- Authors: Mahmoud Abdulsalam and Nabil Aouf
- Abstract summary: We propose TransPose, an improved Transformer-based 6D pose estimation network with a depth refinement module.
The architecture takes only an RGB image as input, with no additional supplementary modalities such as depth or thermal images.
A novel depth refinement module is then used alongside the predicted centres, 6D poses and depth patches to refine the accuracy of the estimated 6D pose.
- Score: 5.482532589225552
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As demand for robotics manipulation applications increases, accurate
vision-based 6D pose estimation becomes essential for autonomous operations.
Approaches based on Convolutional Neural Networks (CNNs) for pose estimation
have been introduced previously. However, the quest for better performance
persists, especially for accurate robotics manipulation, and it extends to the
Agri-robotics domain. In this paper, we propose TransPose, an improved
Transformer-based 6D pose estimation network with a depth refinement module. The
architecture takes only an RGB image as input, with no additional
supplementary modalities such as depth or thermal images. The architecture
encompasses an innovative, lighter depth estimation network that estimates depth
from an RGB image using a feature pyramid with an up-sampling method. A
transformer-based detection network with additional prediction heads is
proposed to directly regress the object's centre and predict the 6D pose of the
target. A novel depth refinement module is then used alongside the predicted
centres, 6D poses and depth patches to refine the accuracy of the estimated 6D
pose. We extensively compare our results with other state-of-the-art methods
and analyse them for fruit-picking applications. The results show that our
proposed technique outperforms the other methods available in the literature.
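The abstract describes combining a predicted 2D object centre with an estimated depth patch to refine the 6D pose. A minimal sketch of how such a refinement step could work in principle: take a robust (median) depth from the patch around the predicted centre and back-project the centre through the pinhole camera model to obtain a corrected 3D translation. All names, the median choice, and the camera parameters below are illustrative assumptions, not the paper's actual module.

```python
import statistics

def refine_translation(center_uv, depth_patch, fx, fy, cx, cy):
    """Hypothetical refinement step: back-project the predicted 2D
    object centre (u, v) using a robust depth taken from the depth
    patch, yielding a corrected 3D translation via the pinhole model."""
    u, v = center_uv
    # Median over the patch is robust to outlier depth pixels.
    z = statistics.median(d for row in depth_patch for d in row)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)

# Example: a 3x3 depth patch (metres) around the predicted centre.
patch = [[0.52, 0.50, 0.51],
         [0.50, 0.50, 0.49],
         [0.51, 0.50, 0.50]]
t = refine_translation((320, 240), patch, fx=600, fy=600, cx=320, cy=240)
# t == (0.0, 0.0, 0.5): the centre sits at the principal point, so
# only the depth component is non-zero.
```

The rotation component of the pose would be left to the transformer's prediction heads; only the translation benefits directly from the depth patch in this sketch.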
Related papers
- RDPN6D: Residual-based Dense Point-wise Network for 6Dof Object Pose Estimation Based on RGB-D Images [13.051302134031808]
We introduce a novel method for calculating the 6DoF pose of an object using a single RGB-D image.
Unlike existing methods that either directly predict objects' poses or rely on sparse keypoints for pose recovery, our approach addresses this challenging task using dense correspondence.
arXiv Detail & Related papers (2024-05-14T10:10:45Z) - YOLOPose V2: Understanding and Improving Transformer-based 6D Pose
Estimation [36.067414358144816]
YOLOPose is a Transformer-based multi-object 6D pose estimation method.
We employ a learnable orientation estimation module to predict the orientation from the keypoints.
Our method is suitable for real-time applications and achieves results comparable to state-of-the-art methods.
arXiv Detail & Related papers (2023-07-21T12:53:54Z) - PoET: Pose Estimation Transformer for Single-View, Multi-Object 6D Pose
Estimation [6.860183454947986]
We present a transformer-based approach that takes an RGB image as input and predicts a 6D pose for each object in the image.
Besides the image, our network does not require any additional information such as depth maps or 3D object models.
We achieve state-of-the-art results for RGB-only approaches on the challenging YCB-V dataset.
arXiv Detail & Related papers (2022-11-25T14:07:14Z) - Unseen Object 6D Pose Estimation: A Benchmark and Baselines [62.8809734237213]
We propose a new task that enables and facilitates algorithms to estimate the 6D pose of novel objects during testing.
We collect a dataset with both real and synthetic images and up to 48 unseen objects in the test set.
By training an end-to-end 3D correspondence network, our method finds corresponding points between an unseen object and a partial-view RGB-D image accurately and efficiently.
arXiv Detail & Related papers (2022-06-23T16:29:53Z) - DeepRM: Deep Recurrent Matching for 6D Pose Refinement [77.34726150561087]
DeepRM is a novel recurrent network architecture for 6D pose refinement.
The architecture incorporates LSTM units to propagate information through each refinement step.
DeepRM achieves state-of-the-art performance on two widely accepted challenging datasets.
arXiv Detail & Related papers (2022-05-28T16:18:08Z) - Coupled Iterative Refinement for 6D Multi-Object Pose Estimation [64.7198752089041]
Given a set of known 3D objects and an RGB or RGB-D input image, we detect and estimate the 6D pose of each object.
Our approach iteratively refines both pose and correspondence in a tightly coupled manner, allowing us to dynamically remove outliers to improve accuracy.
arXiv Detail & Related papers (2022-04-26T18:00:08Z) - FS6D: Few-Shot 6D Pose Estimation of Novel Objects [116.34922994123973]
6D object pose estimation networks are limited in their capability to scale to large numbers of object instances.
In this work, we study a new open-set problem, few-shot 6D object pose estimation: estimating the 6D pose of an unknown object from a few support views without extra training.
arXiv Detail & Related papers (2022-03-28T10:31:29Z) - T6D-Direct: Transformers for Multi-Object 6D Pose Direct Regression [40.90172673391803]
T6D-Direct is a real-time single-stage direct method with a transformer-based architecture built on DETR to perform 6D multi-object pose direct estimation.
Our method achieves the fastest inference time, and the pose estimation accuracy is comparable to state-of-the-art methods.
arXiv Detail & Related papers (2021-09-22T18:13:33Z) - Spatial Attention Improves Iterative 6D Object Pose Estimation [52.365075652976735]
We propose a new method for 6D pose estimation refinement from RGB images.
Our main insight is that after the initial pose estimate, it is important to pay attention to distinct spatial features of the object.
We experimentally show that this approach learns to attend to salient spatial features and learns to ignore occluded parts of the object, leading to better pose estimation across datasets.
arXiv Detail & Related papers (2021-01-05T17:18:52Z) - PrimA6D: Rotational Primitive Reconstruction for Enhanced and Robust 6D
Pose Estimation [11.873744190924599]
We introduce a rotational primitive prediction based 6D object pose estimation using a single image as an input.
We leverage a Variational AutoEncoder (VAE) to learn this underlying primitive and its associated keypoints.
When evaluated on public datasets, our method yields a notable improvement on the LINEMOD, Occlusion LINEMOD, and YCB-Video datasets.
arXiv Detail & Related papers (2020-06-14T03:55:42Z) - Self6D: Self-Supervised Monocular 6D Object Pose Estimation [114.18496727590481]
We propose the idea of monocular 6D pose estimation by means of self-supervised learning.
We leverage recent advances in neural rendering to further self-supervise the model on unannotated real RGB-D data.
arXiv Detail & Related papers (2020-04-14T13:16:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences of its use.