T6D-Direct: Transformers for Multi-Object 6D Pose Direct Regression
- URL: http://arxiv.org/abs/2109.10948v1
- Date: Wed, 22 Sep 2021 18:13:33 GMT
- Title: T6D-Direct: Transformers for Multi-Object 6D Pose Direct Regression
- Authors: Arash Amini, Arul Selvam Periyasamy, and Sven Behnke
- Abstract summary: T6D-Direct is a real-time, single-stage, transformer-based method built on DETR that directly regresses the 6D poses of multiple objects.
Our method achieves the fastest inference time, and its pose estimation accuracy is comparable to state-of-the-art methods.
- Score: 40.90172673391803
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 6D pose estimation is the task of predicting the translation and orientation
of objects in a given input image, which is a crucial prerequisite for many
robotics and augmented reality applications. Lately, the Transformer network
architecture, equipped with multi-head self-attention, has been achieving
state-of-the-art results in many computer vision tasks. DETR, a
Transformer-based model, formulated object detection as a set prediction
problem and achieved impressive results without standard components like region
of interest pooling, non-maximum suppression, and bounding box proposals. In
this work, we propose T6D-Direct, a real-time, single-stage method with a
transformer-based architecture built on DETR that performs direct multi-object
6D pose estimation. We evaluate the performance of our method on the YCB-Video
dataset. Our method achieves the fastest inference time, and the pose
estimation accuracy is comparable to state-of-the-art methods.
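The set-prediction formulation that T6D-Direct inherits from DETR can be illustrated with a minimal sketch: a fixed set of query predictions is matched one-to-one against the ground-truth objects via Hungarian matching, and unmatched queries are trained to predict a "no object" class. The function name, cost weights, and the translation-only pose cost below are illustrative assumptions, not the paper's exact matching cost (a full 6D cost would also include an orientation term).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_predictions(pred_class_probs, pred_trans, gt_labels, gt_trans,
                      w_cls=1.0, w_pose=1.0):
    """Bipartite matching between N predicted queries and M ground-truth objects.

    The matching cost mixes the (negated) probability of the true class with
    an L2 translation error; the Hungarian algorithm then finds the optimal
    one-to-one assignment.
    """
    n = pred_class_probs.shape[0]
    m = gt_labels.shape[0]
    cost = np.zeros((n, m))
    for j in range(m):
        cls_cost = -pred_class_probs[:, gt_labels[j]]                 # reward confident correct class
        pose_cost = np.linalg.norm(pred_trans - gt_trans[j], axis=1)  # translation error per query
        cost[:, j] = w_cls * cls_cost + w_pose * pose_cost
    rows, cols = linear_sum_assignment(cost)                          # optimal assignment
    return list(zip(rows.tolist(), cols.tolist()))
```

Because the matching is computed per image at training time, the set loss needs no non-maximum suppression or anchor heuristics, which is what makes the single-stage direct formulation possible.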
Related papers
- PViT-6D: Overclocking Vision Transformers for 6D Pose Estimation with Confidence-Level Prediction and Pose Tokens (2023-11-29)
We explore the capabilities of Vision Transformers for direct 6D pose estimation through a tailored use of classification tokens.
We also introduce a simple method for determining pose confidence, which can be readily integrated into most 6D pose estimation frameworks.
- YOLOPose V2: Understanding and Improving Transformer-based 6D Pose Estimation (2023-07-21)
YOLOPose is a Transformer-based multi-object 6D pose estimation method.
We employ a learnable orientation estimation module to predict the orientation from the keypoints.
Our method is suitable for real-time applications and achieves results comparable to state-of-the-art methods.
- TransPose: A Transformer-based 6D Object Pose Estimation Network with Depth Refinement (2023-07-09)
We propose TransPose, an improved Transformer-based 6D pose estimation network with a depth refinement module.
The architecture takes only an RGB image as input, with no supplementary modalities such as depth or thermal images.
A novel depth refinement module is then used alongside the predicted centers, 6D poses, and depth patches to refine the accuracy of the estimated 6D pose.
- DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention (2022-12-15)
3D object detection with surround-view images is an essential task for autonomous driving.
We propose DETR4D, a Transformer-based framework that explores sparse attention and direct feature query for 3D object detection in multi-view images.
- CRT-6D: Fast 6D Object Pose Estimation with Cascaded Refinement Transformers (2022-10-21)
This paper introduces a novel method we call Cascaded Refinement Transformers, or CRT-6D.
We replace the commonly used dense intermediate representation with a sparse set of features sampled from the feature pyramid, which we call Os (Object Keypoint Features), where each element corresponds to an object keypoint.
We achieve inference 2x faster than the closest real-time state-of-the-art methods while supporting up to 21 objects in a single model.
- YOLOPose: Transformer-based Multi-Object 6D Pose Estimation using Keypoint Regression (2022-05-05)
We propose YOLOPose, a Transformer-based multi-object 6D pose estimation method based on keypoint regression.
Our method is suitable for real-time applications and achieves results comparable to state-of-the-art methods.
- FS6D: Few-Shot 6D Pose Estimation of Novel Objects (2022-03-28)
6D object pose estimation networks are limited in their capability to scale to large numbers of object instances.
In this work, we study a new open-set problem, few-shot 6D object pose estimation: estimating the 6D pose of an unknown object from a few support views without extra training.
- Spatial Attention Improves Iterative 6D Object Pose Estimation (2021-01-05)
We propose a new method for 6D pose estimation refinement from RGB images.
Our main insight is that after the initial pose estimate, it is important to pay attention to distinct spatial features of the object.
We experimentally show that this approach learns to attend to salient spatial features and to ignore occluded parts of the object, leading to better pose estimation across datasets.
- PrimA6D: Rotational Primitive Reconstruction for Enhanced and Robust 6D Pose Estimation (2020-06-14)
We introduce rotational-primitive-prediction-based 6D object pose estimation using a single image as input.
We leverage a Variational AutoEncoder (VAE) to learn this underlying primitive and its associated keypoints.
When evaluated on public datasets, our method yields a notable improvement on LINEMOD, Occlusion LINEMOD, and the YCB dataset.
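The pose accuracy comparisons among the methods above (on YCB-Video and the LINEMOD variants) conventionally use the ADD metric and its symmetric variant ADD-S. A minimal sketch of the standard definitions; the function names are my own:

```python
import numpy as np

def add_metric(R_pred, t_pred, R_gt, t_gt, model_points):
    """ADD: mean distance between the object's model points transformed
    by the predicted pose and by the ground-truth pose."""
    p_pred = model_points @ R_pred.T + t_pred
    p_gt = model_points @ R_gt.T + t_gt
    return np.mean(np.linalg.norm(p_pred - p_gt, axis=1))

def add_s_metric(R_pred, t_pred, R_gt, t_gt, model_points):
    """ADD-S: for symmetric objects, each predicted point is compared
    against the *closest* ground-truth point instead of its counterpart."""
    p_pred = model_points @ R_pred.T + t_pred
    p_gt = model_points @ R_gt.T + t_gt
    d = np.linalg.norm(p_pred[:, None, :] - p_gt[None, :, :], axis=2)  # pairwise distances
    return np.mean(d.min(axis=1))
```

A pose is typically counted correct when ADD (or ADD-S for symmetric objects) falls below a threshold such as 10% of the object diameter.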
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.