YOLOPose V2: Understanding and Improving Transformer-based 6D Pose
Estimation
- URL: http://arxiv.org/abs/2307.11550v1
- Date: Fri, 21 Jul 2023 12:53:54 GMT
- Title: YOLOPose V2: Understanding and Improving Transformer-based 6D Pose
Estimation
- Authors: Arul Selvam Periyasamy, Arash Amini, Vladimir Tsaturyan, and Sven
Behnke
- Abstract summary: YOLOPose is a Transformer-based multi-object 6D pose estimation method.
We employ a learnable orientation estimation module to predict the orientation from the keypoints.
Our method is suitable for real-time applications and achieves results comparable to state-of-the-art methods.
- Score: 36.067414358144816
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 6D object pose estimation is a crucial prerequisite for autonomous robot
manipulation applications. The state-of-the-art models for pose estimation are
convolutional neural network (CNN)-based. Lately, the Transformer, an
architecture originally proposed for natural language processing, has been achieving
state-of-the-art results in many computer vision tasks as well. Equipped with
the multi-head self-attention mechanism, Transformers enable simple
single-stage end-to-end architectures for learning object detection and 6D
object pose estimation jointly. In this work, we propose YOLOPose (short for
You Only Look Once Pose estimation), a Transformer-based multi-object 6D pose
estimation method based on keypoint regression, as well as an improved variant
of the YOLOPose model. In contrast to the standard heatmap-based formulation
for predicting keypoints in an image, we directly regress the keypoint
coordinates. Additionally, we
employ a learnable orientation estimation module to predict the orientation
from the keypoints. Along with a separate translation estimation module, our
model is end-to-end differentiable. Our method is suitable for real-time
applications and achieves results comparable to state-of-the-art methods. We
analyze the role of object queries in our architecture and reveal that the
object queries specialize in detecting objects in specific image regions.
Furthermore, we quantify the accuracy trade-off incurred when training our
model on smaller datasets.
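To make the keypoint-to-orientation idea concrete, the following is a minimal sketch, not the authors' implementation: an MLP that maps regressed 2D keypoints to a rotation matrix, assuming the continuous 6D rotation parameterization (Zhou et al., CVPR 2019). The keypoint count and layer sizes are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): a learnable orientation module that
# maps regressed 2D keypoints to a rotation matrix. Assumptions: 8 keypoints
# per object and the continuous 6D rotation parameterization of Zhou et al.;
# layer sizes are illustrative.
import torch
import torch.nn as nn


def rotation_6d_to_matrix(x6: torch.Tensor) -> torch.Tensor:
    """Gram-Schmidt: map a 6D vector to a valid 3x3 rotation matrix."""
    a1, a2 = x6[..., :3], x6[..., 3:]
    b1 = nn.functional.normalize(a1, dim=-1)
    b2 = nn.functional.normalize(a2 - (b1 * a2).sum(-1, keepdim=True) * b1, dim=-1)
    b3 = torch.cross(b1, b2, dim=-1)
    return torch.stack((b1, b2, b3), dim=-2)


class OrientationFromKeypoints(nn.Module):
    """MLP that predicts an orientation from flattened 2D keypoints."""

    def __init__(self, num_keypoints: int = 8, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_keypoints * 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 6),  # 6D rotation representation
        )

    def forward(self, keypoints: torch.Tensor) -> torch.Tensor:
        # keypoints: (batch, num_keypoints, 2), normalized image coordinates
        return rotation_6d_to_matrix(self.mlp(keypoints.flatten(1)))


rot = OrientationFromKeypoints()(torch.rand(4, 8, 2))  # (4, 3, 3)
```

Because every step here is differentiable, such a module can be trained jointly with a keypoint regressor and a separate translation head, which is what makes the end-to-end property described in the abstract possible.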
Related papers
- FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
FoundationPose is a unified foundation model for 6D object pose estimation and tracking.
Our approach can be instantly applied at test-time to a novel object without fine-tuning.
arXiv Detail & Related papers (2023-12-13T18:28:09Z)
- PoseMatcher: One-shot 6D Object Pose Estimation by Deep Feature Matching [51.142988196855484]
We propose PoseMatcher, an accurate, model-free, one-shot object pose estimator.
We create a new training pipeline for object-to-image matching based on a three-view system.
To enable PoseMatcher to attend to distinct input modalities, an image and a point cloud, we introduce IO-Layer.
arXiv Detail & Related papers (2023-04-03T21:14:59Z)
- MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare [84.80956484848505]
MegaPose is a method to estimate the 6D pose of novel objects, that is, objects unseen during training.
First, we present a 6D pose refiner based on a render&compare strategy which can be applied to novel objects.
Second, we introduce a novel approach for coarse pose estimation, which leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner.
arXiv Detail & Related papers (2022-12-13T19:30:03Z)
- Unseen Object 6D Pose Estimation: A Benchmark and Baselines [62.8809734237213]
We propose a new task that enables and facilitates algorithms to estimate the 6D pose of novel objects during testing.
We collect a dataset with both real and synthetic images and up to 48 unseen objects in the test set.
By training an end-to-end 3D correspondences network, our method finds corresponding points between an unseen object and a partial view RGBD image accurately and efficiently.
arXiv Detail & Related papers (2022-06-23T16:29:53Z)
- YOLOPose: Transformer-based Multi-Object 6D Pose Estimation using Keypoint Regression [44.282841879849244]
We propose YOLOPose, a Transformer-based multi-object 6D pose estimation method based on keypoint regression.
Our method is suitable for real-time applications and achieves results comparable to state-of-the-art methods.
arXiv Detail & Related papers (2022-05-05T09:51:39Z)
- T6D-Direct: Transformers for Multi-Object 6D Pose Direct Regression [40.90172673391803]
T6D-Direct is a real-time, single-stage method with a Transformer-based architecture built on DETR that performs direct 6D multi-object pose estimation.
Our method achieves the fastest inference time, and the pose estimation accuracy is comparable to state-of-the-art methods.
arXiv Detail & Related papers (2021-09-22T18:13:33Z)
- 6D Object Pose Estimation using Keypoints and Part Affinity Fields [24.126513851779936]
The task of 6D object pose estimation from RGB images is an important requirement for autonomous service robots to be able to interact with the real world.
We present a two-step pipeline for estimating the 6 DoF translation and orientation of known objects; a minimal sketch of this two-step recipe appears after this list.
arXiv Detail & Related papers (2021-07-05T14:41:19Z)
- Spatial Attention Improves Iterative 6D Object Pose Estimation [52.365075652976735]
We propose a new method for 6D pose estimation refinement from RGB images.
Our main insight is that after the initial pose estimate, it is important to pay attention to distinct spatial features of the object.
We experimentally show that this approach learns to attend to salient spatial features and learns to ignore occluded parts of the object, leading to better pose estimation across datasets.
arXiv Detail & Related papers (2021-01-05T17:18:52Z)
- PrimA6D: Rotational Primitive Reconstruction for Enhanced and Robust 6D Pose Estimation [11.873744190924599]
We introduce a 6D object pose estimation method based on rotational primitive prediction that uses a single image as input.
We leverage a Variational AutoEncoder (VAE) to learn this underlying primitive and its associated keypoints.
When evaluated on public datasets, our method yields notable improvements on the LINEMOD, Occlusion LINEMOD, and YCB-Video datasets.
arXiv Detail & Related papers (2020-06-14T03:55:42Z)
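Several entries above, such as the keypoints-and-part-affinity-fields paper, follow the classic two-step recipe that the YOLOPose papers replace with a learnable module: regress 2D keypoints first, then recover the 6 DoF pose geometrically. Below is a minimal sketch of that recipe, assuming the second step is a PnP solver; all model points, keypoint detections, and camera intrinsics are placeholder values, not data from any of the papers.

```python
# Minimal sketch of the classic two-step keypoint pipeline: given predicted 2D
# keypoints and the known 3D model points, recover the 6 DoF pose with a PnP
# solver. All numeric values are placeholders.
import cv2
import numpy as np

# 3D keypoints in the object coordinate frame (e.g., bounding box corners).
object_points = np.array([
    [-0.05, -0.05, -0.05], [0.05, -0.05, -0.05],
    [0.05, 0.05, -0.05], [-0.05, 0.05, -0.05],
    [-0.05, -0.05, 0.05], [0.05, -0.05, 0.05],
    [0.05, 0.05, 0.05], [-0.05, 0.05, 0.05],
], dtype=np.float64)

# 2D detections of the same keypoints in the image (pixels), e.g. produced by
# a keypoint-regression network.
image_points = np.array([
    [320.0, 240.0], [400.0, 238.0], [402.0, 310.0], [322.0, 312.0],
    [330.0, 230.0], [410.0, 228.0], [412.0, 300.0], [332.0, 302.0],
], dtype=np.float64)

# Pinhole camera intrinsics; lens distortion assumed zero (distCoeffs=None).
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, None)
R, _ = cv2.Rodrigues(rvec)  # axis-angle -> 3x3 rotation matrix
print(ok, R, tvec)  # orientation and translation of the object
```

Unlike the learnable orientation module sketched earlier, this PnP step is not differentiable in its basic form, which is one motivation the YOLOPose line of work gives for replacing it with a trained module in an end-to-end architecture.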