CRT-6D: Fast 6D Object Pose Estimation with Cascaded Refinement
Transformers
- URL: http://arxiv.org/abs/2210.11718v1
- Date: Fri, 21 Oct 2022 04:06:52 GMT
- Title: CRT-6D: Fast 6D Object Pose Estimation with Cascaded Refinement
Transformers
- Authors: Pedro Castro and Tae-Kyun Kim
- Abstract summary: This paper introduces a novel method we call Cascaded Refinement Transformers, or CRT-6D.
We replace the commonly used dense intermediate representation with a sparse set of features sampled from the feature pyramid, called OSKFs (Object Surface Keypoint Features), where each element corresponds to an object keypoint.
We achieve inference runtimes 2x faster than the closest real-time state-of-the-art methods while supporting up to 21 objects on a single model.
- Score: 51.142988196855484
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Learning based 6D object pose estimation methods rely on computing large
intermediate pose representations and/or iteratively refining an initial
estimation with a slow render-compare pipeline. This paper introduces a novel
method we call Cascaded Pose Refinement Transformers, or CRT-6D. We replace the
commonly used dense intermediate representation with a sparse set of features
sampled from the feature pyramid we call OSKFs (Object Surface Keypoint
Features), where each element corresponds to an object keypoint. We employ
lightweight deformable transformers and chain them together to iteratively
refine proposed poses over the sampled OSKFs. We achieve inference runtimes 2x
faster than the closest real-time state of the art methods while supporting up
to 21 objects on a single model. We demonstrate the effectiveness of CRT-6D by
performing extensive experiments on the LM-O and YCB-V datasets. Compared to
real-time methods, we achieve state-of-the-art results on LM-O and YCB-V, falling
slightly behind methods with inference runtimes one order of magnitude higher.
The source code is available at: https://github.com/PedroCastro/CRT-6D
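The refinement loop described in the abstract, projecting object keypoints under the current pose estimate, sampling sparse features at those projections, and predicting a pose update with a lightweight head chained over several stages, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: the real method uses lightweight deformable transformers over a feature pyramid, whereas here the per-stage refiner is a hypothetical linear head and feature sampling is nearest-neighbour.

```python
import numpy as np

# Minimal sketch of cascaded pose refinement with sparse keypoint features.
# NOT the CRT-6D architecture: the refiner is a hypothetical linear head,
# and sampling is nearest-neighbour rather than deformable attention.

def skew(w):
    """3x3 cross-product matrix of a 3-vector."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def rodrigues(w):
    """Axis-angle vector -> rotation matrix (Rodrigues' formula)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = skew(w / theta)
    return np.eye(3) + np.sin(theta) * k + (1.0 - np.cos(theta)) * (k @ k)

def project(pts, R, t, K):
    """Project 3D object keypoints into the image under pose (R, t)."""
    cam = pts @ R.T + t                # object -> camera frame, (N, 3)
    uv = cam @ K.T                     # apply pinhole intrinsics
    return uv[:, :2] / uv[:, 2:3]      # perspective divide, (N, 2) pixels

def sample_features(fmap, uv, stride=8):
    """Sample per-keypoint features (stand-in for OSKFs) from a low-res map."""
    h, w, _ = fmap.shape
    j = np.clip((uv[:, 0] / stride).astype(int), 0, w - 1)
    i = np.clip((uv[:, 1] / stride).astype(int), 0, h - 1)
    return fmap[i, j]                  # (N, C)

def cascade(R, t, pts, fmap, K, stages):
    """Chain refinement stages: each stage re-projects the keypoints under
    the current pose, samples features there, and predicts a small pose
    update (3 rotation + 3 translation parameters)."""
    for W in stages:                   # W: (C, 6) hypothetical stage weights
        uv = project(pts, R, t, K)
        feats = sample_features(fmap, uv)
        delta = 0.01 * np.tanh(feats.mean(axis=0) @ W)
        R = rodrigues(delta[:3]) @ R   # left-multiply rotation update
        t = t + delta[3:]
    return R, t

rng = np.random.default_rng(0)
pts = rng.uniform(-0.1, 0.1, size=(16, 3))       # 16 object keypoints (metres)
fmap = rng.standard_normal((16, 16, 32))         # toy 1/8-resolution feature map
K = np.array([[100.0, 0.0, 64.0],
              [0.0, 100.0, 64.0],
              [0.0, 0.0, 1.0]])                  # toy pinhole intrinsics
stages = [rng.standard_normal((32, 6)) for _ in range(3)]
R, t = cascade(np.eye(3), np.array([0.0, 0.0, 1.0]), pts, fmap, K, stages)
```

Because each stage only touches a handful of sampled feature vectors rather than a dense intermediate map, the per-stage cost stays small, which is the property the abstract credits for the 2x runtime advantage.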
Related papers
- RDPN6D: Residual-based Dense Point-wise Network for 6Dof Object Pose Estimation Based on RGB-D Images [13.051302134031808]
We introduce a novel method for calculating the 6DoF pose of an object using a single RGB-D image.
Unlike existing methods that either directly predict objects' poses or rely on sparse keypoints for pose recovery, our approach addresses this challenging task using dense correspondence.
arXiv Detail & Related papers (2024-05-14T10:10:45Z)
- RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery [72.13154206106259]
We propose a novel pipeline that decouples the 6D pose and size estimation to mitigate the influence of imperfect scales on rigid transformations.
Specifically, we leverage a pre-trained monocular estimator to extract local geometric information.
A separate branch is designed to directly recover the metric scale of the object based on category-level statistics.
arXiv Detail & Related papers (2023-09-19T02:20:26Z)
- Unseen Object 6D Pose Estimation: A Benchmark and Baselines [62.8809734237213]
We propose a new task that enables and facilitates algorithms to estimate the 6D pose of novel objects during testing.
We collect a dataset with both real and synthetic images and up to 48 unseen objects in the test set.
By training an end-to-end 3D correspondences network, our method finds corresponding points between an unseen object and a partial view RGBD image accurately and efficiently.
arXiv Detail & Related papers (2022-06-23T16:29:53Z) - Semantic keypoint-based pose estimation from single RGB frames [64.80395521735463]
We present an approach to estimating the continuous 6-DoF pose of an object from a single RGB image.
The approach combines semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model.
We show that our approach can accurately recover the 6-DoF object pose for both instance- and class-based scenarios.
arXiv Detail & Related papers (2022-04-12T15:03:51Z) - MPF6D: Masked Pyramid Fusion 6D Pose Estimation [1.2891210250935146]
We present a new method to estimate the 6D pose of objects that improves upon the accuracy of current proposals.
Our method runs in real time, with a low inference time of 0.12 seconds, and achieves high accuracy.
arXiv Detail & Related papers (2021-11-17T20:23:54Z) - T6D-Direct: Transformers for Multi-Object 6D Pose Direct Regression [40.90172673391803]
T6D-Direct is a real-time single-stage direct method with a transformer-based architecture built on DETR to perform 6D multi-object pose direct estimation.
Our method achieves the fastest inference time, and the pose estimation accuracy is comparable to state-of-the-art methods.
arXiv Detail & Related papers (2021-09-22T18:13:33Z) - FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose
Estimation with Decoupled Rotation Mechanism [49.89268018642999]
We propose a fast shape-based network (FS-Net) with efficient category-level feature extraction for 6D pose estimation.
The proposed method achieves state-of-the-art performance in both category- and instance-level 6D object pose estimation.
arXiv Detail & Related papers (2021-03-12T03:07:24Z) - GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D
Object Pose Estimation [71.83992173720311]
6D pose estimation from a single RGB image is a fundamental task in computer vision.
We propose a simple yet effective Geometry-guided Direct Regression Network (GDR-Net) to learn the 6D pose in an end-to-end manner.
Our approach remarkably outperforms state-of-the-art methods on LM, LM-O and YCB-V datasets.
arXiv Detail & Related papers (2021-02-24T09:11:31Z) - PrimA6D: Rotational Primitive Reconstruction for Enhanced and Robust 6D
Pose Estimation [11.873744190924599]
We introduce a rotational-primitive-prediction-based 6D object pose estimation method that uses a single image as input.
We leverage a Variational AutoEncoder (VAE) to learn this underlying primitive and its associated keypoints.
When evaluated on public datasets, our method yields a notable improvement on the LINEMOD, Occlusion LINEMOD, and YCB-Video datasets.
arXiv Detail & Related papers (2020-06-14T03:55:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.