PViT-6D: Overclocking Vision Transformers for 6D Pose Estimation with
Confidence-Level Prediction and Pose Tokens
- URL: http://arxiv.org/abs/2311.17504v1
- Date: Wed, 29 Nov 2023 10:27:56 GMT
- Authors: Sebastian Stapf, Tobias Bauernfeind, Marco Riboldi
- Abstract summary: We explore the capabilities of Vision Transformers for direct 6D pose estimation through a tailored use of classification tokens.
We also introduce a simple method for determining pose confidence, which can be readily integrated into most 6D pose estimation frameworks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the current state of 6D pose estimation, top-performing techniques depend
on complex intermediate correspondences, specialized architectures, and
non-end-to-end algorithms. In contrast, our research reframes the problem as a
straightforward regression task by exploring the capabilities of Vision
Transformers for direct 6D pose estimation through a tailored use of
classification tokens. We also introduce a simple method for determining pose
confidence, which can be readily integrated into most 6D pose estimation
frameworks. This involves modifying the transformer architecture by decreasing
the number of query elements based on the network's assessment of the scene
complexity. Our method, which we call Pose Vision Transformer (PViT-6D), offers
the benefits of simple implementation and end-to-end learnability while
outperforming current state-of-the-art methods by +0.3% ADD(-S) on
Linemod-Occlusion and +2.7% ADD(-S) on the YCB-V dataset. Moreover, our method
enhances both the model's interpretability and the reliability of its
performance during inference.
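The abstract describes regressing the 6D pose directly from a dedicated classification-style token and attaching a confidence prediction. As a rough illustration of that idea only, here is a minimal sketch of a ViT-style encoder with a learnable pose token whose output embedding feeds separate rotation, translation, and confidence heads. All names, layer sizes, and the 6D rotation parameterization are hypothetical choices for this sketch, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class PoseTokenHead(nn.Module):
    """Sketch of direct 6D pose regression from a learnable pose token.

    A pose token is prepended to the image patch tokens, processed by a
    transformer encoder, and its output embedding is mapped to a rotation
    (6D representation), a translation, and a scalar pose confidence.
    Hypothetical sizes; not the paper's actual configuration.
    """
    def __init__(self, dim=64, depth=2, heads=4, num_patches=16):
        super().__init__()
        self.pose_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.rot_head = nn.Linear(dim, 6)    # 6D rotation representation
        self.trans_head = nn.Linear(dim, 3)  # translation (x, y, z)
        self.conf_head = nn.Linear(dim, 1)   # scalar pose confidence

    def forward(self, patch_tokens):
        # patch_tokens: (batch, num_patches, dim) image patch embeddings
        b = patch_tokens.size(0)
        tok = self.pose_token.expand(b, -1, -1)
        x = torch.cat([tok, patch_tokens], dim=1) + self.pos_embed
        x = self.encoder(x)
        pose_emb = x[:, 0]  # output embedding of the pose token
        return (self.rot_head(pose_emb),
                self.trans_head(pose_emb),
                torch.sigmoid(self.conf_head(pose_emb)))
```

The confidence output could then be used at inference time to flag unreliable pose estimates; the paper additionally describes reducing the number of query elements based on the network's assessment of scene complexity, which this sketch does not attempt to reproduce.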
Related papers
- 6DOPE-GS: Online 6D Object Pose Estimation using Gaussian Splatting [7.7145084897748974]
We present 6DOPE-GS, a novel method for online 6D object pose estimation & tracking with a single RGB-D camera.
We show that 6DOPE-GS matches the performance of state-of-the-art baselines for model-free simultaneous 6D pose tracking and reconstruction.
We also demonstrate the method's suitability for live, dynamic object tracking and reconstruction in a real-world setting.
arXiv Detail & Related papers (2024-12-02T14:32:19Z)
- UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z)
- Advancing 6D Pose Estimation in Augmented Reality -- Overcoming Projection Ambiguity with Uncontrolled Imagery [0.0]
This study addresses the challenge of accurate 6D pose estimation in Augmented Reality (AR)
We propose a novel approach that strategically decomposes the estimation of z-axis translation and focal length.
This methodology not only streamlines the 6D pose estimation process but also significantly enhances the accuracy of 3D object overlaying in AR settings.
arXiv Detail & Related papers (2024-03-20T09:22:22Z)
- YOLOPose V2: Understanding and Improving Transformer-based 6D Pose Estimation [36.067414358144816]
YOLOPose is a Transformer-based multi-object 6D pose estimation method.
We employ a learnable orientation estimation module to predict the orientation from the keypoints.
Our method is suitable for real-time applications and achieves results comparable to state-of-the-art methods.
arXiv Detail & Related papers (2023-07-21T12:53:54Z)
- TransPose: A Transformer-based 6D Object Pose Estimation Network with Depth Refinement [5.482532589225552]
We propose TransPose, an improved Transformer-based 6D pose estimation network with a depth refinement module.
The architecture takes only an RGB image as input, with no supplementary modalities such as depth or thermal images.
A novel depth refinement module is then used alongside the predicted centers, 6D poses and depth patches to refine the accuracy of the estimated 6D pose.
arXiv Detail & Related papers (2023-07-09T17:33:13Z)
- Coupled Iterative Refinement for 6D Multi-Object Pose Estimation [64.7198752089041]
Given a set of known 3D objects and an RGB or RGB-D input image, we detect and estimate the 6D pose of each object.
Our approach iteratively refines both pose and correspondence in a tightly coupled manner, allowing us to dynamically remove outliers to improve accuracy.
arXiv Detail & Related papers (2022-04-26T18:00:08Z)
- T6D-Direct: Transformers for Multi-Object 6D Pose Direct Regression [40.90172673391803]
T6D-Direct is a real-time single-stage direct method with a transformer-based architecture built on DETR to perform 6D multi-object pose direct estimation.
Our method achieves the fastest inference time, and the pose estimation accuracy is comparable to state-of-the-art methods.
arXiv Detail & Related papers (2021-09-22T18:13:33Z)
- SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation [98.83762558394345]
SO-Pose is a framework for regressing all 6 degrees-of-freedom (6DoF) for the object pose in a cluttered environment from a single RGB image.
We introduce a novel reasoning about self-occlusion, in order to establish a two-layer representation for 3D objects.
By enforcing cross-layer consistencies that align correspondences, self-occlusion and 6D pose, we can further improve accuracy and robustness.
arXiv Detail & Related papers (2021-08-18T19:49:29Z)
- FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism [49.89268018642999]
We propose a fast shape-based network (FS-Net) with efficient category-level feature extraction for 6D pose estimation.
The proposed method achieves state-of-the-art performance in both category- and instance-level 6D object pose estimation.
arXiv Detail & Related papers (2021-03-12T03:07:24Z)
- Spatial Attention Improves Iterative 6D Object Pose Estimation [52.365075652976735]
We propose a new method for 6D pose estimation refinement from RGB images.
Our main insight is that after the initial pose estimate, it is important to pay attention to distinct spatial features of the object.
We experimentally show that this approach learns to attend to salient spatial features and learns to ignore occluded parts of the object, leading to better pose estimation across datasets.
arXiv Detail & Related papers (2021-01-05T17:18:52Z)
- Self6D: Self-Supervised Monocular 6D Object Pose Estimation [114.18496727590481]
We propose the idea of monocular 6D pose estimation by means of self-supervised learning.
We leverage recent advances in neural rendering to further self-supervise the model on unannotated real RGB-D data.
arXiv Detail & Related papers (2020-04-14T13:16:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.