ZS6D: Zero-shot 6D Object Pose Estimation using Vision Transformers
- URL: http://arxiv.org/abs/2309.11986v1
- Date: Thu, 21 Sep 2023 11:53:01 GMT
- Title: ZS6D: Zero-shot 6D Object Pose Estimation using Vision Transformers
- Authors: Philipp Ausserlechner, David Haberger, Stefan Thalhammer,
Jean-Baptiste Weibel and Markus Vincze
- Abstract summary: We introduce ZS6D for zero-shot novel object 6D pose estimation.
Visual descriptors, extracted using pre-trained Vision Transformers (ViT), are used for matching rendered templates against query images.
Experiments are performed on the LMO, YCBV, and TLESS datasets.
- Score: 9.899633398596672
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As robotic systems increasingly encounter complex and unconstrained
real-world scenarios, there is a growing demand to recognize diverse objects.
State-of-the-art 6D object pose estimation methods rely on object-specific
training and therefore do not generalize to unseen objects. Recent novel object
pose estimation methods address this issue with task-specifically fine-tuned
CNNs for deep template matching, but this adaptation still requires expensive
data rendering and training procedures. MegaPose, for example, is trained on a
dataset of two million images showing 20,000 different objects to reach such
generalization capabilities. To overcome this shortcoming, we introduce ZS6D
for zero-shot novel object 6D pose estimation. Visual descriptors, extracted
using pre-trained Vision Transformers (ViT), are used to match rendered
templates against query images of objects and to establish local
correspondences. These local correspondences yield geometric correspondences,
which are used to estimate the object's 6D pose with RANSAC-based PnP. This
approach demonstrates that image descriptors extracted by pre-trained ViTs are
well suited to achieve a notable improvement over two state-of-the-art novel
object 6D pose estimation methods without task-specific fine-tuning.
Experiments are performed on the LMO, YCBV, and TLESS datasets. Compared to the
first method, we improve the Average Recall on all three datasets; compared to
the second, we improve on two of the three.
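As a rough illustration of this pipeline, the sketch below matches ViT patch descriptors between a query image and a rendered template, then feeds the resulting 2D-3D correspondences to OpenCV's RANSAC-based PnP solver. The inputs (query_desc, query_uv, tmpl_desc, tmpl_xyz) are hypothetical stand-ins for data the paper's pipeline would produce, not ZS6D's actual interface.

```python
import numpy as np
import cv2

def estimate_pose(query_desc, query_uv, tmpl_desc, tmpl_xyz, K):
    """Match ViT patch descriptors, then solve the pose with RANSAC-PnP.

    query_desc: (N, D) descriptors of query-image patches (hypothetical input)
    query_uv:   (N, 2) pixel coordinates of those patches
    tmpl_desc:  (M, D) descriptors of the best-matching rendered template
    tmpl_xyz:   (M, 3) 3D model coordinates behind each template patch
    K:          (3, 3) camera intrinsic matrix
    """
    # Cosine similarity between every query patch and every template patch.
    q = query_desc / np.linalg.norm(query_desc, axis=1, keepdims=True)
    t = tmpl_desc / np.linalg.norm(tmpl_desc, axis=1, keepdims=True)
    nn = (q @ t.T).argmax(axis=1)          # nearest template patch per query patch

    obj_pts = tmpl_xyz[nn].astype(np.float64)   # 3D model points
    img_pts = query_uv.astype(np.float64)       # matching 2D observations

    # RANSAC-based PnP discards false descriptor matches while fitting the pose.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        obj_pts, img_pts, K.astype(np.float64), None, reprojectionError=3.0)
    return (rvec, tvec) if ok else None
```

The RANSAC stage is what tolerates the false matches that zero-shot descriptor matching inevitably produces.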
Related papers
- MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare [84.80956484848505]
MegaPose is a method to estimate the 6D pose of novel objects, that is, objects unseen during training.
First, we present a 6D pose refiner based on a render&compare strategy which can be applied to novel objects.
Second, we introduce a novel approach for coarse pose estimation which leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner.
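A loose sketch of the render-and-compare idea follows; it uses plain random search rather than MegaPose's learned refiner and coarse classifier, and render, score, and perturb are hypothetical callables.

```python
def refine_render_and_compare(obs_img, pose, render, score, perturb,
                              n_iters=5, n_hyp=16):
    """Generic hypothesize-render-score refinement loop (not MegaPose itself).

    obs_img: observed image crop of the target object
    pose:    initial object-pose hypothesis (e.g. a 4x4 matrix)
    render:  callable(pose) -> synthetic rendering of the object (hypothetical)
    score:   callable(obs, rendered) -> similarity, higher is better (hypothetical)
    perturb: callable(pose, scale) -> randomly perturbed pose (hypothetical)
    """
    best, best_score = pose, score(obs_img, render(pose))
    for it in range(n_iters):
        scale = 0.5 ** it                  # shrink the search radius each round
        for _ in range(n_hyp):
            cand = perturb(best, scale)    # propose a nearby pose hypothesis
            s = score(obs_img, render(cand))
            if s > best_score:             # keep the pose whose rendering matches best
                best, best_score = cand, s
    return best
```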
arXiv Detail & Related papers (2022-12-13T19:30:03Z)
- Imitrob: Imitation Learning Dataset for Training and Evaluating 6D Object Pose Estimators [20.611000416051546]
This paper introduces a dataset for training and evaluating methods for 6D pose estimation of hand-held tools in task demonstrations captured by a standard RGB camera.
The dataset contains image sequences of nine different tools and twelve manipulation tasks with two camera viewpoints, four human subjects, and both left and right hands.
arXiv Detail & Related papers (2022-09-16T14:43:46Z)
- Unseen Object 6D Pose Estimation: A Benchmark and Baselines [62.8809734237213]
We propose a new task that enables and facilitates algorithms to estimate the 6D pose of novel objects during testing.
We collect a dataset with both real and synthetic images and up to 48 unseen objects in the test set.
By training an end-to-end 3D correspondences network, our method finds corresponding points between an unseen object and a partial-view RGBD image accurately and efficiently.
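Given predicted 3D-3D correspondences between the object model and the RGBD view, the rigid pose can be recovered in closed form. Below is a standard Kabsch/Umeyama-style solver, shown as a generic building block rather than the paper's method.

```python
import numpy as np

def rigid_transform_from_correspondences(src, dst):
    """Least-squares rigid transform (Kabsch, without scale).

    src, dst: (N, 3) matched 3D points, e.g. model points and their
    predicted counterparts in the partial-view RGBD point cloud.
    Returns R (3x3) and t (3,) such that dst ~ R @ src + t.
    """
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)        # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                   # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t
```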
arXiv Detail & Related papers (2022-06-23T16:29:53Z)
- Coupled Iterative Refinement for 6D Multi-Object Pose Estimation [64.7198752089041]
Given a set of known 3D objects and an RGB or RGB-D input image, we detect and estimate the 6D pose of each object.
Our approach iteratively refines both pose and correspondence in a tightly coupled manner, allowing us to dynamically remove outliers to improve accuracy.
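The coupling of pose and correspondence can be pictured as a simple alternating loop: fit a pose, re-select inliers, refit. This is only a loose analogue of the paper's method; fit is a hypothetical plug-in solver such as the Kabsch sketch above.

```python
import numpy as np

def refine_pose(src, dst, fit, n_iters=10, inlier_thresh=0.01):
    """Alternate pose fitting and inlier re-selection over 3D-3D matches.

    src, dst: (N, 3) matched 3D points
    fit:      callable(src, dst) -> (R, t) least-squares pose solver,
              e.g. the Kabsch sketch above (hypothetical plug-in)
    """
    keep = np.ones(len(src), dtype=bool)
    R, t = fit(src, dst)
    for _ in range(n_iters):
        resid = np.linalg.norm(src @ R.T + t - dst, axis=1)  # per-point error
        new_keep = resid < inlier_thresh                     # re-select correspondences
        if new_keep.sum() < 3 or np.array_equal(new_keep, keep):
            break
        keep = new_keep
        R, t = fit(src[keep], dst[keep])                     # refit pose on inliers
    return R, t, keep
```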
arXiv Detail & Related papers (2022-04-26T18:00:08Z)
- Weakly Supervised Learning of Keypoints for 6D Object Pose Estimation [73.40404343241782]
We propose a weakly supervised 6D object pose estimation approach based on 2D keypoint detection.
Our approach achieves comparable performance with state-of-the-art fully supervised approaches.
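Keypoint-based pipelines of this kind typically decode per-keypoint heatmaps into 2D coordinates and pass them to a PnP solver. The sketch below shows only that generic decoding step, not the paper's weak-supervision scheme.

```python
import torch

def decode_keypoints(heatmaps):
    """Turn per-keypoint heatmaps into 2D pixel coordinates.

    heatmaps: (B, K, H, W) tensor with one channel per keypoint.
    Returns a (B, K, 2) tensor of (x, y) locations of each channel's maximum.
    """
    b, k, h, w = heatmaps.shape
    idx = heatmaps.view(b, k, -1).argmax(dim=-1)          # flattened argmax per map
    ys = torch.div(idx, w, rounding_mode="floor")
    xs = idx % w
    return torch.stack([xs, ys], dim=-1).float()
```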
arXiv Detail & Related papers (2022-03-07T16:23:47Z)
- Spatial Attention Improves Iterative 6D Object Pose Estimation [52.365075652976735]
We propose a new method for 6D pose estimation refinement from RGB images.
Our main insight is that after the initial pose estimate, it is important to pay attention to distinct spatial features of the object.
We experimentally show that this approach learns to attend to salient spatial features and learns to ignore occluded parts of the object, leading to better pose estimation across datasets.
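A minimal spatial-attention gate over a CNN feature map might look as follows; this illustrates the mechanism only and does not reproduce the paper's architecture.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Minimal spatial-attention gate over a CNN feature map."""

    def __init__(self, channels):
        super().__init__()
        # One score per spatial location, computed from the local feature vector.
        self.to_logits = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feats):                        # feats: (B, C, H, W)
        logits = self.to_logits(feats)               # (B, 1, H, W)
        attn = torch.softmax(logits.flatten(2), dim=-1).view_as(logits)
        return feats * attn                          # reweight; occluded regions can be suppressed
```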
arXiv Detail & Related papers (2021-01-05T17:18:52Z)
- CosyPose: Consistent multi-view multi-object 6D pose estimation [48.097599674329004]
First, we present a single-view single-object 6D pose estimation method, which we use to generate 6D object pose hypotheses.
Second, we develop a robust method for matching individual 6D object pose hypotheses across different input images.
Third, we develop a method for global scene refinement given multiple object hypotheses and their correspondences across views.
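Cross-view hypothesis matching can be pictured as transforming per-view object poses into a shared world frame and clustering those that agree. The simplified sketch below assumes known camera-to-world transforms and compares only object translations.

```python
import numpy as np

def match_hypotheses_across_views(view_poses, cam_to_world, trans_thresh=0.05):
    """Group single-view pose hypotheses that agree in a shared world frame.

    view_poses:   per view, a list of 4x4 object-in-camera pose hypotheses
    cam_to_world: per view, a 4x4 camera-to-world transform (assumed known)
    Returns clusters as lists of (view_index, hypothesis_index) pairs.
    """
    world = [(v, i, cam_to_world[v] @ T)
             for v, poses in enumerate(view_poses)
             for i, T in enumerate(poses)]
    clusters = []
    for v, i, T in world:
        for cluster in clusters:
            # Compare object centers only; a real system would also compare rotations.
            if np.linalg.norm(cluster[0][2][:3, 3] - T[:3, 3]) < trans_thresh:
                cluster.append((v, i, T))
                break
        else:
            clusters.append([(v, i, T)])
    return [[(v, i) for v, i, _ in cluster] for cluster in clusters]
```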
arXiv Detail & Related papers (2020-08-19T14:11:56Z)
- Single Shot 6D Object Pose Estimation [11.37625512264302]
We introduce a novel single shot approach for 6D object pose estimation of rigid objects based on depth images.
A fully convolutional neural network is employed, where the 3D input data is spatially discretized and pose estimation is considered as a regression task.
With 65 fps on a GPU, our Object Pose Network (OP-Net) is extremely fast, is optimized end-to-end, and estimates the 6D pose of multiple objects in the image simultaneously.
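A fully convolutional head of the kind described, where each spatial cell of the discretized input predicts an objectness score and a pose vector, might look like this generic sketch (not the actual OP-Net):

```python
import torch.nn as nn

class PoseRegressionHead(nn.Module):
    """Fully convolutional head: each spatial cell predicts an objectness
    score and a pose vector (generic sketch, not the actual OP-Net)."""

    def __init__(self, in_ch=64, pose_dim=7):   # pose_dim: e.g. quaternion + translation
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(128, 1 + pose_dim, kernel_size=1),  # objectness + pose per cell
        )

    def forward(self, feats):                    # feats: (B, in_ch, H, W)
        out = self.net(feats)
        return out[:, :1], out[:, 1:]            # (B,1,H,W) objectness, (B,pose_dim,H,W) pose
```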
arXiv Detail & Related papers (2020-04-27T11:59:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.