KeyMatchNet: Zero-Shot Pose Estimation in 3D Point Clouds by Generalized
Keypoint Matching
- URL: http://arxiv.org/abs/2303.16102v2
- Date: Tue, 26 Sep 2023 11:51:47 GMT
- Authors: Frederik Hagelskjær and Rasmus Laurvig Haugaard
- Abstract summary: KeyMatchNet is a novel network for zero-shot pose estimation in 3D point clouds.
The method generalizes to new objects by using not only the scene point cloud as input but also the object point cloud.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present KeyMatchNet, a novel network for zero-shot pose
estimation in 3D point clouds. The network is trained to match object keypoints
with scene-points, and these matches are then used for pose estimation. The
method generalizes to new objects by using not only the scene point cloud as
input but also the object point cloud. This is in contrast with conventional
methods where object features are stored in network weights. By having a
generalized network we avoid the need for training new models for novel
objects, thus significantly decreasing the computational requirements of the
method.
However, as a result of this added complexity, zero-shot pose estimation
methods generally have lower performance than networks trained for a single
object. To address this, we reduce the complexity of the task by including the
scenario information during training. This is generally not feasible, as
collecting real data for new tasks drastically increases the cost. In the
zero-shot pose estimation task, however, no retraining is needed for new
objects. The expensive data collection can thus be performed once, and the
scenario information is retained in the network weights.
The network is trained on 1,500 objects and is tested on unseen objects. We
demonstrate that the trained network can accurately estimate poses for novel
objects and demonstrate the ability of the network to perform outside of the
trained class. We believe that the presented method is valuable for many
real-world scenarios. Code, trained network, and dataset will be made available
at publication.
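The match-then-estimate pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the cosine-similarity matcher and the use of the Kabsch (SVD) algorithm to recover a rigid pose from matched points are assumptions about how keypoint matches could be turned into a pose.

```python
import numpy as np

def match_keypoints(obj_feat, scene_feat):
    """Nearest-neighbor matching of per-point descriptors (illustrative).

    obj_feat: (K, D) object keypoint features; scene_feat: (N, D) scene features.
    Returns, for each object keypoint, the index of its best scene match.
    """
    sim = obj_feat @ scene_feat.T          # (K, N) similarity matrix
    return np.argmax(sim, axis=1)

def kabsch_pose(obj_pts, scene_pts):
    """Rigid transform (R, t) aligning obj_pts onto scene_pts via the Kabsch algorithm."""
    mu_o, mu_s = obj_pts.mean(axis=0), scene_pts.mean(axis=0)
    # Cross-covariance of the centered correspondences
    H = (obj_pts - mu_o).T @ (scene_pts - mu_s)
    U, _, Vt = np.linalg.svd(H)
    # Sign correction guards against reflections (det(R) must be +1)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_s - R @ mu_o
    return R, t
```

In practice the matches produced by a network are noisy, so a robust estimator such as RANSAC is typically wrapped around the least-squares pose fit.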
Related papers
- Generalizable Pose Estimation Using Implicit Scene Representations [4.124185654280966]
6-DoF pose estimation is an essential component of robotic manipulation pipelines.
We address the generalization capability of pose estimation using models that contain enough information to render the object in different poses.
Our final evaluation shows a significant improvement in inference performance and speed compared to existing approaches.
arXiv Detail & Related papers (2023-05-26T20:42:52Z)
- PoseMatcher: One-shot 6D Object Pose Estimation by Deep Feature Matching [51.142988196855484]
We propose PoseMatcher, an accurate model free one-shot object pose estimator.
We create a new training pipeline for object to image matching based on a three-view system.
To enable PoseMatcher to attend to distinct input modalities, an image and a point cloud, we introduce the IO-Layer.
arXiv Detail & Related papers (2023-04-03T21:14:59Z)
- NOPE: Novel Object Pose Estimation from a Single Image [67.11073133072527]
We propose an approach that takes a single image of a new object as input and predicts the relative pose of this object in new images without prior knowledge of the object's 3D model.
We achieve this by training a model to directly predict discriminative embeddings for viewpoints surrounding the object.
This prediction is done using a simple U-Net architecture with attention and conditioned on the desired pose, which yields extremely fast inference.
arXiv Detail & Related papers (2023-03-23T18:55:43Z)
- ALSO: Automotive Lidar Self-supervision by Occupancy estimation [70.70557577874155]
We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds.
The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled.
The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information.
arXiv Detail & Related papers (2022-12-12T13:10:19Z)
- Unseen Object 6D Pose Estimation: A Benchmark and Baselines [62.8809734237213]
We propose a new task that enables and facilitates algorithms to estimate the 6D pose of novel objects during testing.
We collect a dataset with both real and synthetic images and up to 48 unseen objects in the test set.
By training an end-to-end 3D correspondences network, our method finds corresponding points between an unseen object and a partial view RGBD image accurately and efficiently.
arXiv Detail & Related papers (2022-06-23T16:29:53Z)
- Object Pose Estimation using Mid-level Visual Representations [5.220940151628735]
This work proposes a novel pose estimation model for object categories that can be effectively transferred to previously unseen environments.
Deep convolutional network models (CNN) for pose estimation are typically trained and evaluated on datasets curated for object detection, pose estimation, or 3D reconstruction.
We show that the approach is favorable when it comes to generalization and transfer to novel environments.
arXiv Detail & Related papers (2022-03-02T22:49:17Z)
- ZePHyR: Zero-shot Pose Hypothesis Rating [36.52070583343388]
We introduce a novel method for zero-shot object pose estimation in clutter.
Our approach uses a hypothesis generation and scoring framework, with a focus on learning a scoring function that generalizes to objects not used for training.
We demonstrate how our system can be used by quickly scanning and building a model of a novel object, which can immediately be used by our method for pose estimation.
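The hypothesis-scoring idea behind this line of work can be sketched as below. This is an illustrative geometric scorer, not ZePHyR's learned scoring function: a candidate pose is rated by the fraction of transformed model points that land close to a scene point.

```python
import numpy as np

def score_pose(model_pts, scene_pts, R, t, tau=0.01):
    """Fraction of transformed model points within distance tau of a scene point.

    A brute-force nearest-neighbor search; fine for small clouds, while a
    k-d tree would be used for realistic point-cloud sizes.
    """
    transformed = model_pts @ R.T + t                       # apply candidate pose
    d = np.linalg.norm(transformed[:, None, :] - scene_pts[None, :, :], axis=-1)
    return float(np.mean(d.min(axis=1) < tau))
```

A hypothesis generator would propose many (R, t) candidates and keep the one with the highest score.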
arXiv Detail & Related papers (2021-04-28T01:48:39Z)
- Few-shot Weakly-Supervised Object Detection via Directional Statistics [55.97230224399744]
We propose a probabilistic multiple instance learning approach for few-shot Common Object Localization (COL) and few-shot Weakly Supervised Object Detection (WSOD).
Our model simultaneously learns the distribution of the novel objects and localizes them via expectation-maximization steps.
Our experiments show that the proposed method, despite being simple, outperforms strong baselines in few-shot COL and WSOD, as well as large-scale WSOD tasks.
arXiv Detail & Related papers (2021-03-25T22:34:16Z)
- Self-Supervised Viewpoint Learning From Image Collections [116.56304441362994]
We propose a novel learning framework which incorporates an analysis-by-synthesis paradigm to reconstruct images in a viewpoint aware manner.
We show that our approach performs competitively to fully-supervised approaches for several object categories like human faces, cars, buses, and trains.
arXiv Detail & Related papers (2020-04-03T22:01:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.