Distilling 3D distinctive local descriptors for 6D pose estimation
- URL: http://arxiv.org/abs/2503.15106v2
- Date: Thu, 20 Mar 2025 08:27:13 GMT
- Title: Distilling 3D distinctive local descriptors for 6D pose estimation
- Authors: Amir Hamza, Andrea Caraffa, Davide Boscaini, Fabio Poiesi
- Abstract summary: We introduce a knowledge distillation framework that trains an efficient student model to regress local descriptors from a GeDi teacher. We validate our approach on five BOP Benchmark datasets and demonstrate a significant reduction in inference time.
- Score: 5.754251195342313
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Three-dimensional local descriptors are crucial for encoding geometric surface properties, making them essential for various point cloud understanding tasks. Among these descriptors, GeDi has demonstrated strong zero-shot 6D pose estimation capabilities but remains computationally impractical for real-world applications due to its expensive inference process. Can we retain GeDi's effectiveness while significantly improving its efficiency? In this paper, we explore this question by introducing a knowledge distillation framework that trains an efficient student model to regress local descriptors from a GeDi teacher. Our key contributions include: an efficient large-scale training procedure that ensures robustness to occlusions and partial observations while operating under compute and storage constraints, and a novel loss formulation that handles weak supervision from non-distinctive teacher descriptors. We validate our approach on five BOP Benchmark datasets and demonstrate a significant reduction in inference time while maintaining competitive performance with existing methods, bringing zero-shot 6D pose estimation closer to real-time feasibility. Project Website: https://tev-fbk.github.io/dGeDi/
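The abstract describes two ingredients: a student regressing the teacher's local descriptors, and a loss that copes with weak supervision from non-distinctive teacher descriptors. The paper's actual loss is not given here, so the following is only a minimal illustrative sketch of that idea: teacher descriptors whose nearest non-self neighbour is very similar (i.e. ambiguous, non-distinctive) receive a small weight in a cosine-distance regression. All function and variable names are hypothetical.

```python
import numpy as np

def descriptor_distillation_loss(student_desc, teacher_desc, eps=1e-8):
    """Illustrative sketch (not the paper's loss): weighted regression of
    student descriptors onto teacher descriptors, down-weighting
    non-distinctive teacher descriptors.

    student_desc, teacher_desc: (N, D) arrays of per-point local descriptors.
    """
    def l2norm(x):
        return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

    t = l2norm(teacher_desc)
    s = l2norm(student_desc)

    # Distinctiveness proxy: a teacher descriptor whose nearest non-self
    # neighbour is highly similar carries weak supervision -> small weight.
    sim = t @ t.T                      # (N, N) cosine similarities
    np.fill_diagonal(sim, -1.0)        # ignore self-similarity
    nearest = sim.max(axis=1)          # high => ambiguous descriptor
    weights = np.clip(1.0 - nearest, 0.0, None)
    weights = weights / (weights.sum() + eps)

    # Weighted cosine distance between student and teacher descriptors.
    per_point = 1.0 - (s * t).sum(axis=1)
    return float((weights * per_point).sum())
```

A student that exactly reproduces the teacher's descriptors drives this loss to zero, while points with ambiguous teacher supervision contribute little either way.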
Related papers
- Uncertainty-Aware Knowledge Distillation for Compact and Efficient 6DoF Pose Estimation [9.742944501209656]
This paper introduces a novel uncertainty-aware end-to-end Knowledge Distillation (KD) framework focused on keypoint-based 6DoF pose estimation. We propose a distillation strategy that aligns the student and teacher predictions by adjusting the knowledge transfer based on the uncertainty associated with each teacher keypoint prediction. Experiments on the widely-used LINEMOD benchmark demonstrate the effectiveness of our method, achieving superior 6DoF object pose estimation with lightweight models.
arXiv Detail & Related papers (2025-03-17T10:56:30Z) - Bayesian Self-Training for Semi-Supervised 3D Segmentation [59.544558398992386]
3D segmentation is a core problem in computer vision.
However, densely labeling 3D point clouds for fully-supervised training remains too labor-intensive and expensive.
Semi-supervised training provides a more practical alternative, where only a small set of labeled data is given, accompanied by a larger unlabeled set.
arXiv Detail & Related papers (2024-09-12T14:54:31Z) - IPoD: Implicit Field Learning with Point Diffusion for Generalizable 3D Object Reconstruction from Single RGB-D Images [50.4538089115248]
Generalizable 3D object reconstruction from single-view RGB-D images remains a challenging task.
We propose a novel approach, IPoD, which harmonizes implicit field learning with point diffusion.
Experiments conducted on the CO3D-v2 dataset affirm the superiority of IPoD, achieving 7.8% improvement in F-score and 28.6% in Chamfer distance over existing methods.
arXiv Detail & Related papers (2024-03-30T07:17:37Z) - FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with Pre-trained Vision-Language Models [59.13757801286343]
Few-shot class-incremental learning aims to mitigate the catastrophic forgetting issue when a model is incrementally trained on limited data. We introduce the FILP-3D framework with two novel components: the Redundant Feature Eliminator (RFE) to address feature space misalignment and the Spatial Noise Compensator (SNC) to handle significant noise.
arXiv Detail & Related papers (2023-12-28T14:52:07Z) - FSD: Fast Self-Supervised Single RGB-D to Categorical 3D Objects [37.175069234979645]
This work addresses the challenging task of 3D object recognition without the reliance on real-world 3D labeled data.
Our goal is to predict the 3D shape, size, and 6D pose of objects within a single RGB-D image, operating at the category level and eliminating the need for CAD models during inference.
arXiv Detail & Related papers (2023-10-19T17:59:09Z) - Representation Disparity-aware Distillation for 3D Object Detection [44.17712259352281]
This paper presents a novel representation disparity-aware distillation (RDD) method to address the representation disparity issue.
Our RDD increases mAP of CP-Voxel-S to 57.1% on nuScenes dataset, which even surpasses teacher performance while taking up only 42% FLOPs.
arXiv Detail & Related papers (2023-08-20T16:06:42Z) - Manifold-Aware Self-Training for Unsupervised Domain Adaptation on Regressing 6D Object Pose [69.14556386954325]
This paper bridges the domain gap between synthetic and real data in visual regression via global feature alignment and local refinement.
Our method incorporates an explicit self-supervised manifold regularization, revealing consistent cumulative target dependency across domains.
Unified implicit neural functions are learned to estimate the relative direction and distance of targets to their nearest class bins, refining target classification predictions.
arXiv Detail & Related papers (2023-05-18T08:42:41Z) - Dynamic Iterative Refinement for Efficient 3D Hand Pose Estimation [87.54604263202941]
We propose a tiny deep neural network of which partial layers are iteratively exploited for refining its previous estimations.
We employ learned gating criteria to decide whether to exit from the weight-sharing loop, allowing per-sample adaptation in our model.
Our method consistently outperforms state-of-the-art 2D/3D hand pose estimation approaches in terms of both accuracy and efficiency for widely used benchmarks.
arXiv Detail & Related papers (2021-11-11T23:31:34Z) - FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism [49.89268018642999]
We propose a fast shape-based network (FS-Net) with efficient category-level feature extraction for 6D pose estimation.
The proposed method achieves state-of-the-art performance in both category- and instance-level 6D object pose estimation.
arXiv Detail & Related papers (2021-03-12T03:07:24Z) - 3D Point-to-Keypoint Voting Network for 6D Pose Estimation [8.801404171357916]
We propose a framework for 6D pose estimation from RGB-D data based on spatial structure characteristics of 3D keypoints.
The proposed method is verified on two benchmark datasets, LINEMOD and OCCLUSION LINEMOD.
arXiv Detail & Related papers (2020-12-22T11:43:15Z) - 3D Registration for Self-Occluded Objects in Context [66.41922513553367]
We introduce the first deep learning framework capable of effectively handling 3D registration of self-occluded objects in context.
Our method consists of an instance segmentation module followed by a pose estimation one.
It allows us to perform 3D registration in a one-shot manner, without requiring an expensive iterative procedure.
arXiv Detail & Related papers (2020-11-23T08:05:28Z) - PAM: Point-wise Attention Module for 6D Object Pose Estimation [2.4815579733050153]
6D pose estimation refers to recognizing an object and estimating its 3D rotation and 3D translation.
Previous methods utilized depth information in the refinement process or were designed as heterogeneous architectures for each data space to extract features.
This paper proposes a Point Attention Module that can efficiently extract powerful features from RGB-D data.
arXiv Detail & Related papers (2020-08-12T11:29:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.