Towards Self-Supervised Category-Level Object Pose and Size Estimation
- URL: http://arxiv.org/abs/2203.02884v1
- Date: Sun, 6 Mar 2022 06:02:30 GMT
- Title: Towards Self-Supervised Category-Level Object Pose and Size Estimation
- Authors: Yisheng He, Haoqiang Fan, Haibin Huang, Qifeng Chen, Jian Sun
- Abstract summary: This work presents a self-supervised framework for category-level object pose and size estimation from a single depth image.
We leverage the geometric consistency residing in point clouds of the same shape for self-supervision.
- Score: 121.28537953301951
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work presents a self-supervised framework for category-level object pose
and size estimation from a single depth image. Unlike previous works that rely
on time-consuming and labor-intensive ground truth pose labels for supervision,
we leverage the geometric consistency residing in point clouds of the same
shape for self-supervision. Specifically, given a normalized category template
mesh in the object-coordinate system and the partially observed object instance
in the scene, our key idea is to apply differentiable shape deformation,
registration, and rendering to enforce geometric consistency between the
predicted and the observed scene object point cloud. We evaluate our approach
on real-world datasets and find that our approach outperforms the simple
traditional baseline by large margins while being competitive with some
fully-supervised approaches.
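The abstract's core idea is to compare the deformed-and-registered template points against the observed scene points and penalize any mismatch. A common way to realize such a point-cloud consistency objective is a symmetric Chamfer distance; the sketch below is a minimal NumPy illustration of that idea, not the paper's exact loss (the paper additionally uses differentiable deformation, registration, and rendering, which are omitted here).

```python
import numpy as np

def chamfer_distance(pred: np.ndarray, obs: np.ndarray) -> float:
    """Symmetric Chamfer distance between two point clouds.

    `pred` (N, 3): hypothetical deformed/registered template points.
    `obs` (M, 3): observed scene object points from the depth image.
    This is an illustrative stand-in for a geometric-consistency loss,
    not the paper's full objective.
    """
    # Pairwise squared distances between all point pairs: shape (N, M)
    d2 = np.sum((pred[:, None, :] - obs[None, :, :]) ** 2, axis=-1)
    # Average nearest-neighbor distance in both directions
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())
```

When `pred` and `obs` coincide the loss is zero, so minimizing it with respect to the deformation and registration parameters drives the predicted shape toward the observation without any pose labels.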
Related papers
- Diffusion-Driven Self-Supervised Learning for Shape Reconstruction and Pose Estimation [26.982199143972835]
We introduce a diffusion-driven self-supervised network for multi-object shape reconstruction and categorical pose estimation.
Our method significantly outperforms state-of-the-art self-supervised category-level baselines and even surpasses some fully-supervised instance-level and category-level methods.
arXiv Detail & Related papers (2024-03-19T13:43:27Z)
- GS-Pose: Category-Level Object Pose Estimation via Geometric and Semantic Correspondence [5.500735640045456]

Category-level pose estimation is a challenging task with many potential applications in computer vision and robotics.
We propose to utilize both geometric and semantic features obtained from a pre-trained foundation model.
This requires significantly less data to train than prior methods since the semantic features are robust to object texture and appearance.
arXiv Detail & Related papers (2023-11-23T02:35:38Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
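Using PCA to localize objects typically means projecting per-pixel deep features onto their first principal component, which often separates foreground from background. The sketch below illustrates this generic technique on a hypothetical `(H, W, C)` feature map; it is not the specific pipeline of the cited paper.

```python
import numpy as np

def pca_localize(features: np.ndarray) -> np.ndarray:
    """Project per-pixel features onto their first principal component.

    `features` is a hypothetical (H, W, C) feature map from any backbone.
    The resulting (H, W) score map often highlights the salient object.
    """
    h, w, c = features.shape
    x = features.reshape(-1, c)
    x = x - x.mean(axis=0)                        # center the features
    # First right-singular vector = first principal direction
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    proj = x @ vt[0]                              # (H*W,) projection scores
    if proj.mean() < 0:                           # resolve arbitrary PC sign
        proj = -proj
    return proj.reshape(h, w)
```

Thresholding the returned score map gives a coarse object region that can seed a detector or segmentation head.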
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- GenPose: Generative Category-level Object Pose Estimation via Diffusion Models [5.1998359768382905]
We propose a novel solution by reframing category-level object pose estimation as conditional generative modeling.
Our approach achieves state-of-the-art performance on the REAL275 dataset, surpassing 50% and 60% on the strict 5°2cm and 5°5cm metrics.
arXiv Detail & Related papers (2023-06-18T11:45:42Z)
- Generative Category-Level Shape and Pose Estimation with Semantic Primitives [27.692997522812615]
We propose a novel framework for category-level object shape and pose estimation from a single RGB-D image.
To handle the intra-category variation, we adopt a semantic primitive representation that encodes diverse shapes into a unified latent space.
We show that the proposed method achieves SOTA pose estimation performance and better generalization on the real-world dataset.
arXiv Detail & Related papers (2022-10-03T17:51:54Z)
- CATRE: Iterative Point Clouds Alignment for Category-level Object Pose Refinement [52.41884119329864]
CATRE, a category-level object pose and size refiner, iteratively enhances pose estimates from point clouds to produce accurate results.
Our approach remarkably outperforms state-of-the-art methods on the REAL275, CAMERA25, and LM benchmarks while running at up to 85.32 Hz.
arXiv Detail & Related papers (2022-07-17T05:55:00Z) - 3D Object Classification on Partial Point Clouds: A Practical
Perspective [91.81377258830703]
A point cloud is a popular shape representation adopted in 3D object classification.
This paper introduces a practical setting to classify partial point clouds of object instances under any poses.
This paper proposes a novel algorithm that follows an alignment-then-classification approach.
arXiv Detail & Related papers (2020-12-18T04:00:56Z) - Self-supervised Human Detection and Segmentation via Multi-view
Consensus [116.92405645348185]
We propose a multi-camera framework in which geometric constraints are embedded in the form of multi-view consistency during training.
We show that our approach outperforms state-of-the-art self-supervised person detection and segmentation techniques on images that visually depart from those of standard benchmarks.
arXiv Detail & Related papers (2020-12-09T15:47:21Z)
- Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which leads to our model's improved layout fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric that is better suited for multi-object images.
arXiv Detail & Related papers (2020-03-16T21:40:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.