Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation
- URL: http://arxiv.org/abs/2409.18261v2
- Date: Mon, 30 Sep 2024 02:06:02 GMT
- Authors: Mengchen Zhang, Tong Wu, Tai Wang, Tengfei Wang, Ziwei Liu, Dahua Lin
- Abstract summary: We introduce Omni6D, a comprehensive RGBD dataset featuring a wide range of categories and varied backgrounds.
The dataset comprises an extensive spectrum of 166 categories, 4688 instances adjusted to the canonical pose, and over 0.8 million captures.
We believe this initiative will pave the way for new insights and substantial progress in both the industrial and academic fields.
- Score: 74.44739529186798
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 6D object pose estimation aims at determining an object's translation, rotation, and scale, typically from a single RGBD image. Recent advancements have expanded this estimation from instance-level to category-level, allowing models to generalize across unseen instances within the same category. However, this generalization is limited by the narrow range of categories covered by existing datasets, such as NOCS, which also tend to overlook common real-world challenges like occlusion. To tackle these challenges, we introduce Omni6D, a comprehensive RGBD dataset featuring a wide range of categories and varied backgrounds, elevating the task to a more realistic context. 1) The dataset comprises an extensive spectrum of 166 categories, 4688 instances adjusted to the canonical pose, and over 0.8 million captures, significantly broadening the scope for evaluation. 2) We introduce a symmetry-aware metric and conduct systematic benchmarks of existing algorithms on Omni6D, offering a thorough exploration of new challenges and insights. 3) Additionally, we propose an effective fine-tuning approach that adapts models from previous datasets to our extensive vocabulary setting. We believe this initiative will pave the way for new insights and substantial progress in both the industrial and academic fields, pushing forward the boundaries of general 6D pose estimation.
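The abstract mentions a symmetry-aware metric without giving its definition in this summary. As a rough illustration of the general idea only, and not the paper's actual metric, the sketch below scores a predicted rotation against the closest of several symmetry-equivalent ground-truth rotations. All function names are hypothetical, and the 4-fold symmetry about the y axis is an assumed example.

```python
import numpy as np

def rotation_about_y(angle_rad):
    """3x3 rotation matrix for a rotation of angle_rad about the y axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def geodesic_deg(r_a, r_b):
    """Angle in degrees of the relative rotation between r_a and r_b."""
    cos_theta = (np.trace(r_a.T @ r_b) - 1.0) / 2.0
    # Clip to guard against values slightly outside [-1, 1] from rounding.
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))

def symmetry_aware_rotation_error(r_pred, r_gt, symmetries):
    """Smallest rotation error over all symmetry-equivalent ground truths.

    Each matrix in `symmetries` is a rotation in the object's canonical
    frame that leaves the object's appearance unchanged (assumption: the
    symmetry set is known per category and includes the identity).
    """
    return min(geodesic_deg(r_pred, r_gt @ s) for s in symmetries)

def translation_error(t_pred, t_gt):
    """Euclidean distance between predicted and ground-truth translations."""
    return float(np.linalg.norm(np.asarray(t_pred) - np.asarray(t_gt)))

# Assumed example: an object with 4-fold symmetry about its y axis.
symmetries = [rotation_about_y(k * np.pi / 2) for k in range(4)]
r_gt = rotation_about_y(0.3)
r_pred = r_gt @ rotation_about_y(np.pi / 2)  # off by exactly one symmetry step

plain_error = geodesic_deg(r_pred, r_gt)
symmetric_error = symmetry_aware_rotation_error(r_pred, r_gt, symmetries)
```

In this example the plain rotation error is about 90 degrees, while the symmetry-aware error is close to zero, because the prediction differs from the ground truth only by a rotation that the assumed symmetry makes indistinguishable.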
Related papers
- Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking [9.365544189576363]
6D Object Pose Estimation is a crucial yet challenging task in computer vision, suffering from a significant lack of large-scale datasets.
This paper introduces Omni6DPose, a dataset characterized by its diversity in object categories, large scale, and variety in object materials.
We introduce GenPose++, an enhanced version of the SOTA category-level pose estimation framework, incorporating two pivotal improvements.
arXiv Detail & Related papers (2024-06-06T17:57:20Z) - Zero123-6D: Zero-shot Novel View Synthesis for RGB Category-level 6D Pose Estimation [66.3814684757376]
This work presents Zero123-6D, the first work to demonstrate the utility of Diffusion Model-based novel-view-synthesizers in enhancing RGB 6D pose estimation at category-level.
The method reduces data requirements, removes the need for depth information in the zero-shot category-level 6D pose estimation task, and improves performance, as demonstrated quantitatively through experiments on the CO3D dataset.
arXiv Detail & Related papers (2024-03-21T10:38:18Z) - OV9D: Open-Vocabulary Category-Level 9D Object Pose and Size Estimation [56.028185293563325]
This paper studies a new open-set problem, the open-vocabulary category-level object pose and size estimation.
We first introduce OO3D-9D, a large-scale photorealistic dataset for this task.
We then propose a framework built on pre-trained DinoV2 and text-to-image stable diffusion models.
arXiv Detail & Related papers (2024-03-19T03:09:24Z) - SA6D: Self-Adaptive Few-Shot 6D Pose Estimator for Novel and Occluded Objects [24.360831082478313]
We propose a few-shot pose estimation (FSPE) approach called SA6D.
It uses a self-adaptive segmentation module to identify the novel target object and construct a point cloud model of the target object.
We evaluate SA6D on real-world tabletop object datasets and demonstrate that SA6D outperforms existing FSPE methods.
arXiv Detail & Related papers (2023-08-31T08:19:26Z) - HouseCat6D -- A Large-Scale Multi-Modal Category Level 6D Object Perception Dataset with Household Objects in Realistic Scenarios [41.54851386729952]
We introduce HouseCat6D, a new category-level 6D pose dataset.
It 1) features multi-modality with Polarimetric RGB and Depth (RGBD+P), 2) encompasses 194 diverse objects across 10 household categories, including two photometrically challenging ones, and 3) provides high-quality pose annotations with an error range of only 1.35 mm to 1.74 mm.
arXiv Detail & Related papers (2022-12-20T17:06:32Z) - PhoCaL: A Multi-Modal Dataset for Category-Level Object Pose Estimation with Photometrically Challenging Objects [45.31344700263873]
We introduce a multimodal dataset for category-level object pose estimation with photometrically challenging objects termed PhoCaL.
PhoCaL comprises 60 high quality 3D models of household objects over 8 categories including highly reflective, transparent and symmetric objects.
It ensures sub-millimeter accuracy of the pose for opaque textured, shiny and transparent objects, no motion blur and perfect camera synchronisation.
arXiv Detail & Related papers (2022-05-18T09:21:09Z) - FS6D: Few-Shot 6D Pose Estimation of Novel Objects [116.34922994123973]
6D object pose estimation networks are limited in their capability to scale to large numbers of object instances.
In this work, we study a new open-set problem, few-shot 6D object pose estimation: estimating the 6D pose of an unknown object from a few support views without extra training.
arXiv Detail & Related papers (2022-03-28T10:31:29Z) - GPV-Pose: Category-level Object Pose Estimation via Geometry-guided Point-wise Voting [103.74918834553249]
GPV-Pose is a novel framework for robust category-level pose estimation.
It harnesses geometric insights to enhance the learning of category-level pose-sensitive features.
It produces superior results to state-of-the-art competitors on common public benchmarks.
arXiv Detail & Related papers (2022-03-15T13:58:50Z) - Single-stage Keypoint-based Category-level Object Pose Estimation from an RGB Image [27.234658117816103]
We propose a single-stage, keypoint-based approach for category-level object pose estimation.
The proposed network performs 2D object detection, detects 2D keypoints, estimates 6-DoF pose, and regresses relative bounding cuboid dimensions.
We conduct extensive experiments on the challenging Objectron benchmark, outperforming state-of-the-art methods on the 3D IoU metric.
arXiv Detail & Related papers (2021-09-13T17:55:00Z) - Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)