Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking
- URL: http://arxiv.org/abs/2406.04316v1
- Date: Thu, 6 Jun 2024 17:57:20 GMT
- Title: Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking
- Authors: Jiyao Zhang, Weiyao Huang, Bo Peng, Mingdong Wu, Fei Hu, Zijian Chen, Bo Zhao, Hao Dong
- Abstract summary: 6D Object Pose Estimation is a crucial yet challenging task in computer vision, suffering from a significant lack of large-scale datasets.
This paper introduces Omni6DPose, a dataset characterized by its diversity in object categories, large scale, and variety in object materials.
We introduce GenPose++, an enhanced version of the SOTA category-level pose estimation framework, incorporating two pivotal improvements.
- Score: 9.365544189576363
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 6D Object Pose Estimation is a crucial yet challenging task in computer vision, suffering from a significant lack of large-scale datasets. This scarcity impedes comprehensive evaluation of model performance, limiting research advancements. Furthermore, the restricted number of available instances or categories curtails its applications. To address these issues, this paper introduces Omni6DPose, a substantial dataset characterized by its diversity in object categories, large scale, and variety in object materials. Omni6DPose is divided into three main components: ROPE (Real 6D Object Pose Estimation Dataset), which includes 332K images annotated with over 1.5M annotations across 581 instances in 149 categories; SOPE (Simulated 6D Object Pose Estimation Dataset), consisting of 475K images created in a mixed-reality setting with depth simulation, annotated with over 5M annotations across 4162 instances in the same 149 categories; and the manually aligned real scanned objects used in both ROPE and SOPE. Omni6DPose is inherently challenging due to its substantial variations and ambiguities. To address this challenge, we introduce GenPose++, an enhanced version of the SOTA category-level pose estimation framework, incorporating two pivotal improvements: semantic-aware feature extraction and clustering-based aggregation. Moreover, we provide a comprehensive benchmarking analysis to evaluate the performance of previous methods on this large-scale dataset for both 6D object pose estimation and pose tracking.
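For readers new to the task: a 6D pose is a rigid transform in SE(3), i.e. a 3D rotation plus a 3D translation. The sketch below is a generic NumPy illustration of this data structure, not code from the paper or from GenPose++.

```python
# A 6D object pose = 3D rotation + 3D translation (an element of SE(3)).
# Generic illustration only; unrelated to any specific method in the paper.
import numpy as np

def pose_matrix(rotation: np.ndarray, translation: np.ndarray) -> np.ndarray:
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T

# 90-degree rotation about the z-axis, shifted 1 unit along x.
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])
T = pose_matrix(Rz, np.array([1.0, 0.0, 0.0]))

# Map a point from the object frame into the camera frame.
p_obj = np.array([1.0, 0.0, 0.0, 1.0])  # homogeneous coordinates
p_cam = T @ p_obj                        # -> [1, 1, 0, 1]
```

Pose estimation methods such as those benchmarked here predict this transform from RGB(-D) observations; pose tracking additionally propagates it across frames.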
Related papers
- Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation [74.44739529186798]
We introduce Omni6D, a comprehensive RGBD dataset featuring a wide range of categories and varied backgrounds.
The dataset comprises an extensive spectrum of 166 categories, 4688 instances adjusted to the canonical pose, and over 0.8 million captures.
We believe this initiative will pave the way for new insights and substantial progress in both the industrial and academic fields.
arXiv Detail & Related papers (2024-09-26T20:13:33Z)
- SA6D: Self-Adaptive Few-Shot 6D Pose Estimator for Novel and Occluded Objects [24.360831082478313]
We propose a few-shot pose estimation (FSPE) approach called SA6D.
It uses a self-adaptive segmentation module to identify the novel target object and construct a point cloud model of it.
We evaluate SA6D on real-world tabletop object datasets and demonstrate that SA6D outperforms existing FSPE methods.
arXiv Detail & Related papers (2023-08-31T08:19:26Z)
- POPE: 6-DoF Promptable Pose Estimation of Any Object, in Any Scene, with One Reference [72.32413378065053]
We propose a general paradigm for object pose estimation, called Promptable Object Pose Estimation (POPE).
POPE enables zero-shot 6DoF object pose estimation for any target object in any scene, while only a single reference is adopted as the support view.
Comprehensive experimental results demonstrate that POPE exhibits unrivaled robust performance in zero-shot settings.
arXiv Detail & Related papers (2023-05-25T05:19:17Z)
- HouseCat6D -- A Large-Scale Multi-Modal Category Level 6D Object Perception Dataset with Household Objects in Realistic Scenarios [41.54851386729952]
We introduce HouseCat6D, a new category-level 6D pose dataset.
It features 1) multi-modality with Polarimetric RGB and Depth (RGBD+P), 2) 194 diverse objects across 10 household categories, including two photometrically challenging ones, and 3) high-quality pose annotations with an error range of only 1.35 mm to 1.74 mm.
arXiv Detail & Related papers (2022-12-20T17:06:32Z)
- MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare [84.80956484848505]
MegaPose is a method to estimate the 6D pose of novel objects, that is, objects unseen during training.
First, we present a 6D pose refiner based on a render-and-compare strategy that can be applied to novel objects.
Second, we introduce a novel approach for coarse pose estimation, which leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner.
arXiv Detail & Related papers (2022-12-13T19:30:03Z)
- A mixed-reality dataset for category-level 6D pose and size estimation of hand-occluded containers [36.189924244458595]
We present a mixed-reality dataset of hand-occluded containers for category-level 6D object pose and size estimation.
The dataset consists of 138,240 images of rendered hands and forearms holding 48 synthetic objects, split into 3 grasp categories over 30 real backgrounds.
arXiv Detail & Related papers (2022-11-18T19:14:52Z)
- Unseen Object 6D Pose Estimation: A Benchmark and Baselines [62.8809734237213]
We propose a new task that enables and facilitates algorithms to estimate the 6D pose of novel objects during testing.
We collect a dataset with both real and synthetic images and up to 48 unseen objects in the test set.
By training an end-to-end 3D correspondences network, our method finds corresponding points between an unseen object and a partial view RGBD image accurately and efficiently.
arXiv Detail & Related papers (2022-06-23T16:29:53Z)
- PhoCaL: A Multi-Modal Dataset for Category-Level Object Pose Estimation with Photometrically Challenging Objects [45.31344700263873]
We introduce a multimodal dataset for category-level object pose estimation with photometrically challenging objects termed PhoCaL.
PhoCaL comprises 60 high-quality 3D models of household objects across 8 categories, including highly reflective, transparent, and symmetric objects.
It ensures sub-millimeter pose accuracy for opaque textured, shiny, and transparent objects, with no motion blur and perfect camera synchronisation.
arXiv Detail & Related papers (2022-05-18T09:21:09Z)
- DONet: Learning Category-Level 6D Object Pose and Size Estimation from Depth Observation [53.55300278592281]
We propose a method for Category-level 6D Object Pose and Size Estimation (COPSE) from a single depth image.
Our framework makes inferences based on the rich geometric information of the object in the depth channel alone.
Our framework competes with state-of-the-art approaches that require labeled real-world images.
arXiv Detail & Related papers (2021-06-27T10:41:50Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.