PACE: A Large-Scale Dataset with Pose Annotations in Cluttered Environments
- URL: http://arxiv.org/abs/2312.15130v3
- Date: Fri, 19 Jul 2024 16:28:09 GMT
- Title: PACE: A Large-Scale Dataset with Pose Annotations in Cluttered Environments
- Authors: Yang You, Kai Xiong, Zhening Yang, Zhengxiang Huang, Junwei Zhou, Ruoxi Shi, Zhou Fang, Adam W. Harley, Leonidas Guibas, Cewu Lu
- Abstract summary: PACE (Pose Annotations in Cluttered Environments) is a large-scale benchmark for pose estimation methods in cluttered scenarios.
The benchmark consists of 55K frames with 258K annotations across 300 videos, covering 238 objects from 43 categories.
PACE-Sim contains 100K photo-realistic simulated frames with 2.4M annotations across 931 objects.
- Score: 50.79058028754952
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce PACE (Pose Annotations in Cluttered Environments), a large-scale benchmark designed to advance the development and evaluation of pose estimation methods in cluttered scenarios. PACE provides a large-scale real-world benchmark for both instance-level and category-level settings. The benchmark consists of 55K frames with 258K annotations across 300 videos, covering 238 objects from 43 categories and featuring a mix of rigid and articulated items in cluttered scenes. To annotate the real-world data efficiently, we develop an innovative annotation system with a calibrated 3-camera setup. Additionally, we offer PACE-Sim, which contains 100K photo-realistic simulated frames with 2.4M annotations across 931 objects. We test state-of-the-art algorithms in PACE along two tracks: pose estimation and object pose tracking, revealing the benchmark's challenges and research opportunities. Our benchmark code and data are available on https://github.com/qq456cvb/PACE.
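Pose-estimation benchmarks of this kind are typically scored by comparing predicted 6D poses against ground-truth annotations using a rotation error (geodesic distance between rotation matrices) and a translation error. The sketch below illustrates these two standard metrics; the function names are illustrative and are not PACE's actual evaluation API.

```python
import numpy as np

def rotation_error_deg(R_gt, R_pred):
    # Geodesic distance between two 3x3 rotation matrices, in degrees:
    # angle = arccos((trace(R_gt^T R_pred) - 1) / 2)
    cos = (np.trace(R_gt.T @ R_pred) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def translation_error(t_gt, t_pred):
    # Euclidean distance between predicted and ground-truth translations.
    return np.linalg.norm(np.asarray(t_gt, float) - np.asarray(t_pred, float))
```

A pose is then often counted as correct if both errors fall below fixed thresholds (e.g. 5 degrees and 5 cm in common protocols).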
Related papers
- TAPVid-3D: A Benchmark for Tracking Any Point in 3D [63.060421798990845]
We introduce a new benchmark, TAPVid-3D, for evaluating the task of Tracking Any Point in 3D.
This benchmark will serve as a guidepost to improve our ability to understand precise 3D motion and surface deformation from monocular video.
arXiv Detail & Related papers (2024-07-08T13:28:47Z)
- Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking [9.365544189576363]
6D Object Pose Estimation is a crucial yet challenging task in computer vision, suffering from a significant lack of large-scale datasets.
This paper introduces Omni6DPose, a dataset characterized by its diversity in object categories, large scale, and variety in object materials.
We introduce GenPose++, an enhanced version of the SOTA category-level pose estimation framework, incorporating two pivotal improvements.
arXiv Detail & Related papers (2024-06-06T17:57:20Z)
- PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking [90.29143475328506]
We introduce PointOdyssey, a large-scale synthetic dataset, and data generation framework.
Our goal is to advance the state-of-the-art by placing emphasis on long videos with naturalistic motion.
We animate deformable characters using real-world motion capture data, we build 3D scenes to match the motion capture environments, and we render camera viewpoints using trajectories mined via structure-from-motion on real videos.
arXiv Detail & Related papers (2023-07-27T17:58:11Z)
- Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion [110.84357383258818]
We propose a novel approach to lift 2D segments to 3D and fuse them by means of a neural field representation.
The core of our approach is a slow-fast clustering objective function, which is scalable and well-suited for scenes with a large number of objects.
Our approach outperforms the state-of-the-art on challenging scenes from the ScanNet, Hypersim, and Replica datasets.
arXiv Detail & Related papers (2023-06-07T17:57:45Z)
- SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments [0.0]
We present SLOPER4D, a novel scene-aware dataset collected in large urban environments.
We record 12 human subjects' activities over 10 diverse urban scenes from an egocentric view.
SLOPER4D consists of 15 sequences of human motions, each of which has a trajectory length of more than 200 meters.
arXiv Detail & Related papers (2023-03-16T05:54:15Z)
- A Neuromorphic Dataset for Object Segmentation in Indoor Cluttered Environment [3.6047642906482142]
This paper proposes a new Event-based ESD dataset for object segmentation in an indoor environment.
Our proposed dataset comprises 145 sequences with 14,166 RGB frames that are manually annotated with instance masks.
In total, 21.88 million and 20.80 million events are collected from two event-based cameras in a stereographic configuration.
arXiv Detail & Related papers (2023-02-13T12:02:51Z)
- Pose for Everything: Towards Category-Agnostic Pose Estimation [93.07415325374761]
Category-Agnostic Pose Estimation (CAPE) aims to create a pose estimation model capable of detecting the pose of any class of object given only a few samples with keypoint definition.
A transformer-based Keypoint Interaction Module (KIM) is proposed to capture both the interactions among different keypoints and the relationship between the support and query images.
We also introduce Multi-category Pose (MP-100) dataset, which is a 2D pose dataset of 100 object categories containing over 20K instances and is well-designed for developing CAPE algorithms.
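At its core, category-agnostic pose estimation matches features of annotated support keypoints against a dense query feature map to localize the same keypoints on a novel object. The following is a minimal nearest-feature sketch of that matching step, not the paper's transformer-based KIM; all names are illustrative.

```python
import numpy as np

def locate_keypoints(support_feats, query_featmap):
    # support_feats: (K, C) one feature vector per support keypoint.
    # query_featmap: (H, W, C) dense features of the query image.
    H, W, C = query_featmap.shape
    flat = query_featmap.reshape(-1, C)           # (H*W, C)
    sim = support_feats @ flat.T                  # (K, H*W) similarity scores
    idx = sim.argmax(axis=1)                      # best-matching location per keypoint
    return np.stack([idx // W, idx % W], axis=1)  # (K, 2) row/col coordinates
```

A transformer module like KIM replaces this hard argmax matching with learned attention between keypoints and between support and query features.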
arXiv Detail & Related papers (2022-07-21T09:40:54Z)
- BCOT: A Markerless High-Precision 3D Object Tracking Benchmark [15.8625561193144]
We present a multi-view approach to estimate the accurate 3D poses of real moving objects, and then use binocular data to construct a new benchmark for monocular textureless 3D object tracking.
Based on our object-centered model, we jointly optimize the object pose by minimizing shape re-projection constraints in all views.
Our new benchmark dataset contains 20 textureless objects, 22 scenes, 404 video sequences and 126K images captured in real scenes.
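The re-projection constraint mentioned above boils down to projecting the object's 3D shape into each calibrated view under a candidate pose and penalizing the pixel distance to the observed 2D locations. A minimal single-view sketch of that residual (illustrative names, not the BCOT implementation):

```python
import numpy as np

def reproject(K, R, t, X):
    # Project 3D points X (N, 3) into the image using intrinsics K and pose (R, t).
    x_cam = (R @ X.T).T + t              # points in camera coordinates
    x_img = (K @ x_cam.T).T              # homogeneous image coordinates
    return x_img[:, :2] / x_img[:, 2:3]  # pixel coordinates (N, 2)

def reprojection_residual(K, R, t, X, observed_2d):
    # Per-point pixel error; summed over all views, this is the quantity
    # a multi-view pose optimizer would minimize.
    return np.linalg.norm(reproject(K, R, t, X) - observed_2d, axis=1)
```

Jointly minimizing this residual over all views couples the per-camera observations into a single object-centered pose estimate.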
arXiv Detail & Related papers (2022-03-25T03:55:03Z)
- Benchmarking Unsupervised Object Representations for Video Sequences [111.81492107649889]
We compare the perceptual abilities of four object-centric approaches: ViMON, OP3, TBA and SCALOR.
Our results suggest that the architectures with unconstrained latent representations learn more powerful representations in terms of object detection, segmentation and tracking.
Our benchmark may provide fruitful guidance towards learning more robust object-centric video representations.
arXiv Detail & Related papers (2020-06-12T09:37:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.