PACE: A Large-Scale Dataset with Pose Annotations in Cluttered Environments
- URL: http://arxiv.org/abs/2312.15130v3
- Date: Fri, 19 Jul 2024 16:28:09 GMT
- Title: PACE: A Large-Scale Dataset with Pose Annotations in Cluttered Environments
- Authors: Yang You, Kai Xiong, Zhening Yang, Zhengxiang Huang, Junwei Zhou, Ruoxi Shi, Zhou Fang, Adam W. Harley, Leonidas Guibas, Cewu Lu
- Abstract summary: PACE (Pose Annotations in Cluttered Environments) is a large-scale benchmark for pose estimation methods in cluttered scenarios.
The benchmark consists of 55K frames with 258K annotations across 300 videos, covering 238 objects from 43 categories.
PACE-Sim contains 100K photo-realistic simulated frames with 2.4M annotations across 931 objects.
- Score: 50.79058028754952
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce PACE (Pose Annotations in Cluttered Environments), a large-scale benchmark designed to advance the development and evaluation of pose estimation methods in cluttered scenarios. PACE provides a large-scale real-world benchmark for both instance-level and category-level settings. The benchmark consists of 55K frames with 258K annotations across 300 videos, covering 238 objects from 43 categories and featuring a mix of rigid and articulated items in cluttered scenes. To annotate the real-world data efficiently, we develop an innovative annotation system with a calibrated 3-camera setup. Additionally, we offer PACE-Sim, which contains 100K photo-realistic simulated frames with 2.4M annotations across 931 objects. We test state-of-the-art algorithms in PACE along two tracks: pose estimation and object pose tracking, revealing the benchmark's challenges and research opportunities. Our benchmark code and data are available on https://github.com/qq456cvb/PACE.
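Pose-estimation benchmarks of this kind are typically scored by comparing predicted 6D poses against ground-truth annotations using a rotation error (geodesic distance between rotation matrices) and a translation error. The sketch below illustrates these two standard metrics; the function names are illustrative and are not PACE's actual evaluation API.

```python
import numpy as np

def rotation_error_deg(R_gt, R_pred):
    # Geodesic distance between two 3x3 rotation matrices, in degrees:
    # angle = arccos((trace(R_gt^T R_pred) - 1) / 2)
    cos = (np.trace(R_gt.T @ R_pred) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def translation_error(t_gt, t_pred):
    # Euclidean distance between predicted and ground-truth translations.
    return np.linalg.norm(np.asarray(t_gt, float) - np.asarray(t_pred, float))
```

A pose is then often counted as correct if both errors fall below fixed thresholds (e.g. 5 degrees and 5 cm in common protocols).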
Related papers
- TAPVid-3D: A Benchmark for Tracking Any Point in 3D [63.060421798990845]
We introduce a new benchmark, TAPVid-3D, for evaluating the task of Tracking Any Point in 3D.
This benchmark will serve as a guidepost to improve our ability to understand precise 3D motion and surface deformation from monocular video.
arXiv Detail & Related papers (2024-07-08T13:28:47Z)
- Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking [9.365544189576363]
6D Object Pose Estimation is a crucial yet challenging task in computer vision, suffering from a significant lack of large-scale datasets.
This paper introduces Omni6DPose, a dataset characterized by its diversity in object categories, large scale, and variety in object materials.
We introduce GenPose++, an enhanced version of the SOTA category-level pose estimation framework, incorporating two pivotal improvements.
arXiv Detail & Related papers (2024-06-06T17:57:20Z)
- PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking [90.29143475328506]
We introduce PointOdyssey, a large-scale synthetic dataset, and data generation framework.
Our goal is to advance the state-of-the-art by placing emphasis on long videos with naturalistic motion.
We animate deformable characters using real-world motion capture data, we build 3D scenes to match the motion capture environments, and we render camera viewpoints using trajectories mined via structure-from-motion on real videos.
arXiv Detail & Related papers (2023-07-27T17:58:11Z)
- Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion [110.84357383258818]
We propose a novel approach to lift 2D segments to 3D and fuse them by means of a neural field representation.
The core of our approach is a slow-fast clustering objective function, which is scalable and well-suited for scenes with a large number of objects.
Our approach outperforms the state-of-the-art on challenging scenes from the ScanNet, Hypersim, and Replica datasets.
arXiv Detail & Related papers (2023-06-07T17:57:45Z)
- SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments [0.0]
We present SLOPER4D, a novel scene-aware dataset collected in large urban environments.
We record 12 human subjects' activities over 10 diverse urban scenes from an egocentric view.
SLOPER4D consists of 15 sequences of human motions, each of which has a trajectory length of more than 200 meters.
arXiv Detail & Related papers (2023-03-16T05:54:15Z)
- A Neuromorphic Dataset for Object Segmentation in Indoor Cluttered Environment [3.6047642906482142]
This paper proposes a new Event-based ESD dataset for object segmentation in an indoor environment.
Our proposed dataset comprises 145 sequences with 14,166 RGB frames that are manually annotated with instance masks.
In total, 21.88 million and 20.80 million events are collected from two event-based cameras in a stereographic configuration.
arXiv Detail & Related papers (2023-02-13T12:02:51Z)
- Pose for Everything: Towards Category-Agnostic Pose Estimation [93.07415325374761]
Category-Agnostic Pose Estimation (CAPE) aims to create a pose estimation model capable of detecting the pose of any class of object given only a few samples with keypoint definition.
A transformer-based Keypoint Interaction Module (KIM) is proposed to capture both the interactions among different keypoints and the relationship between the support and query images.
We also introduce Multi-category Pose (MP-100) dataset, which is a 2D pose dataset of 100 object categories containing over 20K instances and is well-designed for developing CAPE algorithms.
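At its core, category-agnostic pose estimation matches features of annotated support keypoints against a dense query feature map to localize the same keypoints on a novel object. The following is a minimal nearest-feature sketch of that matching step, not the paper's transformer-based KIM; all names are illustrative.

```python
import numpy as np

def locate_keypoints(support_feats, query_featmap):
    # support_feats: (K, C) one feature vector per support keypoint.
    # query_featmap: (H, W, C) dense features of the query image.
    H, W, C = query_featmap.shape
    flat = query_featmap.reshape(-1, C)           # (H*W, C)
    sim = support_feats @ flat.T                  # (K, H*W) similarity scores
    idx = sim.argmax(axis=1)                      # best-matching location per keypoint
    return np.stack([idx // W, idx % W], axis=1)  # (K, 2) row/col coordinates
```

A transformer module like KIM replaces this hard argmax matching with learned attention between keypoints and between support and query features.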
arXiv Detail & Related papers (2022-07-21T09:40:54Z)
- BCOT: A Markerless High-Precision 3D Object Tracking Benchmark [15.8625561193144]
We present a multi-view approach to estimate the accurate 3D poses of real moving objects, and then use binocular data to construct a new benchmark for monocular textureless 3D object tracking.
Based on our object-centered model, we jointly optimize the object pose by minimizing shape re-projection constraints in all views.
Our new benchmark dataset contains 20 textureless objects, 22 scenes, 404 video sequences and 126K images captured in real scenes.
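The re-projection constraint mentioned above boils down to projecting the object's 3D shape into each calibrated view under a candidate pose and penalizing the pixel distance to the observed 2D locations. A minimal single-view sketch of that residual (illustrative names, not the BCOT implementation):

```python
import numpy as np

def reproject(K, R, t, X):
    # Project 3D points X (N, 3) into the image using intrinsics K and pose (R, t).
    x_cam = (R @ X.T).T + t              # points in camera coordinates
    x_img = (K @ x_cam.T).T              # homogeneous image coordinates
    return x_img[:, :2] / x_img[:, 2:3]  # pixel coordinates (N, 2)

def reprojection_residual(K, R, t, X, observed_2d):
    # Per-point pixel error; summed over all views, this is the quantity
    # a multi-view pose optimizer would minimize.
    return np.linalg.norm(reproject(K, R, t, X) - observed_2d, axis=1)
```

Jointly minimizing this residual over all views couples the per-camera observations into a single object-centered pose estimate.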
arXiv Detail & Related papers (2022-03-25T03:55:03Z)
- Benchmarking Unsupervised Object Representations for Video Sequences [111.81492107649889]
We compare the perceptual abilities of four object-centric approaches: ViMON, OP3, TBA and SCALOR.
Our results suggest that the architectures with unconstrained latent representations learn more powerful representations in terms of object detection, segmentation and tracking.
Our benchmark may provide fruitful guidance towards learning more robust object-centric video representations.
arXiv Detail & Related papers (2020-06-12T09:37:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.