Aria Digital Twin: A New Benchmark Dataset for Egocentric 3D Machine
Perception
- URL: http://arxiv.org/abs/2306.06362v2
- Date: Tue, 13 Jun 2023 06:38:47 GMT
- Title: Aria Digital Twin: A New Benchmark Dataset for Egocentric 3D Machine
Perception
- Authors: Xiaqing Pan, Nicholas Charron, Yongqian Yang, Scott Peters, Thomas
Whelan, Chen Kong, Omkar Parkhi, Richard Newcombe, Carl Yuheng Ren
- Abstract summary: Aria Digital Twin (ADT) is an egocentric dataset captured using Aria glasses.
ADT contains 200 sequences of real-world activities conducted by Aria wearers in two real indoor scenes.
- Score: 5.952224408665015
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We introduce the Aria Digital Twin (ADT) - an egocentric dataset captured
using Aria glasses with extensive object, environment, and human level ground
truth. This ADT release contains 200 sequences of real-world activities
conducted by Aria wearers in two real indoor scenes with 398 object instances
(324 stationary and 74 dynamic). Each sequence consists of: a) raw data of two
monochrome camera streams, one RGB camera stream, two IMU streams; b) complete
sensor calibration; c) ground truth data including continuous
6-degree-of-freedom (6DoF) poses of the Aria devices, object 6DoF poses, 3D eye
gaze vectors, 3D human poses, 2D image segmentations, image depth maps; and d)
photo-realistic synthetic renderings. To the best of our knowledge, there is no
existing egocentric dataset with a level of accuracy, photo-realism and
comprehensiveness comparable to ADT. By contributing ADT to the research
community, our mission is to set a new standard for evaluation in the
egocentric machine perception domain, which includes very challenging research
problems such as 3D object detection and tracking, scene reconstruction and
understanding, sim-to-real learning, human pose prediction - while also
inspiring new machine perception tasks for augmented reality (AR) applications.
To kick start exploration of the ADT research use cases, we evaluated several
existing state-of-the-art methods for object detection, segmentation and image
translation tasks that demonstrate the usefulness of ADT as a benchmarking
dataset.
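As a reading aid, the release statistics and per-sequence contents listed in the abstract can be sketched as a small Python manifest. All class and field names below are hypothetical, chosen only for illustration; they are not the ADT file format or the API of any ADT tooling.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SequenceContents:
    """Hypothetical manifest of what one ADT sequence provides, per the abstract."""

    # a) raw data: two monochrome streams, one RGB stream, two IMU streams
    raw_streams: Tuple[str, ...] = (
        "monochrome_camera_1",
        "monochrome_camera_2",
        "rgb_camera",
        "imu_1",
        "imu_2",
    )
    # b) complete sensor calibration
    has_sensor_calibration: bool = True
    # c) ground truth data
    ground_truth: Tuple[str, ...] = (
        "device_6dof_poses",
        "object_6dof_poses",
        "eye_gaze_vectors_3d",
        "human_poses_3d",
        "image_segmentations_2d",
        "image_depth_maps",
    )
    # d) photo-realistic synthetic renderings
    has_synthetic_renderings: bool = True


@dataclass(frozen=True)
class ReleaseStats:
    """Counts quoted in the abstract for this ADT release."""

    sequences: int = 200
    scenes: int = 2
    stationary_objects: int = 324
    dynamic_objects: int = 74

    @property
    def object_instances(self) -> int:
        # Total instances are the stationary and dynamic objects combined.
        return self.stationary_objects + self.dynamic_objects


stats = ReleaseStats()
contents = SequenceContents()
print(stats.object_instances)  # 324 + 74 = 398, matching the abstract
```

The only computed value is the object total, which serves as a consistency check on the figures in the abstract.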
Related papers
- EmbodMocap: In-the-Wild 4D Human-Scene Reconstruction for Embodied Agents [85.77432303199176]
We propose EmbodMocap, a portable and affordable data collection pipeline using two moving iPhones. Our key idea is to jointly calibrate dual RGB-D sequences to reconstruct both humans and scenes. Based on the collected data, we empower three embodied AI tasks: monocular human-scene-reconstruction, where we fine-tune feedforward models that output metric-scale, world-space aligned humans and scenes; physics-based character animation, where we prove our data could be used to scale human-object interaction skills and scene-aware motion tracking; and robot motion control, where we train a humanoid robot via
arXiv Detail & Related papers (2026-02-26T16:53:41Z)
- Towards Scalable Spatial Intelligence via 2D-to-3D Data Lifting [64.64738535860351]
We present a scalable pipeline that converts single-view images into comprehensive, scale- and appearance-realistic 3D representations. Our method bridges the gap between the vast repository of imagery and the increasing demand for spatial scene understanding. By automatically generating authentic, scale-aware 3D data from images, we significantly reduce data collection costs and open new avenues for advancing spatial intelligence.
arXiv Detail & Related papers (2025-07-24T14:53:26Z)
- MetaScenes: Towards Automated Replica Creation for Real-world 3D Scans [76.39726619818896]
Embodied AI (EAI) research requires high-quality, diverse 3D scenes to support skill acquisition, sim-to-real transfer, and generalization. Existing datasets demonstrate that this process heavily relies on artist-driven designs. We present MetaScenes, a large-scale, simulatable 3D scene dataset constructed from real-world scans.
arXiv Detail & Related papers (2025-05-05T06:13:25Z)
- Digital Twin Catalog: A Large-Scale Photorealistic 3D Object Digital Twin Dataset [25.46046772152158]
Digital Twin Catalog is a new large-scale photorealistic 3D object digital twin dataset.
It features 2,000 scanned digital twin-quality 3D objects, along with image sequences captured under different lighting conditions using DSLR cameras and AR glasses.
arXiv Detail & Related papers (2025-04-11T13:54:19Z)
- LWIRPOSE: A novel LWIR Thermal Image Dataset and Benchmark [9.679771580702258]
This dataset comprises over 2,400 high-quality LWIR (thermal) images.
Each image is meticulously annotated with 2D human poses, offering a valuable resource for researchers and practitioners.
We benchmark state-of-the-art pose estimation methods on the dataset to showcase its potential.
arXiv Detail & Related papers (2024-04-16T01:49:35Z)
- VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection [80.62052650370416]
Monocular 3D object detection holds significant importance across various applications, including autonomous driving and robotics.
In this paper, we present VFMM3D, an innovative framework that leverages the capabilities of Vision Foundation Models (VFMs) to accurately transform single-view images into LiDAR point cloud representations.
arXiv Detail & Related papers (2024-04-15T03:12:12Z)
- Multi-Modal Dataset Acquisition for Photometrically Challenging Object [56.30027922063559]
This paper addresses the limitations of current datasets for 3D vision tasks in terms of accuracy, size, realism, and suitable imaging modalities for photometrically challenging objects.
We propose a novel annotation and acquisition pipeline that enhances existing 3D perception and 6D object pose datasets.
arXiv Detail & Related papers (2023-08-21T10:38:32Z)
- 3D Data Augmentation for Driving Scenes on Camera [50.41413053812315]
We propose a 3D data augmentation approach termed Drive-3DAug, aiming at augmenting the driving scenes on camera in the 3D space.
We first utilize Neural Radiance Field (NeRF) to reconstruct the 3D models of background and foreground objects.
Then, augmented driving scenes can be obtained by placing the 3D objects with adapted location and orientation at the pre-defined valid region of backgrounds.
arXiv Detail & Related papers (2023-03-18T05:51:05Z)
- OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation [107.71752592196138]
We propose OmniObject3D, a large vocabulary 3D object dataset with massive high-quality real-scanned 3D objects.
It comprises 6,000 scanned objects in 190 daily categories, sharing common classes with popular 2D datasets.
Each 3D object is captured with both 2D and 3D sensors, providing textured meshes, point clouds, multiview rendered images, and multiple real-captured videos.
arXiv Detail & Related papers (2023-01-18T18:14:18Z)
- Aerial Monocular 3D Object Detection [46.26215100532241]
This work proposes a dual-view detection system named DVDET to achieve aerial monocular object detection in both the 2D image space and the 3D physical space.
To address the dataset challenge, we propose a new large-scale simulation dataset named AM3D-Sim, generated by the co-simulation of AirSIM and CARLA, and a new real-world aerial dataset named AM3D-Real, collected by DJI Matrice 300 RTK.
arXiv Detail & Related papers (2022-08-08T08:32:56Z)
- A Multi-purpose Real Haze Benchmark with Quantifiable Haze Levels and Ground Truth [61.90504318229845]
This paper introduces the first paired real image benchmark dataset with hazy and haze-free images, and in-situ haze density measurements.
This dataset was produced in a controlled environment with professional smoke generating machines that covered the entire scene.
A subset of this dataset has been used for the Object Detection in Haze Track of CVPR UG2 2022 challenge.
arXiv Detail & Related papers (2022-06-13T19:14:06Z)
- 6-DoF Pose Estimation of Household Objects for Robotic Manipulation: An Accessible Dataset and Benchmark [17.493403705281008]
We present a new dataset for 6-DoF pose estimation of known objects, with a focus on robotic manipulation research.
We provide 3D scanned textured models of toy grocery objects, as well as RGBD images of the objects in challenging, cluttered scenes.
Using semi-automated RGBD-to-model texture correspondences, the images are annotated with ground truth poses that were verified empirically to be accurate to within a few millimeters.
We also propose a new pose evaluation metric called ADD-H based upon the Hungarian assignment algorithm that is robust to symmetries in object geometry without requiring their explicit enumeration.
arXiv Detail & Related papers (2022-03-11T01:19:04Z)
- Ground material classification and for UAV-based photogrammetric 3D data A 2D-3D Hybrid Approach [1.3359609092684614]
In recent years, photogrammetry has been widely used in many areas to create 3D virtual data representing the physical environment.
These cutting-edge technologies have caught the US Army and Navy's attention for the purpose of rapid 3D battlefield reconstruction, virtual training, and simulations.
arXiv Detail & Related papers (2021-09-24T22:29:26Z)
- EAGLE: Large-scale Vehicle Detection Dataset in Real-World Scenarios using Aerial Imagery [3.8902657229395894]
We introduce a large-scale dataset for multi-class vehicle detection with object orientation information in aerial imagery.
It features high-resolution aerial images composed of different real-world situations with a wide variety of camera sensor, resolution, flight altitude, weather, illumination, haze, shadow, time, city, country, occlusion, and camera angle.
It contains 215,986 instances annotated with oriented bounding boxes defined by four points and orientation, making it by far the largest dataset to date in this task.
It also supports research on haze and shadow removal as well as super-resolution and in-painting applications.
arXiv Detail & Related papers (2020-07-12T23:00:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.