Seeing and Seeing Through the Glass: Real and Synthetic Data for Multi-Layer Depth Estimation
- URL: http://arxiv.org/abs/2503.11633v1
- Date: Fri, 14 Mar 2025 17:52:06 GMT
- Title: Seeing and Seeing Through the Glass: Real and Synthetic Data for Multi-Layer Depth Estimation
- Authors: Hongyu Wen, Yiming Zuo, Venkat Subramanian, Patrick Chen, Jia Deng
- Abstract summary: LayeredDepth is the first dataset with multi-layer depth annotations, including a real-world benchmark and a synthetic data generator. The benchmark consists of 1,500 images from diverse scenes, and evaluating state-of-the-art depth estimation methods on it reveals that they struggle with transparent objects. Baseline models trained solely on the synthetic dataset produce good cross-domain multi-layer depth estimation.
- Score: 18.8622645280467
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transparent objects are common in daily life, and understanding their multi-layer depth information -- perceiving both the transparent surface and the objects behind it -- is crucial for real-world applications that interact with transparent materials. In this paper, we introduce LayeredDepth, the first dataset with multi-layer depth annotations, including a real-world benchmark and a synthetic data generator, to support the task of multi-layer depth estimation. Our real-world benchmark consists of 1,500 images from diverse scenes, and evaluating state-of-the-art depth estimation methods on it reveals that they struggle with transparent objects. The synthetic data generator is fully procedural and capable of providing training data for this task with an unlimited variety of objects and scene compositions. Using this generator, we create a synthetic dataset with 15,300 images. Baseline models trained solely on this synthetic dataset produce good cross-domain multi-layer depth estimation. Fine-tuning state-of-the-art single-layer depth models on it substantially improves their performance on transparent objects, with quadruplet accuracy on our benchmark increased from 55.14% to 75.20%. All images and validation annotations are available under CC0 at https://layereddepth.cs.princeton.edu.
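The abstract reports quadruplet accuracy as its benchmark metric but does not define it here. The sketch below is a minimal illustration of one plausible reading, in which a quadruplet pairs two (pixel, layer) queries whose ground-truth depth ordering the prediction must reproduce; the function name, field names, and margin threshold are assumptions for illustration, not the paper's actual protocol.

```python
import numpy as np

def quadruplet_accuracy(pred_layers, quadruplets, margin=1.05):
    """Fraction of quadruplets whose predicted depth ordering matches ground truth.

    pred_layers: list of HxW arrays; index 0 is the first visible surface,
                 index 1 the surface behind it (e.g., seen through glass).
    quadruplets: dicts with two (row, col, layer) queries and a label:
                 {"p1": (y, x, l), "p2": (y, x, l), "order": -1 | 0 | +1}
                 order = -1 if p1 is closer, +1 if p2 is closer, 0 if tied.
    margin:      depth ratio below which two queries are treated as equal.
    """
    correct = 0
    for q in quadruplets:
        y1, x1, l1 = q["p1"]
        y2, x2, l2 = q["p2"]
        d1 = float(pred_layers[l1][y1, x1])
        d2 = float(pred_layers[l2][y2, x2])
        if max(d1, d2) / max(min(d1, d2), 1e-6) < margin:
            pred = 0    # depths within the margin: predicted as equal
        elif d1 < d2:
            pred = -1   # p1 predicted closer to the camera
        else:
            pred = +1   # p2 predicted closer to the camera
        correct += int(pred == q["order"])
    return correct / max(len(quadruplets), 1)

# Toy example: a flat transparent pane (layer 0) with objects behind it (layer 1).
front = np.full((2, 2), 1.0)                 # depth of the glass surface
behind = np.array([[2.0, 2.5], [3.0, 2.0]])  # depth of what is seen through it
quads = [{"p1": (0, 0, 0), "p2": (0, 0, 1), "order": -1}]
print(quadruplet_accuracy([front, behind], quads))  # -> 1.0
```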
Related papers
- Monocular Depth Estimation and Segmentation for Transparent Object with Iterative Semantic and Geometric Fusion [9.391182087420926]
We propose a monocular framework, which is the first to excel at both segmentation and depth estimation of transparent objects. Specifically, we devise a novel semantic and geometric fusion module that effectively integrates multi-scale information between the two tasks. Experiments on two challenging synthetic and real-world datasets demonstrate that our model surpasses state-of-the-art monocular, stereo, and multi-view methods by a large margin.
arXiv Detail & Related papers (2025-02-20T14:57:01Z)
- Virtually Enriched NYU Depth V2 Dataset for Monocular Depth Estimation: Do We Need Artificial Augmentation? [61.234412062595155]
We present ANYU, a new virtually augmented version of the NYU depth v2 dataset, designed for monocular depth estimation.
In contrast to the well-known approach where full 3D scenes of a virtual world are utilized to generate artificial datasets, ANYU was created by incorporating RGB-D representations of virtual reality objects.
We show that ANYU improves the monocular depth estimation performance and generalization of deep neural networks with considerably different architectures.
arXiv Detail & Related papers (2024-04-15T05:44:03Z)
- Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation [70.82403156865057]
We investigate the impact of synthetic 3D scene dataset scale and realism on the task of training embodied agents to find and navigate to objects.
Our experiments show that agents trained on our smaller-scale dataset can match or outperform agents trained on much larger datasets.
arXiv Detail & Related papers (2023-06-20T05:07:23Z)
- MetaGraspNet: A Large-Scale Benchmark Dataset for Scene-Aware Ambidextrous Bin Picking via Physics-based Metaverse Synthesis [72.85526892440251]
We introduce MetaGraspNet, a large-scale photo-realistic bin picking dataset constructed via physics-based metaverse synthesis.
The proposed dataset contains 217k RGBD images across 82 different article types, with full annotations for object detection, amodal perception, keypoint detection, manipulation order, and ambidextrous grasp labels for parallel-jaw and vacuum grippers.
We also provide a real dataset consisting of over 2.3k fully annotated high-quality RGBD images, divided into 5 difficulty levels and an unseen object set to evaluate different object and layout properties.
arXiv Detail & Related papers (2022-08-08T08:15:34Z)
- RTMV: A Ray-Traced Multi-View Synthetic Dataset for Novel View Synthesis [104.53930611219654]
We present a large-scale synthetic dataset for novel view synthesis consisting of 300k images rendered from nearly 2000 complex scenes.
The dataset is orders of magnitude larger than existing synthetic datasets for novel view synthesis.
Using 4 distinct sources of high-quality 3D meshes, the scenes of our dataset exhibit challenging variations in camera views, lighting, shape, materials, and textures.
arXiv Detail & Related papers (2022-05-14T13:15:32Z)
- TransCG: A Large-Scale Real-World Dataset for Transparent Object Depth Completion and Grasping [46.6058840385155]
We contribute a large-scale real-world dataset for transparent object depth completion.
Our dataset contains 57,715 RGB-D images from 130 different scenes.
We propose an end-to-end depth completion network, which takes the RGB image and the inaccurate depth map as inputs and outputs a refined depth map.
arXiv Detail & Related papers (2022-02-17T06:50:20Z)
- Seeing Glass: Joint Point Cloud and Depth Completion for Transparent Objects [16.714074893209713]
TranspareNet is a joint point cloud and depth completion method.
It can complete the depth of transparent objects in cluttered and complex scenes.
TranspareNet outperforms existing state-of-the-art depth completion methods on multiple datasets.
arXiv Detail & Related papers (2021-09-30T21:09:09Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
- Learning from THEODORE: A Synthetic Omnidirectional Top-View Indoor Dataset for Deep Transfer Learning [4.297070083645049]
We introduce THEODORE: a novel, large-scale indoor dataset containing 100,000 high-resolution diversified fisheye images with 14 classes.
We create 3D virtual environments of living rooms, different human characters, and interior textures.
We show that our dataset is well suited for fine-tuning CNNs for object detection.
arXiv Detail & Related papers (2020-11-11T11:46:33Z)
- EDEN: Multimodal Synthetic Dataset of Enclosed GarDEN Scenes [21.695100437184507]
This dataset features more than 300K images captured from more than 100 garden models.
Each image is annotated with various low/high-level vision modalities, including semantic segmentation, depth, surface normals, intrinsic colors, and optical flow.
Experimental results with state-of-the-art methods for semantic segmentation and monocular depth prediction, two important tasks in computer vision, show the positive impact of pre-training deep networks on our dataset for unstructured natural scenes.
arXiv Detail & Related papers (2020-11-09T12:44:29Z)