EDEN: Multimodal Synthetic Dataset of Enclosed GarDEN Scenes
- URL: http://arxiv.org/abs/2011.04389v2
- Date: Tue, 10 Nov 2020 20:11:31 GMT
- Title: EDEN: Multimodal Synthetic Dataset of Enclosed GarDEN Scenes
- Authors: Hoang-An Le, Thomas Mensink, Partha Das, Sezer Karaoglu, Theo Gevers
- Abstract summary: This dataset features more than 300K images captured from more than 100 garden models.
Each image is annotated with various low/high-level vision modalities, including semantic segmentation, depth, surface normals, intrinsic colors, and optical flow.
Experimental results with state-of-the-art methods for semantic segmentation and monocular depth prediction, two important tasks in computer vision, show the positive impact of pre-training deep networks on our dataset for unstructured natural scenes.
- Score: 21.695100437184507
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal large-scale datasets for outdoor scenes are mostly designed for
urban driving problems. The scenes are highly structured and semantically
different from scenarios seen in nature-centered scenes such as gardens or
parks. To promote machine learning methods for nature-oriented applications,
such as agriculture and gardening, we propose the multimodal synthetic dataset
for Enclosed garDEN scenes (EDEN). The dataset features more than 300K images
captured from more than 100 garden models. Each image is annotated with various
low/high-level vision modalities, including semantic segmentation, depth,
surface normals, intrinsic colors, and optical flow. Experimental results on
the state-of-the-art methods for semantic segmentation and monocular depth
prediction, two important tasks in computer vision, show the positive impact of
pre-training deep networks on our dataset for unstructured natural scenes. The
dataset and related materials will be available at
https://lhoangan.github.io/eden.
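As an illustration of how the released modalities could be consumed in practice, the following is a minimal sketch of a multimodal data loader written with PyTorch. The directory layout, file names, and array formats shown here are assumptions made for illustration only; they are not taken from the official EDEN release.

    # Minimal sketch of a multimodal dataset loader (hypothetical layout, not the official EDEN format).
    import os
    from glob import glob

    import numpy as np
    from PIL import Image
    from torch.utils.data import Dataset


    class MultimodalGardenDataset(Dataset):
        """Return an RGB image together with its per-pixel annotations.

        Hypothetical directory layout (assumed, not the released structure):
            root/rgb/<frame>.png        color image
            root/semantic/<frame>.png   class-index map
            root/depth/<frame>.npy      per-pixel depth
            root/normals/<frame>.npy    surface normals, shape (H, W, 3)
        """

        def __init__(self, root):
            self.root = root
            self.rgb_paths = sorted(glob(os.path.join(root, "rgb", "*.png")))

        def __len__(self):
            return len(self.rgb_paths)

        def __getitem__(self, idx):
            rgb_path = self.rgb_paths[idx]
            frame = os.path.splitext(os.path.basename(rgb_path))[0]

            # Load the image and the matching annotation maps by shared frame name.
            sample = {
                "rgb": np.asarray(Image.open(rgb_path).convert("RGB")),
                "semantic": np.asarray(Image.open(os.path.join(self.root, "semantic", frame + ".png"))),
                "depth": np.load(os.path.join(self.root, "depth", frame + ".npy")),
                "normals": np.load(os.path.join(self.root, "normals", frame + ".npy")),
            }
            return sample

Each sample dictionary could then be fed to a semantic segmentation or monocular depth network for pre-training, in the spirit of the experiments summarized above.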
Related papers
- FIORD: A Fisheye Indoor-Outdoor Dataset with LIDAR Ground Truth for 3D Scene Reconstruction and Benchmarking [29.634708606525727]
We introduce a fisheye image dataset tailored for scene reconstruction tasks.
Using dual 200-degree fisheye lenses, our dataset provides full 360-degree coverage of 5 indoor and 5 outdoor scenes.
Each scene has sparse SfM point clouds and precise LIDAR-derived dense point clouds that can be used as geometric ground-truth.
arXiv Detail & Related papers (2025-04-02T13:41:23Z)
- Seeing and Seeing Through the Glass: Real and Synthetic Data for Multi-Layer Depth Estimation [18.8622645280467]
LayeredDepth is the first dataset with multi-layer depth annotations, including a real-world benchmark and a synthetic data generator.
Our benchmark consists of 1,500 images from diverse scenes, and evaluating state-of-the-art depth estimation methods on it reveals that they struggle with transparent objects.
Baseline models trained solely on this synthetic dataset produce good cross-domain multi-layer depth estimation.
arXiv Detail & Related papers (2025-03-14T17:52:06Z)
- MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments [49.45034796115852]
Operating rooms (ORs) are complex, high-stakes environments requiring precise understanding of interactions among medical staff, tools, and equipment.
Current datasets fall short in scale and realism and do not capture the multimodal nature of OR scenes, limiting progress in OR modeling.
We introduce MM-OR, a realistic and large-scale multimodal OR dataset, and the first dataset to enable multimodal scene graph generation.
arXiv Detail & Related papers (2025-03-04T13:00:52Z)
- EarthView: A Large Scale Remote Sensing Dataset for Self-Supervision [72.84868704100595]
This paper presents a dataset specifically designed for self-supervision on remote sensing data, intended to enhance deep learning applications on Earth monitoring tasks.
The dataset spans 15 terapixels of global remote-sensing data, combining imagery from a diverse range of sources, including NEON, Sentinel, and a novel release of 1m spatial resolution data from Satellogic.
Accompanying the dataset is EarthMAE, a tailored Masked Autoencoder developed to tackle the distinct challenges of remote sensing data.
arXiv Detail & Related papers (2025-01-14T13:42:22Z)
- WayveScenes101: A Dataset and Benchmark for Novel View Synthesis in Autonomous Driving [4.911903454560829]
WayveScenes101 is a dataset designed to help the community advance the state of the art in novel view synthesis.
The dataset comprises 101 driving scenes across a wide range of environmental conditions and driving scenarios.
arXiv Detail & Related papers (2024-07-11T08:29:45Z)
- 360 in the Wild: Dataset for Depth Prediction and View Synthesis [66.58513725342125]
We introduce a large-scale 360° video dataset in the wild.
This dataset has been carefully scraped from the Internet and has been captured from various locations worldwide.
Each of the 25K images constituting our dataset is provided with its respective camera's pose and depth map.
arXiv Detail & Related papers (2024-06-27T05:26:38Z)
- Forest Inspection Dataset for Aerial Semantic Segmentation and Depth Estimation [6.635604919499181]
We introduce a new large aerial dataset for forest inspection.
It contains both real-world and virtual recordings of natural environments.
We develop a framework to assess the deforestation degree of an area.
arXiv Detail & Related papers (2024-03-11T11:26:44Z)
- SAMPLING: Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image [60.52991173059486]
We introduce SAMPLING, a Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image.
Our method demonstrates considerable performance gains in large-scale unbounded outdoor scenes using a single image on the KITTI dataset.
arXiv Detail & Related papers (2023-09-12T15:33:09Z)
- NPF-200: A Multi-Modal Eye Fixation Dataset and Method for Non-Photorealistic Videos [51.409547544747284]
NPF-200 is the first large-scale multi-modal dataset of purely non-photorealistic videos with eye fixations.
We conduct a series of analyses to gain deeper insights into this task.
We propose a universal frequency-aware multi-modal non-photorealistic saliency detection model called NPSNet.
arXiv Detail & Related papers (2023-08-23T14:25:22Z)
- Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation [70.82403156865057]
We investigate the impact of synthetic 3D scene dataset scale and realism on the task of training embodied agents to find and navigate to objects.
Our experiments show that agents trained on our smaller-scale dataset can match or outperform agents trained on much larger datasets.
arXiv Detail & Related papers (2023-06-20T05:07:23Z)
- VDD: Varied Drone Dataset for Semantic Segmentation [9.581655974280217]
We release a large-scale, densely labeled collection of 400 high-resolution images spanning 7 classes.
This dataset features various scenes in urban, industrial, rural, and natural areas, captured from different camera angles and under diverse lighting conditions.
We train seven state-of-the-art models on drone datasets as baselines.
arXiv Detail & Related papers (2023-05-23T02:16:14Z)
- PanDepth: Joint Panoptic Segmentation and Depth Completion [19.642115764441016]
We propose a multi-task model for panoptic segmentation and depth completion using RGB images and sparse depth maps.
Our model successfully predicts fully dense depth maps and performs semantic segmentation, instance segmentation, and panoptic segmentation for every input frame.
arXiv Detail & Related papers (2022-12-29T05:37:38Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
- DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort [117.41383937100751]
Current deep networks are extremely data-hungry, benefiting from training on large-scale datasets.
We show how the GAN latent code can be decoded to produce a semantic segmentation of the image.
These generated datasets can then be used for training any computer vision architecture just as real datasets are.
arXiv Detail & Related papers (2021-04-13T20:08:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.