A Dense Material Segmentation Dataset for Indoor and Outdoor Scene
Parsing
- URL: http://arxiv.org/abs/2207.10614v1
- Date: Thu, 21 Jul 2022 17:15:41 GMT
- Title: A Dense Material Segmentation Dataset for Indoor and Outdoor Scene
Parsing
- Authors: Paul Upchurch and Ransen Niu
- Abstract summary: We propose a large-scale dataset of 3.2 million dense segments on 44,560 indoor and outdoor images.
Our data covers a more diverse set of scenes, objects, viewpoints and materials.
We show that a model trained on our data outperforms a state-of-the-art model across datasets and viewpoints.
- Score: 1.7404865362620798
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key algorithm for understanding the world is material segmentation, which
assigns a label (metal, glass, etc.) to each pixel. We find that a model
trained on existing data underperforms in some settings and propose to address
this with a large-scale dataset of 3.2 million dense segments on 44,560 indoor
and outdoor images, which is 23x more segments than existing data. Our data
covers a more diverse set of scenes, objects, viewpoints and materials, and
contains a more fair distribution of skin types. We show that a model trained
on our data outperforms a state-of-the-art model across datasets and
viewpoints. We propose a large-scale scene parsing benchmark and baseline of
0.729 per-pixel accuracy, 0.585 mean class accuracy and 0.420 mean IoU across
46 materials.
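The three benchmark metrics above can be computed from a confusion matrix over the material classes. A minimal NumPy sketch is below; the function name and implementation are illustrative assumptions, not the authors' evaluation code:

```python
import numpy as np

def segmentation_metrics(pred, gt, num_classes):
    """Hypothetical helper: compute per-pixel accuracy, mean class
    accuracy, and mean IoU from flat integer label arrays."""
    pred = np.asarray(pred).ravel()
    gt = np.asarray(gt).ravel()
    # Confusion matrix: rows = ground truth, columns = prediction.
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (gt, pred), 1)
    tp = np.diag(cm).astype(float)          # correctly labeled pixels per class
    gt_count = cm.sum(axis=1)               # ground-truth pixels per class
    pred_count = cm.sum(axis=0)             # predicted pixels per class
    # Per-pixel accuracy: fraction of all pixels labeled correctly.
    pixel_acc = tp.sum() / cm.sum()
    # Mean class accuracy: recall averaged over classes present in gt.
    valid = gt_count > 0
    class_acc = (tp[valid] / gt_count[valid]).mean()
    # Mean IoU: intersection over union averaged over classes.
    union = gt_count + pred_count - tp
    nonzero = union > 0
    mean_iou = (tp[nonzero] / union[nonzero]).mean()
    return pixel_acc, class_acc, mean_iou
```

Mean class accuracy and mean IoU weight every material equally, so rare materials count as much as common ones; per-pixel accuracy is dominated by frequent classes, which is why the three numbers in the baseline differ.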
Related papers
- SAM 2: Segment Anything in Images and Videos [63.44869623822368]
We present Segment Anything Model 2 (SAM 2), a foundation model towards solving promptable visual segmentation in images and videos.
We build a data engine, which improves model and data via user interaction, to collect the largest video segmentation dataset to date.
Our model is a simple transformer architecture with streaming memory for real-time video processing.
arXiv Detail & Related papers (2024-08-01T17:00:08Z)
- From Pixels to Prose: A Large Dataset of Dense Image Captions [76.97493750144812]
PixelProse is a comprehensive dataset of over 16 million synthetically generated captions.
To ensure data integrity, we rigorously analyze our dataset for problematic content.
We also provide valuable metadata such as watermark presence and aesthetic scores.
arXiv Detail & Related papers (2024-06-14T17:59:53Z)
- PACE: A Large-Scale Dataset with Pose Annotations in Cluttered Environments [50.79058028754952]
PACE (Pose Annotations in Cluttered Environments) is a large-scale benchmark for pose estimation methods in cluttered scenarios.
The benchmark consists of 55K frames with 258K annotations across 300 videos, covering 238 objects from 43 categories.
PACE-Sim contains 100K photo-realistic simulated frames with 2.4M annotations across 931 objects.
arXiv Detail & Related papers (2023-12-23T01:38:41Z)
- DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models [61.906934570771256]
We present a generic dataset generation model that can produce diverse synthetic images and perception annotations.
Our method builds upon the pre-trained diffusion model and extends text-guided image synthesis to perception data generation.
We show that the rich latent code of the diffusion model can be effectively decoded as accurate perception annotations using a decoder module.
arXiv Detail & Related papers (2023-08-11T14:38:11Z)
- MSeg: A Composite Dataset for Multi-domain Semantic Segmentation [100.17755160696939]
We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains.
We reconcile the taxonomies and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images.
A model trained on MSeg ranks first on the WildDash-v1 leaderboard for robust semantic segmentation, with no exposure to WildDash data during training.
arXiv Detail & Related papers (2021-12-27T16:16:35Z)
- Adapting to Unseen Vendor Domains for MRI Lesion Segmentation [0.08156494881838945]
We investigate an unsupervised image translation model to augment MR images from a source dataset to a target dataset.
We consider three configurations of augmentation between datasets consisting of translation between images, between scanner vendors, and from labels to images.
Segmentation models trained on synthetic data from the labels-to-images configuration performed closest to a segmentation model trained directly on the target dataset.
arXiv Detail & Related papers (2021-08-14T01:09:43Z)
- DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort [117.41383937100751]
Current deep networks are extremely data-hungry, benefiting from training on large-scale datasets.
We show how the GAN latent code can be decoded to produce a semantic segmentation of the image.
These generated datasets can then be used for training any computer vision architecture just as real datasets are.
arXiv Detail & Related papers (2021-04-13T20:08:29Z)
- Learning from THEODORE: A Synthetic Omnidirectional Top-View Indoor Dataset for Deep Transfer Learning [4.297070083645049]
We introduce THEODORE: a novel, large-scale indoor dataset containing 100,000 high-resolution diversified fisheye images with 14 classes.
We create 3D virtual environments of living rooms, different human characters and interior textures.
We show that our dataset is well suited for fine-tuning CNNs for object detection.
arXiv Detail & Related papers (2020-11-11T11:46:33Z)
- Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding [8.720130442653575]
Hypersim is a synthetic dataset for holistic indoor scene understanding.
We generate 77,400 images of 461 indoor scenes with detailed per-pixel labels and corresponding ground truth geometry.
arXiv Detail & Related papers (2020-11-04T20:12:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.