Waymo Open Dataset: Panoramic Video Panoptic Segmentation
- URL: http://arxiv.org/abs/2206.07704v1
- Date: Wed, 15 Jun 2022 17:57:28 GMT
- Title: Waymo Open Dataset: Panoramic Video Panoptic Segmentation
- Authors: Jieru Mei, Alex Zihao Zhu, Xinchen Yan, Hang Yan, Siyuan Qiao, Yukun
Zhu, Liang-Chieh Chen, Henrik Kretzschmar, Dragomir Anguelov
- Abstract summary: Research in image segmentation has become increasingly popular due to its critical applications in robotics and autonomous driving.
Due to the high costs of densely labeling the images, there is a shortage of publicly available ground truth labels.
We present a large-scale dataset that offers high-quality panoptic segmentation labels for autonomous driving.
- Score: 48.04664130918314
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Panoptic image segmentation is the computer vision task of finding groups of
pixels in an image and assigning semantic classes and object instance
identifiers to them. Research in image segmentation has become increasingly
popular due to its critical applications in robotics and autonomous driving.
The research community thereby relies on publicly available benchmark datasets
to advance the state-of-the-art in computer vision. Due to the high costs of
densely labeling the images, however, there is a shortage of publicly available
ground truth labels that are suitable for panoptic segmentation. The high
labeling costs also make it challenging to extend existing datasets to the
video domain and to multi-camera setups. We therefore present the Waymo Open
Dataset: Panoramic Video Panoptic Segmentation Dataset, a large-scale dataset
that offers high-quality panoptic segmentation labels for autonomous driving.
We generate our dataset using the publicly available Waymo Open Dataset,
leveraging the diverse set of camera images. Our labels are consistent over
time for video processing and consistent across multiple cameras mounted on the
vehicles for full panoramic scene understanding. Specifically, we offer labels
for 28 semantic categories and 2,860 temporal sequences that were captured by
five cameras mounted on autonomous vehicles driving in three different
geographical locations, leading to a total of 100k labeled camera images. To
the best of our knowledge, this makes our dataset an order of magnitude larger
than existing datasets that offer video panoptic segmentation labels. We
further propose a new benchmark for Panoramic Video Panoptic Segmentation and
establish a number of strong baselines based on the DeepLab family of models.
We will make the benchmark and the code publicly available. Find the dataset at
https://waymo.com/open.
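The label format itself is easy to picture with a small sketch. The snippet below assumes the common convention of packing a pixel's semantic class and instance identifier into a single panoptic id (panoptic = class * divisor + instance); the divisor value and array shapes are illustrative assumptions, not the official Waymo API.

```python
import numpy as np

# Minimal sketch of a panoptic label map, assuming the common encoding
# panoptic_id = semantic_class * divisor + instance_id. The divisor
# below is a hypothetical constant, not the official Waymo one.
DIVISOR = 1000

def decode_panoptic(panoptic, divisor=DIVISOR):
    """Split a panoptic id map into semantic-class and instance-id maps."""
    return panoptic // divisor, panoptic % divisor

def temporally_consistent(frame_a, frame_b):
    """Check that every instance id shared by two frames carries the
    same semantic class in both, i.e. track identity is stable."""
    sem_a, inst_a = decode_panoptic(frame_a)
    sem_b, inst_b = decode_panoptic(frame_b)
    for i in np.intersect1d(inst_a[inst_a > 0], inst_b[inst_b > 0]):
        if not np.array_equal(np.unique(sem_a[inst_a == i]),
                              np.unique(sem_b[inst_b == i])):
            return False
    return True

# Toy usage: a 2x2 patch of class 3, instance 7.
frame = np.full((2, 2), 3 * DIVISOR + 7)
print(decode_panoptic(frame))               # (array of 3s, array of 7s)
print(temporally_consistent(frame, frame))  # True
```

The same decoding makes the consistency claims in the abstract checkable: an instance id that appears in two frames (or two camera views) should map to the same semantic class in both.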
Related papers
- Panonut360: A Head and Eye Tracking Dataset for Panoramic Video [0.0]
We present a head and eye tracking dataset involving 50 users watching 15 panoramic videos.
The dataset provides details on the viewport and gaze attention locations of users.
Our analysis reveals a consistent downward offset in gaze fixations relative to the Field of View.
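As a rough illustration of how such an offset could be quantified (with fabricated stand-in numbers, since this sketch is not tied to the actual dataset format):

```python
import numpy as np

# Hypothetical sketch of measuring the reported downward gaze offset:
# compare gaze fixation y-coordinates to viewport centers, using
# fabricated stand-in data in normalized [0, 1] image coordinates.
rng = np.random.default_rng(0)
viewport_y = rng.uniform(0.4, 0.6, size=1000)            # viewport centers
gaze_y = viewport_y + rng.normal(0.05, 0.02, size=1000)  # simulated fixations

offset = gaze_y - viewport_y
print(f"mean vertical offset: {offset.mean():+.3f} (positive = downward)")
```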
arXiv Detail & Related papers (2024-03-26T13:54:52Z)
- PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation [39.269864548255576]
We present a panoramic video dataset, PanoVOS.
The dataset provides 150 high-resolution videos with diverse motions.
We present a Panoramic Space Consistency Transformer (PSCFormer) which can effectively utilize the semantic boundary information of the previous frame for pixel-level matching with the current frame.
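The summary does not detail the matching mechanism; as a generic illustration of pixel-level cross-frame matching (not the actual PSCFormer), the sketch below propagates previous-frame labels to the current frame by cosine similarity of per-pixel features:

```python
import numpy as np

# Illustrative sketch, not the actual PSCFormer: propagate the previous
# frame's segmentation labels to the current frame by pixel-level
# feature matching. All features here are random stand-ins.
rng = np.random.default_rng(0)
H = W = 8   # toy spatial resolution
C = 16      # feature channels

prev_feats = rng.normal(size=(H * W, C))
curr_feats = rng.normal(size=(H * W, C))
prev_labels = rng.integers(0, 4, size=H * W)  # stand-in previous-frame labels

# Normalize so the dot product below is cosine similarity.
prev_feats /= np.linalg.norm(prev_feats, axis=1, keepdims=True)
curr_feats /= np.linalg.norm(curr_feats, axis=1, keepdims=True)

# Each current pixel inherits the label of its best-matching previous pixel.
similarity = curr_feats @ prev_feats.T              # (HW, HW) match scores
curr_labels = prev_labels[similarity.argmax(axis=1)].reshape(H, W)
print(curr_labels)
```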
arXiv Detail & Related papers (2023-09-21T17:59:02Z)
- SUPS: A Simulated Underground Parking Scenario Dataset for Autonomous Driving [41.221988979184665]
SUPS is a simulated dataset for underground automatic parking.
It supports multiple tasks with multiple sensors and multiple semantic labels aligned with successive images.
We also evaluate the state-of-the-art SLAM algorithms and perception models on our dataset.
arXiv Detail & Related papers (2023-02-25T02:59:12Z)
- Synthehicle: Multi-Vehicle Multi-Camera Tracking in Virtual Cities [4.4855664250147465]
We present a massive synthetic dataset for multiple vehicle tracking and segmentation in multiple overlapping and non-overlapping camera views.
The dataset consists of 17 hours of labeled video material, recorded from 340 cameras in 64 diverse day, rain, dawn, and night scenes.
arXiv Detail & Related papers (2022-08-30T11:36:07Z)
- BigDatasetGAN: Synthesizing ImageNet with Pixel-wise Annotations [89.42397034542189]
We synthesize a large labeled dataset via a generative adversarial network (GAN)
We take image samples from the class-conditional generative model BigGAN trained on ImageNet, and manually annotate 5 images per class, for all 1k classes.
We create a new ImageNet benchmark by labeling an additional set of 8k real images and evaluate segmentation performance in a variety of settings.
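A minimal sketch of the DatasetGAN-style idea this builds on, assuming per-pixel generator features (random stand-ins here) and a tiny classifier trained on the handful of manual annotations:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hedged sketch of the few-shot labeling recipe: fit a small per-pixel
# classifier on generator features from a handful of annotated samples,
# then auto-label unlimited synthetic images. All arrays are random
# stand-ins for real BigGAN feature maps and manual masks.
rng = np.random.default_rng(0)
H = W = 16        # toy resolution
C = 32            # per-pixel generator feature channels (stand-in)
n_annotated = 5   # "5 images per class" annotation budget from the paper

feats = rng.normal(size=(n_annotated, H, W, C))
masks = (rng.random((n_annotated, H, W)) > 0.5).astype(int)  # stand-in labels

# Fit a per-pixel classifier: each pixel's feature vector -> its label.
clf = LogisticRegression(max_iter=1000).fit(
    feats.reshape(-1, C), masks.reshape(-1))

# Any freshly generated sample can now be auto-labeled pixel by pixel.
new_feats = rng.normal(size=(H, W, C))
pseudo_mask = clf.predict(new_feats.reshape(-1, C)).reshape(H, W)
print(pseudo_mask.sum(), "foreground pixels in the synthesized label")
```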
arXiv Detail & Related papers (2022-01-12T20:28:34Z)
- Reducing the Annotation Effort for Video Object Segmentation Datasets [50.893073670389164]
Densely labeling every frame with pixel masks does not scale to large datasets.
We use a deep convolutional network to automatically create pseudo-labels on a pixel level from much cheaper bounding box annotations.
We obtain the new TAO-VOS benchmark, which we make publicly available at www.vision.rwth-aachen.de/page/taovos.
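The entry does not spell out the procedure; below is a crude sketch of box-to-pixel weak supervision (the paper trains a deep network for this step, whereas here a stand-in foreground score is simply gated to box interiors):

```python
import numpy as np

# Crude sketch of box-to-pixel weak supervision: produce coarse pixel
# pseudo-labels by restricting a (stand-in) foreground score to the
# interiors of the annotated bounding boxes.
def boxes_to_pseudo_labels(shape, boxes, fg_scores, threshold=0.5):
    """shape: (H, W); boxes: iterable of (object_id, x0, y0, x1, y1);
    fg_scores: (H, W) foreground probabilities from any model."""
    labels = np.zeros(shape, dtype=np.int32)  # 0 = background
    for obj_id, x0, y0, x1, y1 in boxes:
        inside = np.zeros(shape, dtype=bool)
        inside[y0:y1, x0:x1] = True
        # A pixel is pseudo-labeled for this object only if it lies
        # inside the box AND the model scores it as foreground.
        labels[inside & (fg_scores >= threshold)] = obj_id
    return labels

# Toy usage with random scores standing in for real network output.
rng = np.random.default_rng(0)
scores = rng.random((8, 8))
print(boxes_to_pseudo_labels((8, 8), [(1, 1, 1, 5, 5)], scores))
```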
arXiv Detail & Related papers (2020-11-02T17:34:45Z)
- Labelling unlabelled videos from scratch with multi-modal self-supervision [82.60652426371936]
Unsupervised labelling of a video dataset does not come for free from strong feature encoders.
We propose a novel clustering method that allows pseudo-labelling of a video dataset without any human annotations.
An extensive analysis shows that the resulting clusters have high semantic overlap to ground truth human labels.
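The paper proposes its own clustering method; plain k-means over concatenated (stand-in) audio and visual embeddings illustrates the general pseudo-labelling recipe:

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative sketch only: the paper's clustering method differs, but
# k-means over joint audio-visual embeddings shows how cluster ids can
# serve as pseudo-labels. All embeddings are random stand-ins.
rng = np.random.default_rng(0)
n_videos, d_video, d_audio, n_clusters = 200, 128, 64, 10

video_emb = rng.normal(size=(n_videos, d_video))  # stand-in visual features
audio_emb = rng.normal(size=(n_videos, d_audio))  # stand-in audio features

# L2-normalize each modality so neither dominates the joint space.
video_emb /= np.linalg.norm(video_emb, axis=1, keepdims=True)
audio_emb /= np.linalg.norm(audio_emb, axis=1, keepdims=True)
features = np.concatenate([video_emb, audio_emb], axis=1)

pseudo_labels = KMeans(n_clusters=n_clusters, n_init=10,
                       random_state=0).fit_predict(features)
print(pseudo_labels[:10])  # cluster id per video, usable as a pseudo-label
```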
arXiv Detail & Related papers (2020-06-24T12:28:17Z)
- Video Panoptic Segmentation [117.08520543864054]
We propose and explore a new video extension of this task, called video panoptic segmentation.
To invigorate research on this new task, we present two types of video panoptic datasets.
We propose a novel video panoptic segmentation network (VPSNet) which jointly predicts object classes, bounding boxes, masks, instance id tracking, and semantic segmentation in video frames.
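As a toy sketch only (not the actual VPSNet), a module with joint heads for the listed outputs shows what "jointly predicts" means structurally; all shapes and head designs are assumptions, and the instance-id tracking head is omitted for brevity:

```python
import torch
from torch import nn

# Toy multi-head sketch (not the actual VPSNet): one shared feature map
# feeds separate heads for semantic segmentation, instance masks,
# instance boxes, and classes. All dimensions are illustrative.
class JointPanopticHeads(nn.Module):
    def __init__(self, in_ch=64, n_classes=20, n_instances=16):
        super().__init__()
        self.n_instances = n_instances
        self.semantic = nn.Conv2d(in_ch, n_classes, 1)   # per-pixel classes
        self.masks = nn.Conv2d(in_ch, n_instances, 1)    # per-instance masks
        self.boxes = nn.Linear(in_ch, 4 * n_instances)   # instance boxes
        self.classes = nn.Linear(in_ch, n_classes)       # image-level classes

    def forward(self, feats):               # feats: (B, C, H, W)
        pooled = feats.mean(dim=(2, 3))     # (B, C) global descriptor
        return {
            "semantic": self.semantic(feats),
            "masks": self.masks(feats),
            "boxes": self.boxes(pooled).view(-1, self.n_instances, 4),
            "classes": self.classes(pooled),
        }

out = JointPanopticHeads()(torch.randn(1, 64, 32, 32))
print({k: tuple(v.shape) for k, v in out.items()})
```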
arXiv Detail & Related papers (2020-06-19T19:35:47Z)
- TAO: A Large-Scale Benchmark for Tracking Any Object [95.87310116010185]
The Tracking Any Object dataset consists of 2,907 high-resolution videos, captured in diverse environments, which are half a minute long on average.
We ask annotators to label objects that move at any point in the video, and give names to them post factum.
Our vocabulary is both significantly larger and qualitatively different from existing tracking datasets.
arXiv Detail & Related papers (2020-05-20T21:07:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.