Virtual KITTI 2
- URL: http://arxiv.org/abs/2001.10773v1
- Date: Wed, 29 Jan 2020 12:13:20 GMT
- Title: Virtual KITTI 2
- Authors: Yohann Cabon, Naila Murray, Martin Humenberger
- Abstract summary: This paper introduces an updated version of the well-known Virtual KITTI dataset.
The dataset consists of 5 sequence clones from the KITTI tracking benchmark.
For each sequence, we provide multiple sets of images containing RGB, depth, class segmentation, instance segmentation, flow, and scene flow data.
- Score: 13.390646987475163
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces an updated version of the well-known Virtual KITTI
dataset which consists of 5 sequence clones from the KITTI tracking benchmark.
In addition, the dataset provides different variants of these sequences such as
modified weather conditions (e.g. fog, rain) or modified camera configurations
(e.g. rotated by 15 degrees). For each sequence, we provide multiple sets of
images containing RGB, depth, class segmentation, instance segmentation, flow,
and scene flow data. Camera parameters, poses, and vehicle locations are also
provided. In order to showcase some of the dataset's capabilities,
we ran multiple relevant experiments using state-of-the-art algorithms from the
field of autonomous driving. The dataset is available for download at
https://europe.naverlabs.com/Research/Computer-Vision/Proxy-Virtual-Worlds.
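To give a concrete idea of how such data might be consumed, the short Python sketch below back-projects a single depth map into a camera-frame point cloud using pinhole intrinsics. The file path, the assumption that depth is stored as a 16-bit PNG in centimeters, the far-plane cutoff, and the intrinsic values are all illustrative placeholders rather than details taken from the paper; consult the downloaded dataset's documentation for the actual file layout and conventions.

```python
# Minimal sketch (not an official loader): back-project a depth map into a
# camera-frame point cloud. The path, the centimeter depth encoding, and the
# intrinsics used below are assumptions for illustration only.
import numpy as np
from PIL import Image

def backproject_depth(depth_png_path, fx, fy, cx, cy, max_depth_m=650.0):
    """Return an (N, 3) array of 3D points in the camera frame."""
    depth_cm = np.asarray(Image.open(depth_png_path), dtype=np.float32)
    depth_m = depth_cm / 100.0                       # assumed centimeter encoding
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    valid = depth_m < max_depth_m                    # drop far-plane / sky pixels
    z = depth_m[valid]
    x = (u[valid] - cx) * z / fx                     # standard pinhole back-projection
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)

# Hypothetical usage with placeholder path and intrinsics:
# points = backproject_depth("Scene01/clone/depth/Camera_0/depth_00000.png",
#                            fx=725.0, fy=725.0, cx=620.5, cy=187.0)
```

Combined with the provided camera poses, the same back-projection would let one aggregate per-frame point clouds into a common world frame.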
Related papers
- NWPU-MOC: A Benchmark for Fine-grained Multi-category Object Counting in Aerial Images [64.92809155168595]
This paper introduces a Multi-category Object Counting task to estimate the numbers of different objects in an aerial image.
Considering the absence of a dataset for this task, a large-scale dataset is collected, consisting of 3,416 scenes with a resolution of 1024 × 1024 pixels.
The paper presents a multi-spectrum, multi-category object counting framework, which employs a dual-attention module to fuse the features of RGB and NIR.
arXiv Detail & Related papers (2024-01-19T07:12:36Z)
- CoVR: Learning Composed Video Retrieval from Web Video Captions [59.854331104466254]
Composed Image Retrieval (CoIR) has recently gained popularity as a task that considers both text and image queries together.
We propose a scalable automatic dataset creation methodology that generates triplets given video-caption pairs.
We also expand the scope of the task to include composed video retrieval (CoVR).
arXiv Detail & Related papers (2023-08-28T17:55:33Z)
- ARKitTrack: A New Diverse Dataset for Tracking Using Mobile RGB-D Data [75.73063721067608]
We propose a new RGB-D tracking dataset for both static and dynamic scenes, captured with the consumer-grade LiDAR scanners built into Apple's iPhone and iPad.
ARKitTrack contains 300 RGB-D sequences, 455 targets, and 229.7K video frames in total.
In-depth empirical analysis has verified that the ARKitTrack dataset can significantly facilitate RGB-D tracking and that the proposed baseline method compares favorably against the state of the arts.
arXiv Detail & Related papers (2023-03-24T09:51:13Z)
- SUPS: A Simulated Underground Parking Scenario Dataset for Autonomous Driving [41.221988979184665]
SUPS is a simulated dataset for underground automatic parking.
It supports multiple tasks with multiple sensors and multiple semantic labels aligned with successive images.
We also evaluate the state-of-the-art SLAM algorithms and perception models on our dataset.
arXiv Detail & Related papers (2023-02-25T02:59:12Z)
- Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting [64.7364925689825]
Argoverse 2 (AV2) is a collection of three datasets for perception and forecasting research in the self-driving domain.
The Lidar dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned poses.
The Motion Forecasting dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene.
arXiv Detail & Related papers (2023-01-02T00:36:22Z)
- HPointLoc: Point-based Indoor Place Recognition using Synthetic RGB-D Images [58.720142291102135]
We present a novel dataset, named HPointLoc, specially designed for exploring the capabilities of visual place recognition in indoor environments.
The dataset is based on the popular Habitat simulator, in which indoor scenes can be generated from both custom sensor data and open datasets.
arXiv Detail & Related papers (2022-12-30T12:20:56Z)
- Scale Invariant Semantic Segmentation with RGB-D Fusion [12.650574326251023]
We propose a neural network architecture for scale-invariant semantic segmentation using RGB-D images.
We incorporate depth information into the RGB data for pixel-wise semantic segmentation to handle objects at different scales in outdoor scenes.
Our model is compact and can easily be applied to other RGB models.
arXiv Detail & Related papers (2022-04-10T12:54:27Z)
- Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline [80.13652104204691]
In this paper, we construct a large-scale benchmark with high diversity for visible-thermal UAV tracking (VTUAV).
We provide a coarse-to-fine attribute annotation, where frame-level attributes are provided to exploit the potential of challenge-specific trackers.
In addition, we design a new RGB-T baseline, named Hierarchical Multi-modal Fusion Tracker (HMFT), which fuses RGB-T data in various levels.
arXiv Detail & Related papers (2022-04-08T15:22:33Z)
- TICaM: A Time-of-flight In-car Cabin Monitoring Dataset [10.845284058153837]
TICaM is a Time-of-flight In-car Cabin Monitoring dataset for vehicle interior monitoring using a single wide-angle depth camera.
We record an exhaustive list of actions performed while driving and provide multi-modal labeled images for them.
In addition to the real recordings, we provide a synthetic dataset of in-car cabin images with the same image modalities and annotations.
arXiv Detail & Related papers (2021-03-22T10:48:45Z)
- Learning to Sort Image Sequences via Accumulated Temporal Differences [27.41266294612776]
We tackle the problem of temporally sequencing the unordered set of images of a dynamic scene captured with a hand-held camera.
We propose a convolutional block that captures spatial information through 2D convolution kernels.
We show that the proposed approach outperforms the state-of-the-art methods by a significant margin.
arXiv Detail & Related papers (2020-10-22T12:34:05Z)
- Robust Image Retrieval-based Visual Localization using Kapture [10.249293519246478]
We present a versatile pipeline for visual localization that facilitates the use of different local and global features.
We evaluate our methods on eight public datasets, where they rank among the top on all of them and first on many.
To foster future research, we release code, models, and all datasets used in this paper in the kapture format, open-sourced under a permissive BSD license.
arXiv Detail & Related papers (2020-07-27T21:10:35Z)
This list is automatically generated from the titles and abstracts of the papers listed on this site.