WildScenes: A Benchmark for 2D and 3D Semantic Segmentation in
Large-scale Natural Environments
- URL: http://arxiv.org/abs/2312.15364v1
- Date: Sat, 23 Dec 2023 22:27:40 GMT
- Authors: Kavisha Vidanapathirana, Joshua Knights, Stephen Hausler, Mark Cox,
Milad Ramezani, Jason Jooste, Ethan Griffiths, Shaheer Mohamed, Sridha
Sridharan, Clinton Fookes and Peyman Moghadam
- Abstract summary: We introduce WildScenes, a bi-modal benchmark dataset consisting of multiple large-scale traversals in natural environments.
The data is trajectory-centric with accurate localization and globally aligned point clouds.
We introduce benchmarks on 2D and 3D semantic segmentation and evaluate a variety of recent deep-learning techniques.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent progress in semantic scene understanding has primarily been enabled by
the availability of semantically annotated bi-modal (camera and lidar) datasets
in urban environments. However, such annotated datasets are also needed for
natural, unstructured environments to enable semantic perception for
applications, including conservation, search and rescue, environment
monitoring, and agricultural automation. Therefore, we introduce WildScenes, a
bi-modal benchmark dataset consisting of multiple large-scale traversals in
natural environments, including semantic annotations in high-resolution 2D
images and dense 3D lidar point clouds, and accurate 6-DoF pose information.
The data is (1) trajectory-centric with accurate localization and globally
aligned point clouds, (2) calibrated and synchronized to support bi-modal
inference, and (3) collected across different natural environments over 6 months
to support research on domain adaptation. Our 3D semantic labels are obtained via
an efficient automated process that transfers the human-annotated 2D labels
from multiple views into 3D point clouds, thus circumventing the need for
expensive and time-consuming human annotation in 3D. We introduce benchmarks on
2D and 3D semantic segmentation and evaluate a variety of recent deep-learning
techniques to demonstrate the challenges in semantic segmentation in natural
environments. We propose train-val-test splits for standard benchmarks as well
as domain adaptation benchmarks and utilize an automated split generation
technique to ensure the balance of class label distributions. The data,
evaluation scripts and pretrained models will be released upon acceptance at
https://csiro-robotics.github.io/WildScenes.
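The 2D-to-3D label transfer described in the abstract can be sketched as follows. This is a minimal illustration, assuming a pinhole camera model with known intrinsics, known 6-DoF world-to-camera poses, and a per-point majority vote across views; the function and variable names are hypothetical, not the authors' implementation.

```python
import numpy as np

def transfer_labels(points_w, cams, num_classes, ignore_label=255):
    """Vote-based 2D->3D label transfer (illustrative sketch).

    points_w : (N, 3) lidar points in the world frame
    cams     : list of (K, T_cw, mask) where K is a 3x3 intrinsic matrix,
               T_cw a 4x4 world-to-camera transform, and mask an (H, W)
               array of human-annotated 2D class labels
    """
    votes = np.zeros((points_w.shape[0], num_classes), dtype=np.int64)
    pts_h = np.hstack([points_w, np.ones((points_w.shape[0], 1))])  # homogeneous

    for K, T_cw, mask in cams:
        pc = (T_cw @ pts_h.T).T[:, :3]           # points in the camera frame
        in_front = pc[:, 2] > 0.1                # keep points ahead of the camera
        uv = (K @ pc.T).T
        uv = uv[:, :2] / uv[:, 2:3]              # perspective divide
        u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
        h, w = mask.shape
        valid = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        labels = mask[v[valid], u[valid]]        # read 2D labels at projections
        keep = labels != ignore_label
        idx = np.flatnonzero(valid)[keep]
        votes[idx, labels[keep]] += 1            # one vote per observing view

    out = votes.argmax(axis=1)
    out[votes.sum(axis=1) == 0] = ignore_label   # unobserved points stay unlabeled
    return out
```

A real pipeline would additionally handle occlusion (e.g. via depth buffering), which this sketch omits for brevity.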
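The automated split generation mentioned in the abstract aims to keep class-label distributions balanced across train, validation, and test. A simple greedy variant of that idea, not the authors' actual algorithm, could assign whole sequences to splits while minimizing deviation from the global class histogram:

```python
import numpy as np

def greedy_splits(seq_hists, ratios=(0.7, 0.15, 0.15)):
    """Greedily assign sequences to train/val/test so that each split's
    class histogram stays close to the global distribution (illustrative).

    seq_hists : dict mapping sequence id -> per-class point counts (1-D array)
    ratios    : desired fraction of labeled points per split
    """
    total = sum(seq_hists.values())                # global per-class counts
    targets = [r * total for r in ratios]          # desired counts per split
    accum = [np.zeros_like(total, dtype=float) for _ in ratios]
    assign = {}
    # Place the largest sequences first so small ones can fine-tune the balance.
    for sid in sorted(seq_hists, key=lambda s: -seq_hists[s].sum()):
        h = seq_hists[sid]
        # Choose the split whose histogram deviates least after adding h.
        costs = [np.abs((acc + h) - tgt).sum() for acc, tgt in zip(accum, targets)]
        best = int(np.argmin(costs))
        accum[best] += h
        assign[sid] = best
    return assign  # sequence id -> 0 (train), 1 (val), 2 (test)
```

Assigning whole sequences, rather than individual frames, avoids leaking near-duplicate views of the same scene between splits.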
Related papers
- Syn-to-Real Unsupervised Domain Adaptation for Indoor 3D Object Detection [50.448520056844885]
We propose a novel framework for syn-to-real unsupervised domain adaptation in indoor 3D object detection.
Our adaptation results from synthetic dataset 3D-FRONT to real-world datasets ScanNetV2 and SUN RGB-D demonstrate remarkable mAP25 improvements of 9.7% and 9.1% over Source-Only baselines.
arXiv Detail & Related papers (2024-06-17T08:18:41Z) - ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic
Reconstruction [62.599588577671796]
We propose an online 3D semantic segmentation method that incrementally reconstructs a 3D semantic map from a stream of RGB-D frames.
Unlike offline methods, ours is directly applicable to scenarios with real-time constraints, such as robotics or mixed reality.
arXiv Detail & Related papers (2023-11-29T20:30:18Z) - DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance
Fields [73.97131748433212]
This paper introduces a novel approach capable of generating infinite, high-quality 3D-consistent 2D annotations alongside 3D point cloud segmentations.
We leverage the strong semantic prior within a 3D generative model to train a semantic decoder.
Once trained, the decoder efficiently generalizes across the latent space, enabling the generation of infinite data.
arXiv Detail & Related papers (2023-11-18T21:58:28Z) - Navya3DSeg -- Navya 3D Semantic Segmentation Dataset & split generation
for autonomous vehicles [63.20765930558542]
3D semantic data are useful for core perception tasks such as obstacle detection and ego-vehicle localization.
We propose a new dataset, Navya 3D Segmentation (Navya3DSeg), with a diverse label space corresponding to a large-scale, production-grade operational domain.
It contains 23 labeled sequences and 25 supplementary sequences without labels, designed to explore self-supervised and semi-supervised semantic segmentation benchmarks on point clouds.
arXiv Detail & Related papers (2023-02-16T13:41:19Z) - 3D-PL: Domain Adaptive Depth Estimation with 3D-aware Pseudo-Labeling [37.315964084413174]
We develop a domain adaptation framework via generating reliable pseudo ground truths of depth from real data to provide direct supervisions.
Specifically, we propose two mechanisms for pseudo-labeling: 1) 2D-based pseudo-labels via measuring the consistency of depth predictions when images are with the same content but different styles; 2) 3D-aware pseudo-labels via a point cloud completion network that learns to complete the depth values in the 3D space.
arXiv Detail & Related papers (2022-09-19T17:54:17Z) - Collaborative Propagation on Multiple Instance Graphs for 3D Instance
Segmentation with Single-point Supervision [63.429704654271475]
We propose a novel weakly supervised method RWSeg that only requires labeling one object with one point.
With these sparse weak labels, we introduce a unified framework with two branches to propagate semantic and instance information.
Specifically, we propose a Cross-graph Competing Random Walks (CRW) algorithm that encourages competition among different instance graphs.
arXiv Detail & Related papers (2022-08-10T02:14:39Z) - Ego2HandsPose: A Dataset for Egocentric Two-hand 3D Global Pose
Estimation [0.0]
Ego2HandsPose is the first dataset that enables color-based two-hand 3D tracking in unseen domains.
We develop a set of parametric fitting algorithms to enable 1) 3D hand pose annotation using a single image, 2) automatic conversion from 2D to 3D hand poses and 3) accurate two-hand tracking with temporal consistency.
arXiv Detail & Related papers (2022-06-10T07:50:45Z) - Improving Semi-Supervised and Domain-Adaptive Semantic Segmentation with
Self-Supervised Depth Estimation [94.16816278191477]
We present a framework for semi-supervised and domain-adaptive semantic segmentation.
It is enhanced by self-supervised monocular depth estimation trained only on unlabeled image sequences.
We validate the proposed model on the Cityscapes dataset.
arXiv Detail & Related papers (2021-08-28T01:33:38Z) - H3D: Benchmark on Semantic Segmentation of High-Resolution 3D Point
Clouds and textured Meshes from UAV LiDAR and Multi-View-Stereo [4.263987603222371]
This paper introduces a 3D dataset which is unique in three ways.
It depicts the village of Hessigheim (Germany) henceforth referred to as H3D.
It is designed to promote research in the field of 3D data analysis on the one hand and to evaluate and rank emerging approaches on the other.
arXiv Detail & Related papers (2021-02-10T09:33:48Z) - Weakly Supervised Semantic Segmentation in 3D Graph-Structured Point
Clouds of Wild Scenes [36.07733308424772]
The deficiency of 3D segmentation labels is one of the main obstacles to effective point cloud segmentation.
We propose a novel deep graph convolutional network-based framework for large-scale semantic scene segmentation in point clouds with sole 2D supervision.
arXiv Detail & Related papers (2020-04-26T23:02:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.