Importance-Aware Semantic Segmentation in Self-Driving with Discrete
Wasserstein Training
- URL: http://arxiv.org/abs/2010.12440v1
- Date: Wed, 21 Oct 2020 20:43:47 GMT
- Title: Importance-Aware Semantic Segmentation in Self-Driving with Discrete
Wasserstein Training
- Authors: Xiaofeng Liu, Yuzhuo Han, Song Bai, Yi Ge, Tianxing Wang, Xu Han, Site
Li, Jane You, Ju Lu
- Abstract summary: We propose to incorporate the importance-aware inter-class correlation in a Wasserstein training framework.
We evaluate our method on CamVid and Cityscapes datasets with different backbones.
- Score: 44.78636414245145
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic segmentation (SS) is an important perception task for self-driving
cars and robotics, which classifies each pixel into a pre-determined class. The
widely used cross-entropy (CE) loss-based deep networks have achieved
significant progress w.r.t. the mean Intersection-over-Union (mIoU). However,
the cross-entropy loss cannot take the different importance of each class in
a self-driving system into account. For example, pedestrians in the image
should be much more important than the surrounding buildings when making
driving decisions, so their segmentation results are expected to be as
accurate as possible. In this paper, we propose to incorporate the
importance-aware inter-class correlation in a Wasserstein training framework by
configuring its ground distance matrix. The ground distance matrix can be
pre-defined following a priori knowledge in a specific task, and the previous
importance-ignored methods can be regarded as particular cases. From an optimization
perspective, we also extend our ground metric to a linear, convex or concave
increasing function w.r.t. the pre-defined ground distance. We evaluate our
method on CamVid and Cityscapes datasets with different backbones (SegNet,
ENet, FCN and Deeplab) in a plug-and-play fashion. In our extensive
experiments, the Wasserstein loss demonstrates superior segmentation performance on
the predefined critical classes for safe driving.
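To make the idea of a configurable ground distance matrix concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes one-hot pixel labels, in which case the discrete Wasserstein distance between the predicted distribution and the label has a closed form: all predicted mass must move to the true class, so the loss is the ground distance to the true class weighted by the predicted probabilities. The function names, the three-class setup and the matrix values are hypothetical, chosen only for illustration.

```python
import numpy as np

def transform_ground_distance(D, kind="linear", power=2.0):
    """Apply an increasing function to a non-negative ground distance matrix.

    The power-law forms below are an assumed, illustrative choice of
    convex / concave increasing functions, not taken from the paper.
    """
    if kind == "linear":
        return D
    if kind == "convex":
        return D ** power          # power > 1 gives a convex increasing map
    if kind == "concave":
        return D ** (1.0 / power)  # power > 1 gives a concave increasing map
    raise ValueError(f"unknown kind: {kind}")

def wasserstein_loss_one_hot(probs, labels, D):
    """Mean Wasserstein loss for one-hot labels.

    probs:  (N, C) predicted class probabilities per pixel
    labels: (N,)   integer true class per pixel
    D:      (C, C) ground distance, D[i, j] = cost of moving mass from i to j
    """
    cost = D[:, labels].T          # (N, C): cost[n, i] = D[i, labels[n]]
    return float((probs * cost).sum(axis=1).mean())

# Hypothetical classes: 0 = building, 1 = road, 2 = pedestrian.
# Confusions involving the pedestrian class are made three times as costly.
D_importance = np.array([[0.0, 1.0, 3.0],
                         [1.0, 0.0, 3.0],
                         [3.0, 3.0, 0.0]])

# An importance-ignoring choice: every confusion costs the same.
D_uniform = 1.0 - np.eye(3)

probs = np.array([[0.2, 0.3, 0.5]])   # one pixel
label_pedestrian = np.array([2])

loss_importance = wasserstein_loss_one_hot(probs, label_pedestrian, D_importance)
loss_uniform = wasserstein_loss_one_hot(probs, label_pedestrian, D_uniform)
```

With the uniform matrix the loss reduces to one minus the probability assigned to the true class, so no class is treated as more important than another. With the importance-aware matrix the same prediction is penalized three times as heavily (1.5 versus 0.5) because the true class is the pedestrian.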
Related papers
- Point Cloud Based Scene Segmentation: A Survey [3.0846824529023387]
We provide an overview of the current state-of-the-art methods in the field of Point Cloud Semantics for autonomous driving.
We categorize the approaches into projection-based, 3D-based and hybrid methods.
We also emphasize the importance of synthetic data to support research when real-world data is limited.
arXiv Detail & Related papers (2025-03-16T18:02:41Z) - Boosting Generalizability towards Zero-Shot Cross-Dataset Single-Image Indoor Depth by Meta-Initialization [17.822554284161868]
We use gradient-based meta-learning to gain higher generalizability on zero-shot cross-dataset inference.
We propose zero-shot cross-dataset protocols and validate higher generalizability induced by our meta-initialization.
arXiv Detail & Related papers (2024-09-04T07:25:50Z) - CARD: Semantic Segmentation with Efficient Class-Aware Regularized
Decoder [31.223271128719603]
We propose a universal Class-Aware Regularization (CAR) approach to optimize the intra-class variance and inter-class distance during feature learning.
CAR can be directly applied to most existing segmentation models during training, and can largely improve their accuracy at no additional inference overhead.
arXiv Detail & Related papers (2023-01-11T01:41:37Z) - ALSO: Automotive Lidar Self-supervision by Occupancy estimation [70.70557577874155]
We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds.
The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled.
The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information.
arXiv Detail & Related papers (2022-12-12T13:10:19Z) - Standardized Max Logits: A Simple yet Effective Approach for Identifying
Unexpected Road Obstacles in Urban-Scene Segmentation [18.666365568765098]
We propose a simple yet effective approach that standardizes the max logits in order to align the different distributions and reflect the relative meanings of max logits within each predicted class.
Our method achieves a new state-of-the-art performance on the publicly available Fishyscapes Lost & Found leaderboard with a large margin.
arXiv Detail & Related papers (2021-07-23T14:25:02Z) - Video Class Agnostic Segmentation Benchmark for Autonomous Driving [13.312978643938202]
In certain safety-critical robotics applications, it is important to segment all objects, including those unknown at training time.
We formalize the task of video class agnostic segmentation from monocular video sequences in autonomous driving to account for unknown objects.
arXiv Detail & Related papers (2021-03-19T20:41:40Z) - Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data
Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z) - Reinforced Wasserstein Training for Severity-Aware Semantic Segmentation
in Autonomous Driving [45.11602128316305]
We develop a training framework to explore the inter-class correlation by defining its ground metric as misclassification severity.
Experiments on both CamVid and Cityscapes datasets evidenced the effectiveness of our Wasserstein loss.
arXiv Detail & Related papers (2020-08-11T15:00:41Z) - Spatiotemporal Relationship Reasoning for Pedestrian Intent Prediction [57.56466850377598]
Reasoning over visual data is a desirable capability for robotics and vision-based applications.
In this paper, we present a graph-based framework to uncover relationships among different objects in the scene for reasoning about pedestrian intent.
Pedestrian intent, defined as the future action of crossing or not-crossing the street, is a very crucial piece of information for autonomous vehicles.
arXiv Detail & Related papers (2020-02-20T18:50:44Z) - Key Points Estimation and Point Instance Segmentation Approach for Lane
Detection [65.37887088194022]
We propose a traffic line detection method called Point Instance Network (PINet).
The PINet includes several stacked hourglass networks that are trained simultaneously.
The PINet achieves competitive accuracy and false positive on the TuSimple and Culane datasets.
arXiv Detail & Related papers (2020-02-16T15:51:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.