Related papers: Hierarchical Insights: Exploiting Structural Similarities for Reliable 3D Semantic Segmentation

Hierarchical Insights: Exploiting Structural Similarities for Reliable 3D Semantic Segmentation

URL: http://arxiv.org/abs/2404.06124v1
Date: Tue, 9 Apr 2024 08:49:01 GMT
Title: Hierarchical Insights: Exploiting Structural Similarities for Reliable 3D Semantic Segmentation
Authors: Mariella Dreissig, Florian Piewak, Joschka Boedecker,
Abstract summary: We propose a training strategy which enables a 3D LiDAR semantic segmentation model to learn structural relationships between the different classes through abstraction. We show, how this training strategy not only improves the model's confidence calibration, but also preserves additional information for downstream tasks like fusion, prediction and planning.
Score: 4.894417113725933
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Safety-critical applications like autonomous driving call for robust 3D environment perception algorithms which can withstand highly diverse and ambiguous surroundings. The predictive performance of any classification model strongly depends on the underlying dataset and the prior knowledge conveyed by the annotated labels. While the labels provide a basis for the learning process, they usually fail to represent inherent relations between the classes - representations, which are a natural element of the human perception system. We propose a training strategy which enables a 3D LiDAR semantic segmentation model to learn structural relationships between the different classes through abstraction. We achieve this by implicitly modeling those relationships through a learning rule for hierarchical multi-label classification (HMC). With a detailed analysis we show, how this training strategy not only improves the model's confidence calibration, but also preserves additional information for downstream tasks like fusion, prediction and planning.

Related papers

Neural Network Reprogrammability: A Unified Theme on Model Reprogramming, Prompt Tuning, and Prompt Instruction [55.914891182214475]
We introduce neural network reprogrammability as a unifying framework for model adaptation.<n>We present a taxonomy that categorizes such information manipulation approaches across four key dimensions.<n>We also analyze remaining technical challenges and ethical considerations.
arXiv Detail & Related papers (2025-06-05T05:42:27Z)
3DLabelProp: Geometric-Driven Domain Generalization for LiDAR Semantic Segmentation in Autonomous Driving [7.35996217853436]
Domain generalization aims to find ways for deep learning models to maintain their performance despite domain shifts between training and inference datasets. This is particularly important for models that need to be robust or are costly to train. This work proposes a geometry-based approach, leveraging the sequential structure of LiDAR sensors, which sets it apart from the learning-based methods commonly found in the literature.
arXiv Detail & Related papers (2025-01-24T16:22:35Z)
Enhancing Interpretability Through Loss-Defined Classification Objective in Structured Latent Spaces [5.2542280870644715]
We introduce Latent Boost, a novel approach that integrates advanced distance metric learning into supervised classification tasks. Latent Boost improves classification interpretability, as demonstrated by higher Silhouette scores, while accelerating training convergence.
arXiv Detail & Related papers (2024-12-11T16:25:17Z)
Self-supervised Learning of Dense Hierarchical Representations for Medical Image Segmentation [2.2265038612930663]
This paper demonstrates a self-supervised framework for learning voxel-wise coarse-to-fine representations tailored for dense downstream tasks. We devise a training strategy that balances the contributions of features from multiple scales, ensuring that the learned representations capture both coarse and fine-grained details.
arXiv Detail & Related papers (2024-01-12T09:47:17Z)
Generalized Robot 3D Vision-Language Model with Fast Rendering and Pre-Training Vision-Language Alignment [55.11291053011696]
This work presents a framework for dealing with 3D scene understanding when the labeled scenes are quite limited. To extract knowledge for novel categories from the pre-trained vision-language models, we propose a hierarchical feature-aligned pre-training and knowledge distillation strategy. In the limited reconstruction case, our proposed approach, termed WS3D++, ranks 1st on the large-scale ScanNet benchmark.
arXiv Detail & Related papers (2023-12-01T15:47:04Z)
Class-level Structural Relation Modelling and Smoothing for Visual Representation Learning [12.247343963572732]
This paper presents a framework termed bfClass-level Structural Relation Modeling and Smoothing for Visual Representation Learning (CSRMS) It includes the Class-level Relation Modelling, Class-aware GraphGuided Sampling, and Graph-Guided Representation Learning modules. Experiments demonstrate the effectiveness of structured knowledge modelling for enhanced representation learning and show that CSRMS can be incorporated with any state-of-the-art visual representation learning models for performance gains.
arXiv Detail & Related papers (2023-08-08T09:03:46Z)
Unsupervised 3D registration through optimization-guided cyclical self-training [71.75057371518093]
State-of-the-art deep learning-based registration methods employ three different learning strategies. We propose a novel self-supervised learning paradigm for unsupervised registration, relying on self-training. We evaluate the method for abdomen and lung registration, consistently surpassing metric-based supervision and outperforming diverse state-of-the-art competitors.
arXiv Detail & Related papers (2023-06-29T14:54:10Z)
ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP) ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective. We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
LESS: Label-Efficient Semantic Segmentation for LiDAR Point Clouds [62.49198183539889]
We propose a label-efficient semantic segmentation pipeline for outdoor scenes with LiDAR point clouds. Our method co-designs an efficient labeling process with semi/weakly supervised learning. Our proposed method is even highly competitive compared to the fully supervised counterpart with 100% labels.
arXiv Detail & Related papers (2022-10-14T19:13:36Z)
Self-Taught Metric Learning without Labels [47.832107446521626]
We present a novel self-taught framework for unsupervised metric learning. It alternates between predicting class-equivalence relations between data through a moving average of an embedding model and learning the model with the predicted relations as pseudo labels.
arXiv Detail & Related papers (2022-05-04T05:48:40Z)
Online Deep Learning based on Auto-Encoder [4.128388784932455]
We propose a two-phase Online Deep Learning based on Auto-Encoder (ODLAE) Based on auto-encoder, considering reconstruction loss, we extract abstract hierarchical latent representations of instances. We devise two fusion strategies: the output-level fusion strategy, which is obtained by fusing the classification results of each hidden layer; and feature-level fusion strategy, which is leveraged self-attention mechanism to fusion every hidden layer output.
arXiv Detail & Related papers (2022-01-19T02:14:57Z)
Self-Supervised Class Incremental Learning [51.62542103481908]
Existing Class Incremental Learning (CIL) methods are based on a supervised classification framework sensitive to data labels. When updating them based on the new class data, they suffer from catastrophic forgetting: the model cannot discern old class data clearly from the new. In this paper, we explore the performance of Self-Supervised representation learning in Class Incremental Learning (SSCIL) for the first time.
arXiv Detail & Related papers (2021-11-18T06:58:19Z)
Deep Relational Metric Learning [84.95793654872399]
This paper presents a deep relational metric learning framework for image clustering and retrieval. We learn an ensemble of features that characterizes an image from different aspects to model both interclass and intraclass distributions. Experiments on the widely-used CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate that our framework improves existing deep metric learning methods and achieves very competitive results.
arXiv Detail & Related papers (2021-08-23T09:31:18Z)
Structured Sparse R-CNN for Direct Scene Graph Generation [16.646937866282922]
This paper presents a simple, sparse, and unified framework for relation detection, termed as Structured Sparse R-CNN. The key to our method is a set of learnable triplet queries and structured triplet detectors which could be optimized jointly from the training set in an end-to-end manner. We perform experiments on two benchmarks: Visual Genome and Open Images, and the results demonstrate that our method achieves the state-of-the-art performance.
arXiv Detail & Related papers (2021-06-21T02:24:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.