Related papers: Exploiting Local Features and Range Images for Small Data Real-Time Point Cloud Semantic Segmentation

Exploiting Local Features and Range Images for Small Data Real-Time Point Cloud Semantic Segmentation

URL: http://arxiv.org/abs/2410.10510v1
Date: Mon, 14 Oct 2024 13:49:05 GMT
Title: Exploiting Local Features and Range Images for Small Data Real-Time Point Cloud Semantic Segmentation
Authors: Daniel Fusaro, Simone Mosco, Emanuele Menegatti, Alberto Pretto,
Abstract summary: In this paper, we harness the information from the three-dimensional representation to proficiently capture local features. A GPU-based KDTree allows for rapid building, querying, and enhancing projection with straightforward operations. We show that a reduced version of our model not only demonstrates strong competitiveness against full-scale state-of-the-art models but also operates in real-time.
Score: 4.02235104503587
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Semantic segmentation of point clouds is an essential task for understanding the environment in autonomous driving and robotics. Recent range-based works achieve real-time efficiency, while point- and voxel-based methods produce better results but are affected by high computational complexity. Moreover, highly complex deep learning models are often not suited to efficiently learn from small datasets. Their generalization capabilities can easily be driven by the abundance of data rather than the architecture design. In this paper, we harness the information from the three-dimensional representation to proficiently capture local features, while introducing the range image representation to incorporate additional information and facilitate fast computation. A GPU-based KDTree allows for rapid building, querying, and enhancing projection with straightforward operations. Extensive experiments on SemanticKITTI and nuScenes datasets demonstrate the benefits of our modification in a ``small data'' setup, in which only one sequence of the dataset is used to train the models, but also in the conventional setup, where all sequences except one are used for training. We show that a reduced version of our model not only demonstrates strong competitiveness against full-scale state-of-the-art models but also operates in real-time, making it a viable choice for real-world case applications. The code of our method is available at https://github.com/Bender97/WaffleAndRange.

Related papers

Scaling Up Diffusion and Flow-based XGBoost Models [5.944645679491607]
We investigate a recent proposal to use XGBoost as the function approximator in diffusion and flow-matching models. With better implementation it can be scaled to datasets 370x larger than previously used. We present results on large-scale scientific datasets as part of the Fast Calorimeter Simulation Challenge.
arXiv Detail & Related papers (2024-08-28T18:00:00Z)
Effective pruning of web-scale datasets based on complexity of concept clusters [48.125618324485195]
We present a method for pruning large-scale multimodal datasets for training CLIP-style models on ImageNet. We find that training on a smaller set of high-quality data can lead to higher performance with significantly lower training costs. We achieve a new state-of-the-art Imagehttps://info.arxiv.org/help/prep#commentsNet zero-shot accuracy and a competitive average zero-shot accuracy on 38 evaluation tasks.
arXiv Detail & Related papers (2024-01-09T14:32:24Z)
A Simple and Efficient Baseline for Data Attribution on Images [107.12337511216228]
Current state-of-the-art approaches require a large ensemble of as many as 300,000 models to accurately attribute model predictions. In this work, we focus on a minimalist baseline, utilizing the feature space of a backbone pretrained via self-supervised learning to perform data attribution. Our method is model-agnostic and scales easily to large datasets.
arXiv Detail & Related papers (2023-11-03T17:29:46Z)
Enhancing Performance of Vision Transformers on Small Datasets through Local Inductive Bias Incorporation [13.056764072568749]
Vision transformers (ViTs) achieve remarkable performance on large datasets, but tend to perform worse than convolutional neural networks (CNNs) on smaller datasets. We propose a module called Local InFormation Enhancer (LIFE) that extracts patch-level local information and incorporates it into the embeddings used in the self-attention block of ViTs. Our proposed module is memory and efficient, as well as flexible enough to process auxiliary tokens such as the classification and distillation tokens.
arXiv Detail & Related papers (2023-05-15T11:23:18Z)
CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point Cloud Learning [81.85951026033787]
We set transformers in this work and incorporate them into a hierarchical framework for shape classification and part and scene segmentation. We also compute efficient and dynamic global cross attentions by leveraging sampling and grouping at each iteration. The proposed hierarchical model achieves state-of-the-art shape classification in mean accuracy and yields results on par with the previous segmentation methods.
arXiv Detail & Related papers (2022-07-31T21:39:15Z)
CHALLENGER: Training with Attribution Maps [63.736435657236505]
We show that utilizing attribution maps for training neural networks can improve regularization of models and thus increase performance. In particular, we show that our generic domain-independent approach yields state-of-the-art results in vision, natural language processing and on time series tasks.
arXiv Detail & Related papers (2022-05-30T13:34:46Z)
CrossLoc: Scalable Aerial Localization Assisted by Multimodal Synthetic Data [2.554905387213586]
We present a visual localization system that learns to estimate camera poses in the real world with the help of synthetic data. To mitigate the data scarcity issue, we introduce TOPO-DataGen, a versatile synthetic data generation tool. We also introduce CrossLoc, a cross-modal visual representation learning approach to pose estimation.
arXiv Detail & Related papers (2021-12-16T18:05:48Z)
Revisiting Point Cloud Simplification: A Learnable Feature Preserving Approach [57.67932970472768]
Mesh and Point Cloud simplification methods aim to reduce the complexity of 3D models while retaining visual quality and relevant salient features. We propose a fast point cloud simplification method by learning to sample salient points. The proposed method relies on a graph neural network architecture trained to select an arbitrary, user-defined, number of points from the input space and to re-arrange their positions so as to minimize the visual perception error.
arXiv Detail & Related papers (2021-09-30T10:23:55Z)
Transformer-Based Behavioral Representation Learning Enables Transfer Learning for Mobile Sensing in Small Datasets [4.276883061502341]
We provide a neural architecture framework for mobile sensing data that can learn generalizable feature representations from time series. This architecture combines benefits from CNN and Trans-former architectures to enable better prediction performance.
arXiv Detail & Related papers (2021-07-09T22:26:50Z)
Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes. The proposed method achieves the accuracy of 73.6% and 68.0% mean Intersection over Union (mIoU) with the inference speed of 51.0 fps and 39.3 fps.
arXiv Detail & Related papers (2020-03-11T08:45:53Z)
Fast local linear regression with anchor regularization [21.739281173516247]
We propose a simple yet effective local model training algorithm called the fast anchor regularized local linear method (FALL) Through experiments on synthetic and real-world datasets, we demonstrate that FALL compares favorably in terms of accuracy with the state-of-the-art network Lasso algorithm.
arXiv Detail & Related papers (2020-02-21T10:03:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.