Related papers: Lightweight Road Environment Segmentation using Vector Quantization

Lightweight Road Environment Segmentation using Vector Quantization

URL: http://arxiv.org/abs/2504.14113v1
Date: Sat, 19 Apr 2025 00:13:21 GMT
Title: Lightweight Road Environment Segmentation using Vector Quantization
Authors: Jiyong Kwag, Alper Yilmaz, Charles Toth,
Abstract summary: Road environment segmentation plays a significant role in autonomous driving.<n>We propose segmentation of the autonomous driving environment using vector quantization.<n>We achieve 77.0 % mIoU on Cityscapes, outperforming the baseline by 2.9 % without increasing the model's initial size or complexity.
Score: 0.13654846342364302
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Road environment segmentation plays a significant role in autonomous driving. Numerous works based on Fully Convolutional Networks (FCNs) and Transformer architectures have been proposed to leverage local and global contextual learning for efficient and accurate semantic segmentation. In both architectures, the encoder often relies heavily on extracting continuous representations from the image, which limits the ability to represent meaningful discrete information. To address this limitation, we propose segmentation of the autonomous driving environment using vector quantization. Vector quantization offers three primary advantages for road environment segmentation. (1) Each continuous feature from the encoder is mapped to a discrete vector from the codebook, helping the model discover distinct features more easily than with complex continuous features. (2) Since a discrete feature acts as compressed versions of the encoder's continuous features, they also compress noise or outliers, enhancing the image segmentation task. (3) Vector quantization encourages the latent space to form coarse clusters of continuous features, forcing the model to group similar features, making the learned representations more structured for the decoding process. In this work, we combined vector quantization with the lightweight image segmentation model MobileUNETR and used it as a baseline model for comparison to demonstrate its efficiency. Through experiments, we achieved 77.0 % mIoU on Cityscapes, outperforming the baseline by 2.9 % without increasing the model's initial size or complexity.

Related papers

ContextFormer: Redefining Efficiency in Semantic Segmentation [48.81126061219231]
Convolutional methods, although capturing local dependencies well, struggle with long-range relationships. Vision Transformers (ViTs) excel in global context capture but are hindered by high computational demands. We propose ContextFormer, a hybrid framework leveraging the strengths of CNNs and ViTs in the bottleneck to balance efficiency, accuracy, and robustness for real-time semantic segmentation.
arXiv Detail & Related papers (2025-01-31T16:11:04Z)
Multi-scale Feature Enhancement in Multi-task Learning for Medical Image Analysis [1.6916040234975798]
Traditional deep learning methods in medical imaging often focus solely on segmentation or classification.<n>We propose a simple yet effective UNet-based MTL model, where features extracted by the encoder are used to predict classification labels, while the decoder produces the segmentation mask.<n> Experimental results across multiple medical datasets confirm the superior performance of our model in both segmentation and classification tasks.
arXiv Detail & Related papers (2024-11-30T04:20:05Z)
Perceiving Longer Sequences With Bi-Directional Cross-Attention Transformers [13.480259378415505]
BiXT scales linearly with input size in terms of computational cost and memory consumption. BiXT is inspired by the Perceiver architectures but replaces iterative attention with an efficient bi-directional cross-attention module. By combining efficiency with the generality and performance of a full Transformer architecture, BiXT can process longer sequences.
arXiv Detail & Related papers (2024-02-19T13:38:15Z)
LENet: Lightweight And Efficient LiDAR Semantic Segmentation Using Multi-Scale Convolution Attention [0.0]
We propose a projection-based semantic segmentation network called LENet with an encoder-decoder structure for LiDAR-based semantic segmentation. The encoder is composed of a novel multi-scale convolutional attention (MSCA) module with varying receptive field sizes to capture features. We show that our proposed method is lighter, more efficient, and robust compared to state-of-the-art semantic segmentation methods.
arXiv Detail & Related papers (2023-01-11T02:51:38Z)
UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation [93.88170217725805]
We propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks as well as efficiency in terms of parameters, compute cost, and inference speed. The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features. Our evaluations on five benchmarks, Synapse, BTCV, ACDC, BRaTs, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy.
arXiv Detail & Related papers (2022-12-08T18:59:57Z)
ClusTR: Exploring Efficient Self-attention via Clustering for Vision Transformers [70.76313507550684]
We propose a content-based sparse attention method, as an alternative to dense self-attention. Specifically, we cluster and then aggregate key and value tokens, as a content-based method of reducing the total token count. The resulting clustered-token sequence retains the semantic diversity of the original signal, but can be processed at a lower computational cost.
arXiv Detail & Related papers (2022-08-28T04:18:27Z)
CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point Cloud Learning [81.85951026033787]
We set transformers in this work and incorporate them into a hierarchical framework for shape classification and part and scene segmentation. We also compute efficient and dynamic global cross attentions by leveraging sampling and grouping at each iteration. The proposed hierarchical model achieves state-of-the-art shape classification in mean accuracy and yields results on par with the previous segmentation methods.
arXiv Detail & Related papers (2022-07-31T21:39:15Z)
Learning Local Displacements for Point Cloud Completion [93.54286830844134]
We propose a novel approach aimed at object and semantic scene completion from a partial scan represented as a 3D point cloud. Our architecture relies on three novel layers that are used successively within an encoder-decoder structure. We evaluate both architectures on object and indoor scene completion tasks, achieving state-of-the-art performance.
arXiv Detail & Related papers (2022-03-30T18:31:37Z)
Multi-scale and Cross-scale Contrastive Learning for Semantic Segmentation [5.281694565226513]
We apply contrastive learning to enhance the discriminative power of the multi-scale features extracted by semantic segmentation networks. By first mapping the encoder's multi-scale representations to a common feature space, we instantiate a novel form of supervised local-global constraint.
arXiv Detail & Related papers (2022-03-25T01:24:24Z)
Efficient Transformer based Method for Remote Sensing Image Change Detection [17.553240434628087]
High-resolution remote sensing CD remains challenging due to the complexity of objects in the scene. We propose a bitemporal image transformer (BiT) to efficiently and effectively model contexts within the spatial-temporal domain. BiT-based model significantly outperforms the purely convolutional baseline using only 3 times lower computational costs and model parameters.
arXiv Detail & Related papers (2021-02-27T13:08:46Z)
LiDAR-based Panoptic Segmentation via Dynamic Shifting Network [56.71765153629892]
LiDAR-based panoptic segmentation aims to parse both objects and scenes in a unified manner. We propose the Dynamic Shifting Network (DS-Net), which serves as an effective panoptic segmentation framework in the point cloud realm. Our proposed DS-Net achieves superior accuracies over current state-of-the-art methods.
arXiv Detail & Related papers (2020-11-24T08:44:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.