Investigating Vision-Language Model for Point Cloud-based Vehicle Classification
- URL: http://arxiv.org/abs/2504.08154v1
- Date: Thu, 10 Apr 2025 22:37:27 GMT
- Title: Investigating Vision-Language Model for Point Cloud-based Vehicle Classification
- Authors: Yiqiao Li, Jie Wei, Camille Kamga
- Abstract summary: Heavy-duty trucks pose significant safety challenges due to their large size and limited maneuverability. Traditional LiDAR-based truck classification methods rely on extensive manual annotations. This study introduces a novel framework that integrates roadside LiDAR point cloud data with vision-language models.
- Score: 3.9148444463558465
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Heavy-duty trucks pose significant safety challenges due to their large size and limited maneuverability compared to passenger vehicles. A deeper understanding of truck characteristics is essential for enhancing safety in cooperative autonomous driving. Traditional LiDAR-based truck classification methods rely on extensive manual annotations, which makes them labor-intensive and costly. The rapid advancement of large language models (LLMs) trained on massive datasets presents an opportunity to leverage their few-shot learning capabilities for truck classification. However, existing vision-language models (VLMs) are primarily trained on image datasets, which makes it challenging to directly process point cloud data. This study introduces a novel framework that integrates roadside LiDAR point cloud data with VLMs to facilitate efficient and accurate truck classification, which supports cooperative and safe driving environments. The framework offers three key innovations: (1) leveraging real-world LiDAR datasets for model development, (2) designing a preprocessing pipeline to adapt point cloud data for VLM input, including point cloud registration for dense 3D rendering and mathematical morphological techniques to enhance feature representation, and (3) utilizing in-context learning with few-shot prompting to enable vehicle classification with minimally labeled training data. Experimental results demonstrate encouraging performance and highlight the method's potential to reduce annotation efforts while improving classification accuracy.
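The abstract's second innovation, rendering sparse point clouds into denser images via mathematical morphology, can be illustrated with a minimal sketch: project points onto a bird's-eye-view occupancy grid, then apply a morphological closing to fill small gaps. All function names, grid parameters, and the pure-NumPy morphology below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def points_to_bev(points, res=0.1, grid=64):
    """Project 3D points (N, 3) onto a top-down occupancy grid (hypothetical parameters)."""
    img = np.zeros((grid, grid), dtype=bool)
    ij = np.floor(points[:, :2] / res).astype(int)
    ij -= ij.min(axis=0)                 # shift into the positive quadrant
    keep = (ij < grid).all(axis=1)       # drop points outside the grid
    img[ij[keep, 0], ij[keep, 1]] = True
    return img

def dilate(img):
    """3x3 binary dilation via shifted copies (pure-NumPy stand-in for a morphology library)."""
    out = img.copy()
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            out |= np.roll(np.roll(img, dx, axis=0), dy, axis=1)
    return out

def erode(img):
    """3x3 binary erosion: the complement of dilating the complement."""
    return ~dilate(~img)

def closing(img):
    """Morphological closing (dilation then erosion) fills small gaps, densifying sparse LiDAR renderings."""
    return erode(dilate(img))
```

Note that `np.roll` wraps around at the image borders, which is acceptable for a sketch but would need explicit padding in a real pipeline; a production version would more likely use a dedicated morphology routine such as `scipy.ndimage.binary_closing`.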
Related papers
- Bridging the Modality Gap in Roadside LiDAR: A Training-Free Vision-Language Model Framework for Vehicle Classification [5.746505534720594]
Fine-grained truck classification is critical for intelligent transportation systems (ITS). Current LiDAR-based methods face scalability challenges due to their reliance on supervised deep learning and labor-intensive manual annotation. We propose a framework that bridges this gap by adapting off-the-shelf Vision-Language Models for fine-grained truck classification without parameter fine-tuning.
arXiv Detail & Related papers (2026-02-10T05:39:48Z) - Overtake Detection in Trucks Using CAN Bus Signals: A Comparative Study of Machine Learning Methods [51.28632782308621]
We focus on overtake detection using Controller Area Network (CAN) bus data collected from five in-service trucks provided by the Volvo Group. We evaluate three common classifiers for vehicle manoeuvre detection: Artificial Neural Networks (ANN), Random Forest (RF), and Support Vector Machines (SVM). Our per-truck analysis also reveals that classification accuracy, especially for overtakes, depends on the amount of training data per vehicle.
arXiv Detail & Related papers (2025-07-01T09:20:41Z) - RS2V-L: Vehicle-Mounted LiDAR Data Generation from Roadside Sensor Observations [4.219537240663029]
RS2V-L is a novel framework for reconstructing and synthesizing vehicle-mounted LiDAR data from roadside sensor observations. To the best of our knowledge, this is the first approach to reconstruct vehicle-mounted LiDAR data from roadside sensor inputs.
arXiv Detail & Related papers (2025-03-10T09:08:05Z) - Learning to Drive by Imitating Surrounding Vehicles [0.6612847014373572]
Imitation learning is a promising approach for training autonomous vehicles to navigate complex traffic environments. We propose a data augmentation strategy that enhances imitation learning by leveraging the observed trajectories of nearby vehicles. We evaluate our approach using the state-of-the-art learning-based planning method PLUTO on the nuPlan dataset and demonstrate that our augmentation method leads to improved performance in complex driving scenarios.
arXiv Detail & Related papers (2025-03-08T00:40:47Z) - 4D Contrastive Superflows are Dense 3D Representation Learners [62.433137130087445]
We introduce SuperFlow, a novel framework designed to harness consecutive LiDAR-camera pairs for establishing pretraining objectives.
To further boost learning efficiency, we incorporate a plug-and-play view consistency module that enhances alignment of the knowledge distilled from camera views.
arXiv Detail & Related papers (2024-07-08T17:59:54Z) - Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving [58.16024314532443]
We introduce LaserMix++, a framework that integrates laser beam manipulations from disparate LiDAR scans and incorporates LiDAR-camera correspondences to assist data-efficient learning.
Results demonstrate that LaserMix++ outperforms fully supervised alternatives, achieving comparable accuracy with five times fewer annotations.
This substantial advancement underscores the potential of semi-supervised approaches in reducing the reliance on extensive labeled data in LiDAR-based 3D scene understanding systems.
arXiv Detail & Related papers (2024-05-08T17:59:53Z) - Efficient Prompt Tuning of Large Vision-Language Model for Fine-Grained Ship Classification [59.99976102069976]
Fine-grained ship classification in remote sensing (RS-FGSC) poses a significant challenge due to the high similarity between classes and the limited availability of labeled data. Recent advancements in large pre-trained Vision-Language Models (VLMs) have demonstrated impressive capabilities in few-shot or zero-shot learning. This study delves into harnessing the potential of VLMs to enhance classification accuracy for unseen ship categories.
arXiv Detail & Related papers (2024-03-13T05:48:58Z) - Structural Information Guided Multimodal Pre-training for Vehicle-centric Perception [36.92036421490819]
We propose a novel vehicle-centric pre-training framework called VehicleMAE.
We explicitly extract the sketch lines of vehicles as a form of the spatial structure to guide vehicle reconstruction.
A large-scale dataset, termed Autobot1M, is built to pre-train our model; it contains about 1M vehicle images and 12,693 text annotations.
arXiv Detail & Related papers (2023-12-15T14:10:21Z) - Advancements in 3D Lane Detection Using LiDAR Point Clouds: From Data Collection to Model Development [10.78971892551972]
LiSV-3DLane is a large-scale 3D lane dataset that comprises 20k frames of surround-view LiDAR point clouds with enriched semantic annotation.
We propose a novel LiDAR-based 3D lane detection model, LiLaDet, incorporating the spatial geometry learning of the LiDAR point cloud into Bird's Eye View (BEV) based lane identification.
arXiv Detail & Related papers (2023-09-24T09:58:49Z) - SPOT: Scalable 3D Pre-training via Occupancy Prediction for Learning Transferable 3D Representations [76.45009891152178]
The pretraining-finetuning approach can alleviate the labeling burden by fine-tuning a pre-trained backbone across various downstream datasets and tasks.
We show, for the first time, that general representation learning can be achieved through the task of occupancy prediction.
Our findings will facilitate the understanding of LiDAR points and pave the way for future advancements in LiDAR pre-training.
arXiv Detail & Related papers (2023-09-19T11:13:01Z) - Open-Set Semi-Supervised Learning for 3D Point Cloud Understanding [62.17020485045456]
It is commonly assumed in semi-supervised learning (SSL) that the unlabeled data are drawn from the same distribution as that of the labeled ones.
We propose to selectively utilize unlabeled data through sample weighting, so that only conducive unlabeled data is prioritized.
arXiv Detail & Related papers (2022-05-02T16:09:17Z) - Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.