Related papers: PEFT-DML: Parameter-Efficient Fine-Tuning Deep Metric Learning for Robust Multi-Modal 3D Object Detection in Autonomous Driving

URL: http://arxiv.org/abs/2512.00060v1
Date: Sun, 23 Nov 2025 03:07:14 GMT
Title: PEFT-DML: Parameter-Efficient Fine-Tuning Deep Metric Learning for Robust Multi-Modal 3D Object Detection in Autonomous Driving
Authors: Abdolazim Rezaei, Mehdi Sookhak
Abstract summary: PEFT-DML is a parameter-efficient deep metric learning framework for robust 3D object detection in autonomous driving. By integrating Low-Rank Adaptation (LoRA) and adapter layers, PEFT-DML achieves significant training efficiency. Experiments on the nuScenes benchmark demonstrate superior accuracy.
Score: 0.979731979071071
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This study introduces PEFT-DML, a parameter-efficient deep metric learning framework for robust multi-modal 3D object detection in autonomous driving. Unlike conventional models that assume fixed sensor availability, PEFT-DML maps diverse modalities (LiDAR, radar, camera, IMU, GNSS) into a shared latent space, enabling reliable detection even under sensor dropout or unseen modality class combinations. By integrating Low-Rank Adaptation (LoRA) and adapter layers, PEFT-DML achieves significant training efficiency while enhancing robustness to fast motion, weather variability, and domain shifts. Experiments on the nuScenes benchmark demonstrate superior accuracy.
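
Illustrative note: the abstract names Low-Rank Adaptation (LoRA) but this listing gives no architectural details. The sketch below shows only the generic LoRA idea, in which a frozen weight matrix is augmented by a trainable low-rank product. It is not the authors' implementation; the class name `LoRALinear`, the layer sizes, the rank, and the `alpha` scaling value are all assumptions chosen for illustration.

```python
import numpy as np


class LoRALinear:
    """Frozen dense layer plus a trainable low-rank update (generic LoRA sketch)."""

    def __init__(self, d_in, d_out, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen "pretrained" weight; a random stand-in for illustration only.
        self.W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)
        # Trainable low-rank factors. B starts at zero, so the adapted
        # layer initially reproduces the frozen layer exactly.
        self.A = rng.standard_normal((rank, d_in)) * 0.01
        self.B = np.zeros((d_out, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # x has shape (batch, d_in); the result has shape (batch, d_out).
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_fraction(self):
        # Share of parameters that would be updated during fine-tuning.
        trainable = self.A.size + self.B.size
        return trainable / (trainable + self.W.size)


layer = LoRALinear(d_in=256, d_out=256, rank=4)
x = np.ones((2, 256))
y = layer.forward(x)
```

With these hypothetical sizes, the two low-rank factors hold 2,048 values against 65,536 in the frozen matrix, which is the sense in which such methods are called parameter-efficient.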

Related papers

DIMM: Decoupled Multi-hierarchy Kalman Filter for 3D Object Tracking [50.038098341549095]
State estimation is challenging for 3D object tracking with high maneuverability. We propose a novel framework, DIMM, to effectively combine estimates from different motion models in each direction. DIMM significantly improves the tracking accuracy of existing state estimation methods by 31.61% to 99.23%.
arXiv Detail & Related papers (2025-05-18T10:12:41Z)
OptiPMB: Enhancing 3D Multi-Object Tracking with Optimized Poisson Multi-Bernoulli Filtering [16.047505930360202]
We present OptiPMB, a novel RFS-based 3D MOT method that employs an optimized Poisson multi-Bernoulli filter. We show that OptiPMB achieves superior tracking accuracy compared with state-of-the-art methods.
arXiv Detail & Related papers (2025-03-17T09:24:26Z)
Efficient Multimodal 3D Object Detector via Instance-Level Contrastive Distillation [17.634678949648208]
We introduce a fast yet effective multimodal 3D object detector, incorporating our proposed Instance-level Contrastive Distillation (ICD) framework and Cross Linear Attention Fusion Module (CLFM). Our 3D object detector outperforms state-of-the-art (SOTA) methods while achieving superior efficiency.
arXiv Detail & Related papers (2025-03-17T08:26:11Z)
Easy-Poly: A Easy Polyhedral Framework For 3D Multi-Object Tracking [23.40561503456164]
We present Easy-Poly, a real-time, filter-based 3D MOT framework for multiple object categories. Results show that Easy-Poly outperforms state-of-the-art methods such as Poly-MOT and Fast-Poly. These findings highlight Easy-Poly's adaptability and robustness in diverse scenarios.
arXiv Detail & Related papers (2025-02-25T04:01:25Z)
Robust Multimodal 3D Object Detection via Modality-Agnostic Decoding and Proximity-based Modality Ensemble [15.173314907900842]
Existing 3D object detection methods rely heavily on the LiDAR sensor. We propose MEFormer to address the LiDAR over-reliance problem. Our MEFormer achieves state-of-the-art performance of 73.9% NDS and 71.5% mAP in the nuScenes validation set.
arXiv Detail & Related papers (2024-07-27T03:21:44Z)
Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving [58.16024314532443]
We introduce LaserMix++, a framework that integrates laser beam manipulations from disparate LiDAR scans and incorporates LiDAR-camera correspondences to assist data-efficient learning. Results demonstrate that LaserMix++ outperforms fully supervised alternatives, achieving comparable accuracy with five times fewer annotations. This substantial advancement underscores the potential of semi-supervised approaches in reducing the reliance on extensive labeled data in LiDAR-based 3D scene understanding systems.
arXiv Detail & Related papers (2024-05-08T17:59:53Z)
Cross-Cluster Shifting for Efficient and Effective 3D Object Detection in Autonomous Driving [69.20604395205248]
We present a new 3D point-based detector model, named Shift-SSD, for precise 3D object detection in autonomous driving. We introduce an intriguing Cross-Cluster Shifting operation to unleash the representation capacity of the point-based detector. We conduct extensive experiments on the KITTI, Waymo, and nuScenes datasets, and the results demonstrate the state-of-the-art performance of Shift-SSD.
arXiv Detail & Related papers (2024-03-10T10:36:32Z)
ShaSTA-Fuse: Camera-LiDAR Sensor Fusion to Model Shape and Spatio-Temporal Affinities for 3D Multi-Object Tracking [26.976216624424385]
3D multi-object tracking (MOT) is essential for an autonomous mobile agent to safely navigate a scene. We aim to develop a 3D MOT framework that fuses camera and LiDAR sensor information.
arXiv Detail & Related papers (2023-10-04T02:17:59Z)
UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation [113.35352122662752]
We present an efficient multi-modal backbone for outdoor 3D perception named UniTR. UniTR processes a variety of modalities with unified modeling and shared parameters. UniTR is also a fundamentally task-agnostic backbone that naturally supports different 3D perception tasks.
arXiv Detail & Related papers (2023-08-15T12:13:44Z)
SimDistill: Simulated Multi-modal Distillation for BEV 3D Object Detection [56.24700754048067]
Multi-view camera-based 3D object detection has become popular due to its low cost, but accurately inferring 3D geometry solely from camera data remains challenging. We propose a Simulated multi-modal Distillation (SimDistill) method by carefully crafting the model architecture and distillation strategy. Our SimDistill can learn better feature representations for 3D object detection while maintaining a cost-effective camera-only deployment.
arXiv Detail & Related papers (2023-03-29T16:08:59Z)
The Devil is in the Task: Exploiting Reciprocal Appearance-Localization Features for Monocular 3D Object Detection [62.1185839286255]
Low-cost monocular 3D object detection plays a fundamental role in autonomous driving. We introduce a Dynamic Feature Reflecting Network, named DFR-Net. We rank 1st among all the monocular 3D object detectors in the KITTI test set.
arXiv Detail & Related papers (2021-12-28T07:31:18Z)

This list is automatically generated from the titles and abstracts of the papers on this site.