CodeMerge: Codebook-Guided Model Merging for Robust Test-Time Adaptation in Autonomous Driving
- URL: http://arxiv.org/abs/2505.16524v1
- Date: Thu, 22 May 2025 11:09:15 GMT
- Title: CodeMerge: Codebook-Guided Model Merging for Robust Test-Time Adaptation in Autonomous Driving
- Authors: Huitong Yang, Zhuoxiao Chen, Fengyi Zhang, Zi Huang, Yadan Luo,
- Abstract summary: Existing test-time adaptation methods often fail in high-variance tasks like 3D object detection due to unstable optimization and sharp minima. We introduce CodeMerge, a scalable model merging framework that bypasses these limitations by operating in a compact latent space. Our method achieves strong performance across challenging benchmarks, improving end-to-end 3D detection by 14.9% NDS on nuScenes-C and LiDAR-based detection by over 7.6% mAP.
- Score: 28.022501313260648
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Maintaining robust 3D perception under dynamic and unpredictable test-time conditions remains a critical challenge for autonomous driving systems. Existing test-time adaptation (TTA) methods often fail in high-variance tasks like 3D object detection due to unstable optimization and sharp minima. While recent model merging strategies based on linear mode connectivity (LMC) offer improved stability by interpolating between fine-tuned checkpoints, they are computationally expensive, requiring repeated checkpoint access and multiple forward passes. In this paper, we introduce CodeMerge, a lightweight and scalable model merging framework that bypasses these limitations by operating in a compact latent space. Instead of loading full models, CodeMerge represents each checkpoint with a low-dimensional fingerprint derived from the source model's penultimate features and constructs a key-value codebook. We compute merging coefficients using ridge leverage scores on these fingerprints, enabling efficient model composition without compromising adaptation quality. Our method achieves strong performance across challenging benchmarks, improving end-to-end 3D detection by 14.9% NDS on nuScenes-C and LiDAR-based detection by over 7.6% mAP on nuScenes-to-KITTI, while benefiting downstream tasks such as online mapping, motion prediction and planning even without training. Code and pretrained models are released in the supplementary material.
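The abstract describes weighting checkpoints by ridge leverage scores computed on low-dimensional fingerprints. The paper's exact formulation is not given here, so the following is only a minimal sketch of that idea under common conventions: the ridge leverage score of fingerprint f_i within a fingerprint matrix F is f_i^T (F^T F + lam*I)^{-1} f_i, and the sketch simply normalizes these scores into merging coefficients (the function names and the uniform normalization are assumptions, not the authors' code).

```python
import numpy as np

def ridge_leverage_scores(F, lam=1.0):
    """Ridge leverage score of each row (checkpoint fingerprint) of F.

    tau_i = f_i^T (F^T F + lam * I)^{-1} f_i
    """
    d = F.shape[1]
    G_inv = np.linalg.inv(F.T @ F + lam * np.eye(d))
    # einsum computes f_i^T G_inv f_i for every row i at once
    return np.einsum("ij,jk,ik->i", F, G_inv, F)

def merge_checkpoints(fingerprints, checkpoints, lam=1.0):
    """Merge checkpoint parameter dicts, weighted by normalized leverage scores."""
    tau = ridge_leverage_scores(fingerprints, lam)
    w = tau / tau.sum()  # merging coefficients summing to 1 (an assumption)
    merged = {
        name: sum(wi * ckpt[name] for wi, ckpt in zip(w, checkpoints))
        for name in checkpoints[0]
    }
    return merged, w
```

Because the leverage scores are computed on small fingerprints rather than full weight tensors, the coefficient computation stays cheap even for many checkpoints, which matches the abstract's claim of avoiding repeated full-model access.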
Related papers
- SMART-PC: Skeletal Model Adaptation for Robust Test-Time Training in Point Clouds [18.33878596057853]
Test-Time Training (TTT) has emerged as a promising solution to address distribution shifts in 3D point cloud classification. We introduce SMART-PC, a skeleton-based framework that enhances resilience to corruptions by leveraging the geometric structure of 3D point clouds.
arXiv Detail & Related papers (2025-05-26T06:11:02Z) - APCoTTA: Continual Test-Time Adaptation for Semantic Segmentation of Airborne LiDAR Point Clouds [14.348191795901101]
Airborne laser scanning (ALS) point cloud segmentation is a fundamental task for large-scale 3D scene understanding. Continual Test-Time Adaptation (CTTA) offers a solution by adapting a source-pretrained model to evolving, unlabeled target domains. We propose APCoTTA, the first CTTA method tailored for ALS point cloud semantic segmentation.
arXiv Detail & Related papers (2025-05-15T05:21:16Z) - Easy-Poly: A Easy Polyhedral Framework For 3D Multi-Object Tracking [23.40561503456164]
We present Easy-Poly, a real-time, filter-based 3D MOT framework for multiple object categories. Results show that Easy-Poly outperforms state-of-the-art methods such as Poly-MOT and Fast-Poly. These findings highlight Easy-Poly's adaptability and robustness in diverse scenarios.
arXiv Detail & Related papers (2025-02-25T04:01:25Z) - Cross-Camera Distracted Driver Classification through Feature Disentanglement and Contrastive Learning [13.613407983544427]
Driver Behavior Monitoring Network (DBMNet) relies on a lightweight backbone and integrates a disentanglement module to discard camera view information. DBMNet achieves an improvement of 7% in Top-1 accuracy compared to existing approaches.
arXiv Detail & Related papers (2024-11-20T10:27:12Z) - MOS: Model Synergy for Test-Time Adaptation on LiDAR-Based 3D Object Detection [38.6421466851974]
We propose a novel online test-time adaptation framework for 3D detectors. By leveraging long-term knowledge from previous test batches, our approach mitigates catastrophic forgetting and adapts effectively to diverse shifts. Our method was rigorously tested against existing test-time adaptation strategies across three datasets and eight types of corruptions.
arXiv Detail & Related papers (2024-06-21T05:58:19Z) - Test-Time Model Adaptation with Only Forward Passes [68.11784295706995]
Test-time adaptation has proven effective in adapting a given trained model to unseen test samples with potential distribution shifts.
We propose a test-time Forward-Optimization Adaptation (FOA) method.
FOA runs on quantized 8-bit ViT, outperforms gradient-based TENT on full-precision 32-bit ViT, and achieves an up to 24-fold memory reduction on ImageNet-C.
arXiv Detail & Related papers (2024-04-02T05:34:33Z) - Unsupervised Domain Adaptation for Self-Driving from Past Traversal Features [69.47588461101925]
We propose a method to adapt 3D object detectors to new driving environments.
Our approach enhances LiDAR-based detection models using spatial quantized historical features.
Experiments on real-world datasets demonstrate significant improvements.
arXiv Detail & Related papers (2023-09-21T15:00:31Z) - Uncertainty Guided Adaptive Warping for Robust and Efficient Stereo Matching [77.133400999703]
Correlation based stereo matching has achieved outstanding performance.
Current methods with a fixed model do not work uniformly well across various datasets.
This paper proposes a new perspective to dynamically calculate correlation for robust stereo matching.
arXiv Detail & Related papers (2023-07-26T09:47:37Z) - GLENet: Boosting 3D Object Detectors with Generative Label Uncertainty Estimation [70.75100533512021]
In this paper, we formulate the label uncertainty problem as the diversity of potentially plausible bounding boxes of objects.
We propose GLENet, a generative framework adapted from conditional variational autoencoders, to model the one-to-many relationship between a typical 3D object and its potential ground-truth bounding boxes with latent variables.
The label uncertainty generated by GLENet is a plug-and-play module and can be conveniently integrated into existing deep 3D detectors.
arXiv Detail & Related papers (2022-07-06T06:26:17Z) - When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable.
In order to achieve a better accuracy, we propose two lightweight modules.
DQInit dynamically initializes the queries of decoder from the inputs, enabling the model to achieve as good accuracy as the ones with multiple decoder layers.
QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
arXiv Detail & Related papers (2021-05-27T13:51:42Z) - MT3: Meta Test-Time Training for Self-Supervised Test-Time Adaption [69.76837484008033]
An unresolved problem in Deep Learning is the ability of neural networks to cope with domain shifts during test-time.
We combine meta-learning, self-supervision and test-time training to learn to adapt to unseen test distributions.
Our approach significantly improves the state-of-the-art results on the CIFAR-10-Corrupted image classification benchmark.
arXiv Detail & Related papers (2021-03-30T09:33:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.