Beyond Weight Adaptation: Feature-Space Domain Injection for Cross-Modal Ship Re-Identification
- URL: http://arxiv.org/abs/2512.20892v1
- Date: Wed, 24 Dec 2025 02:30:23 GMT
- Title: Beyond Weight Adaptation: Feature-Space Domain Injection for Cross-Modal Ship Re-Identification
- Authors: Tingfeng Xian, Wenlve Zhou, Zhiheng Zhou, Zhelin Li,
- Abstract summary: Cross-Modality Ship Re-Identification (CMS Re-ID) is critical for achieving all-day and all-weather maritime target tracking. We explore the potential of Vision Foundation Models (VFMs) in bridging modality gaps. We propose a novel PEFT strategy termed Domain Representation Injection (DRI).
- Score: 3.6907522136316975
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-Modality Ship Re-Identification (CMS Re-ID) is critical for achieving all-day and all-weather maritime target tracking, yet it is fundamentally challenged by significant modality discrepancies. Mainstream solutions typically rely on explicit modality alignment strategies; however, this paradigm heavily depends on constructing large-scale paired datasets for pre-training. To address this, grounded in the Platonic Representation Hypothesis, we explore the potential of Vision Foundation Models (VFMs) in bridging modality gaps. Recognizing the suboptimal performance of existing generic Parameter-Efficient Fine-Tuning (PEFT) methods that operate within the weight space, particularly on limited-capacity models, we shift the optimization perspective to the feature space and propose a novel PEFT strategy termed Domain Representation Injection (DRI). Specifically, while keeping the VFM fully frozen to maximize the preservation of general knowledge, we design a lightweight, learnable Offset Encoder to extract domain-specific representations rich in modality and identity attributes from raw inputs. Guided by the contextual information of intermediate features at different layers, a Modulator adaptively transforms these representations. Subsequently, they are injected into the intermediate layers via additive fusion, dynamically reshaping the feature distribution to adapt to the downstream task without altering the VFM's pre-trained weights. Extensive experimental results demonstrate the superiority of our method, achieving State-of-the-Art (SOTA) performance with minimal trainable parameters. For instance, on the HOSS-ReID dataset, we attain 57.9% and 60.5% mAP using only 1.54M and 7.05M parameters, respectively. The code is available at https://github.com/TingfengXian/DRI.
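The injection pipeline described in the abstract (frozen backbone, Offset Encoder on raw inputs, context-guided Modulator, additive fusion into intermediate layers) can be sketched in NumPy as follows. This is a toy illustration under stated assumptions, not the authors' implementation (their repository is linked above): the residual `tanh` blocks stand in for the VFM's transformer layers, and `offset_encoder`, `modulator`, the token-mean context, and the zero-initialized `tanh` scale are hypothetical choices made here for concreteness.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, D, L = 16, 48, 64, 4  # tokens, raw patch dim, hidden dim, blocks (toy sizes)

# Stand-in "VFM": a patch embedding plus L residual blocks with fixed weights.
W_embed = rng.standard_normal((P, D)) / np.sqrt(P)
W_blocks = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(L)]
for w in [W_embed, *W_blocks]:
    w.setflags(write=False)  # frozen: in-place writes to these arrays now raise

def frozen_block(h, W):
    """Toy residual block standing in for a transformer layer (assumption)."""
    return h + np.tanh(h @ W)

def offset_encoder(x, E):
    """Hypothetical lightweight encoder: raw patches -> domain representation."""
    return np.tanh(x @ E)  # (N, D)

def modulator(r, h, g, b):
    """Hypothetical modulator: rescales r per channel using layer context."""
    ctx = h.mean(axis=0)          # (D,) token-averaged context of this layer
    scale = np.tanh(ctx * g + b)  # (D,) exactly zero when g = b = 0
    return r * scale              # (N, D) offset for this layer

def forward(x, params=None):
    """Returns a pooled embedding; params=None runs the plain frozen model."""
    h = x @ W_embed
    r = offset_encoder(x, params["E"]) if params is not None else None
    for l, W in enumerate(W_blocks):
        if params is not None:
            # Additive fusion into the intermediate features before block l.
            h = h + modulator(r, h, params["g"][l], params["b"][l])
        h = frozen_block(h, W)
    return h.mean(axis=0)

def init_params(rng):
    """Trainable parts only; zero g and b make the injection start as a no-op."""
    return {
        "E": 0.02 * rng.standard_normal((P, D)),
        "g": [np.zeros(D) for _ in range(L)],
        "b": [np.zeros(D) for _ in range(L)],
    }

x = rng.standard_normal((N, P))  # one toy image as N raw patches
params = init_params(rng)
n_trainable = params["E"].size + sum(v.size for v in params["g"] + params["b"])
print(n_trainable, W_embed.size + sum(W.size for W in W_blocks))  # 3584 19456
```

Only the entries of `params` would receive gradient updates in this sketch; the backbone arrays are read-only, and with `g` and `b` at zero the model reproduces the frozen backbone exactly, so adaptation starts from the pre-trained behavior.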
Related papers
- Move What Matters: Parameter-Efficient Domain Adaptation via Optimal Transport Flow for Collaborative Perception [8.774658029766988]
FlowAdapt is a parameter-efficient framework grounded in optimal transport theory. We introduce a Wasserstein Greedy Sampling strategy to selectively filter redundant samples. A Progressive Knowledge Transfer module is designed to inject compressed early-stage representations into later stages.
Date: 2026-02-12T04:36:50Z
- Rethinking Infrared Small Target Detection: A Foundation-Driven Efficient Paradigm [17.63632082331749]
Large-scale visual foundation models (VFMs) exhibit strong generalization across diverse visual domains, but their potential for single-frame infrared small target (SIRST) detection remains largely unexplored. We propose a Foundation-Driven Efficient Paradigm (FDEP) which can seamlessly adapt to existing encoder-decoder-based methods and significantly improve accuracy without additional inference overhead.
Date: 2025-12-05T08:12:35Z
- CrossEarth-Gate: Fisher-Guided Adaptive Tuning Engine for Efficient Adaptation of Cross-Domain Remote Sensing Semantic Segmentation [32.405967784469304]
CrossEarth-Gate addresses multifaceted domain gaps in Remote Sensing (RS) data. We develop a Fisher-guided adaptive selection mechanism that operates on this toolbox. Our method achieves state-of-the-art performance across 16 cross-domain benchmarks for RS semantic segmentation.
Date: 2025-11-25T13:41:59Z
- FedReFT: Federated Representation Fine-Tuning with All-But-Me Aggregation [12.544628972135905]
We introduce Federated Representation Fine-Tuning (FedReFT), a novel approach to fine-tune the client's hidden representation. FedReFT applies sparse intervention layers to steer hidden representations directly, offering a lightweight and semantically rich fine-tuning alternative. We evaluate FedReFT on commonsense reasoning, arithmetic reasoning, instruction-tuning, and GLUE.
Date: 2025-08-27T22:03:19Z
- AuxDet: Auxiliary Metadata Matters for Omni-Domain Infrared Small Target Detection [49.81255045696323]
We present the Auxiliary Metadata Driven Infrared Small Target Detector (AuxDet). AuxDet integrates metadata semantics with visual features, guiding adaptive representation learning for each sample. Experiments on the challenging WideIRSTD-Full benchmark demonstrate that AuxDet consistently outperforms state-of-the-art methods.
Date: 2025-05-21T07:02:05Z
- VRS-UIE: Value-Driven Reordering Scanning for Underwater Image Enhancement [104.78586859995333]
State Space Models (SSMs) have emerged as a promising backbone for vision tasks due to their linear complexity and global receptive field. Large, homogeneous but uninformative oceanic background regions predominate and can dilute the feature representation responses of sparse yet valuable targets. We propose a novel Value-Driven Reordering Scanning framework for Underwater Image Enhancement (UIE). Our framework sets a new state of the art, delivering superior enhancement performance (surpassing WMamba by 0.89 dB on average) by effectively suppressing water bias and preserving structural and color fidelity.
Date: 2025-05-02T12:21:44Z
- LSP-ST: Ladder Shape-Biased Side-Tuning for Robust Infrared Small Target Detection [4.5138645285711165]
We propose Ladder Shape-Biased Side-Tuning (LSP-ST), a novel approach that introduces a shape-aware inductive bias to facilitate effective adaptation beyond texture cues. With only 4.72M learnable parameters, LSP-ST achieves state-of-the-art performance on multiple infrared small target detection benchmarks.
Date: 2025-04-20T04:12:38Z
- Let Synthetic Data Shine: Domain Reassembly and Soft-Fusion for Single Domain Generalization [68.41367635546183]
Single Domain Generalization aims to train models with consistent performance across diverse scenarios using data from a single source. We propose Discriminative Domain Reassembly and Soft-Fusion (DRSF), a training framework leveraging synthetic data to improve model generalization.
Date: 2025-03-17T18:08:03Z
- ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by the Kronecker product to aggregate low-rank experts. Thanks to this design, ALoRE adds negligible extra parameters and can be merged into the frozen backbone.
Date: 2024-12-11T12:31:30Z
- Modality Prompts for Arbitrary Modality Salient Object Detection [57.610000247519196]
This paper delves into the task of arbitrary modality salient object detection (AM SOD), which aims to detect salient objects from arbitrary modalities, e.g., RGB images, RGB-D images, and RGB-D-T images. A novel modality-adaptive Transformer (MAT) is proposed to investigate two fundamental challenges of AM SOD.
Date: 2024-05-06T11:02:02Z
- GIFD: A Generative Gradient Inversion Method with Feature Domain Optimization [52.55628139825667]
Federated Learning (FL) has emerged as a promising distributed machine learning framework to preserve clients' privacy. Recent studies find that an attacker can invert the shared gradients and recover sensitive data from an FL system by leveraging pre-trained generative adversarial networks (GANs) as prior knowledge. We propose Gradient Inversion over Feature Domains (GIFD), which disassembles the GAN model and searches the feature domains of the intermediate layers.
Date: 2023-08-09T04:34:21Z
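Several of the related entries, ALoRE in particular, adapt models in weight space, the paradigm the DRI abstract moves away from. The sketch below uses a generic LoRA-style low-rank update (a hypothetical stand-in, not ALoRE's Kronecker-product aggregation of experts) to show why such updates can be merged into a frozen backbone: the trained factors fold into a single weight matrix with identical outputs. The matrix sizes and rank are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_out, rank = 64, 64, 4  # toy sizes (assumption)

W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)  # frozen pretrained weight
A = 0.01 * rng.standard_normal((rank, d_in))   # trainable down-projection
# Trainable up-projection; LoRA zero-initializes it, random here so the update is nonzero.
B = 0.01 * rng.standard_normal((d_out, rank))

def adapted(x):
    """Unmerged form: frozen path plus low-rank side path."""
    return x @ W.T + (x @ A.T) @ B.T

# Fold the update into one weight: x W^T + x A^T B^T = x (W + B A)^T.
W_merged = W + B @ A

x = rng.standard_normal((8, d_in))
print(np.allclose(adapted(x), x @ W_merged.T))  # True
print(W.size, A.size + B.size)                  # 4096 512
```

An injected feature offset such as DRI's is instead computed from each raw input at run time, so it leaves the pre-trained weights unchanged rather than folding into them.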
This list is automatically generated from the titles and abstracts of the papers on this site.