Toward Robust and Harmonious Adaptation for Cross-modal Retrieval
- URL: http://arxiv.org/abs/2511.14416v1
- Date: Tue, 18 Nov 2025 12:21:23 GMT
- Title: Toward Robust and Harmonious Adaptation for Cross-modal Retrieval
- Authors: Haobin Li, Mouxing Yang, Xi Peng
- Abstract summary: We propose a novel method for achieving online and harmonious adaptation against query shift (QS). In this paper, we observe that QS would not only undermine the well-structured common space inherited from the source model, but also steer the model toward forgetting the indispensable general knowledge for Cross-Modal Retrieval (CMR).
- Score: 22.206923502018952
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the general-to-customized paradigm has emerged as the dominant approach for Cross-Modal Retrieval (CMR), which reconciles the distribution shift problem between the source domain and the target domain. However, existing general-to-customized CMR methods typically assume that the entire target-domain data is available, an assumption that is easily violated in real-world scenarios, so they inevitably suffer from the query shift (QS) problem. Specifically, query shift exhibits the following two characteristics and thus poses new challenges to CMR. i) Online Shift: real-world queries always arrive in an online manner, rendering it impractical to access the entire query set beforehand for customization approaches; ii) Diverse Shift: even with domain customization, CMR models struggle to satisfy queries from diverse users or scenarios, leaving an urgent need to accommodate diverse queries. In this paper, we observe that QS would not only undermine the well-structured common space inherited from the source model, but also steer the model toward forgetting the indispensable general knowledge for CMR. Motivated by these observations, we propose a novel method for achieving online and harmonious adaptation against QS, dubbed Robust adaptation with quEry ShifT (REST). To deal with online shift, REST first refines the retrieval results to formulate the query predictions and accordingly designs a QS-robust objective function on these predictions to preserve the well-established common space in an online manner. To tackle the more challenging diverse shift, REST employs a gradient decoupling module to dexterously manipulate the gradients during the adaptation process, thus preventing the CMR model from forgetting the general knowledge. Extensive experiments on 20 benchmarks across three CMR tasks verify the effectiveness of our method against QS.
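The abstract does not say how the gradient decoupling module manipulates gradients. As a purely illustrative sketch, and not the authors' implementation, the snippet below shows one generic way to keep an adaptation update from working against preserved knowledge: projection-style gradient surgery, which removes the component of the adaptation gradient that points against a reference gradient. The function name `decouple_gradient`, the variables `g_adapt` and `g_general`, and the toy values are all hypothetical.

```python
import numpy as np

def decouple_gradient(g_adapt, g_general, eps=1e-12):
    """Hypothetical helper: drop the part of g_adapt that conflicts with g_general.

    If the two gradients agree (non-negative dot product), g_adapt is returned
    unchanged. Otherwise g_adapt is projected onto the plane orthogonal to
    g_general, so the update no longer opposes the reference direction.
    """
    g_adapt = np.asarray(g_adapt, dtype=float)
    g_general = np.asarray(g_general, dtype=float)
    dot = float(g_adapt @ g_general)
    if dot >= 0.0:
        return g_adapt.copy()
    # Subtract the conflicting component along g_general.
    return g_adapt - dot / (float(g_general @ g_general) + eps) * g_general

# Toy example: the adaptation gradient partly opposes the reference gradient.
g_adapt = np.array([1.0, -1.0])    # assumed gradient of an online adaptation loss
g_general = np.array([0.0, 1.0])   # assumed gradient tied to general knowledge
g_safe = decouple_gradient(g_adapt, g_general)
```

In this toy case the component of `g_adapt` along `g_general` is removed while the orthogonal component is kept. How REST actually defines and combines its gradients is described in the paper itself, not here.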
Related papers
- Knowing When to Answer: Adaptive Confidence Refinement for Reliable Audio-Visual Question Answering [15.39457034915546]
We present a formal problem formulation for Reliable Audio-Visual Question Answering ($\mathcal{R}$-AVQA), where we prefer abstention over answering incorrectly. We propose Adaptive Confidence Refinement (ACR), a lightweight method to further enhance the performance of $\mathcal{R}$-AVQA.
arXiv Detail & Related papers (2026-02-04T08:35:33Z) - Unifying Search and Recommendation in LLMs via Gradient Multi-Subspace Tuning [33.69176756907003]
Gradient Multi-Subspace Tuning (GEMS) is a novel framework that unifies search and recommendation tasks. We show that GEMS consistently outperforms the state-of-the-art baselines across both search and recommendation tasks.
arXiv Detail & Related papers (2026-01-14T14:03:07Z) - Grounded Test-Time Adaptation for LLM Agents [75.62784644919803]
Large language model (LLM)-based agents struggle to generalize to novel and complex environments. We propose two strategies for adapting LLM agents by leveraging environment-specific information available during deployment.
arXiv Detail & Related papers (2025-11-06T22:24:35Z) - Test Time Adaptation Using Adaptive Quantile Recalibration [19.97106215064574]
Domain adaptation is a key strategy for enhancing the generalizability of deep learning models in real-world scenarios. Recent test-time adaptation methods based on batch normalization statistic updates allow for unsupervised adaptation. We propose Adaptive Quantile Recalibration (AQR), a test-time adaptation technique that modifies pre-activation distributions by aligning quantiles on a channel-wise basis.
arXiv Detail & Related papers (2025-11-05T03:12:30Z) - RAAD-LLM: Adaptive Anomaly Detection Using LLMs and RAG Integration [2.879328762187361]
We present RAAD-LLM, a novel framework for adaptive anomaly detection. By effectively utilizing domain-specific knowledge, RAAD-LLM enhances the detection of anomalies in time series data. Results show significant improvements over our previous model with an accuracy increase from 70.7% to 88.6% on the real-world dataset.
arXiv Detail & Related papers (2025-03-04T17:20:43Z) - Towards Generalizable Trajectory Prediction Using Dual-Level Representation Learning And Adaptive Prompting [107.4034346788744]
Existing vehicle trajectory prediction models struggle with generalizability, prediction uncertainties, and handling complex interactions. We propose Perceiver with Register queries (PerReg+), a novel trajectory prediction framework that introduces: (1) Dual-Level Representation Learning via Self-Distillation (SD) and Masked Reconstruction (MR), capturing global context and fine-grained details; (2) Enhanced Multimodality using register-based queries and pretraining, eliminating the need for clustering and suppression; and (3) Adaptive Prompt Tuning during fine-tuning, freezing the main architecture and optimizing a small number of prompts for efficient adaptation.
arXiv Detail & Related papers (2025-01-08T20:11:09Z) - Test-time Adaptation for Cross-modal Retrieval with Query Shift [14.219337695007207]
We propose a novel method dubbed Test-time adaptation for Cross-modal Retrieval (TCR).
In this paper, we observe that query shift would not only diminish the uniformity (namely, within-modality scatter) of the query modality but also amplify the gap between query and gallery modalities.
arXiv Detail & Related papers (2024-10-21T04:08:19Z) - Dual Adversarial Alignment for Realistic Support-Query Shift Few-shot Learning [15.828113109152069]
Support-Query Shift Few-shot learning aims to classify unseen examples (query set) to labeled data (support set) based on the learned embedding in a low-dimensional space.
In this paper, we propose a novel but more difficult challenge, Realistic Support-Query Shift few-shot learning.
In addition, we propose a unified adversarial feature alignment method called DUal adversarial ALignment framework (DuaL) to relieve RSQS from two aspects, i.e., inter-domain bias and intra-domain variance.
arXiv Detail & Related papers (2023-09-05T09:50:31Z) - Self-regulating Prompts: Foundational Model Adaptation without Forgetting [112.66832145320434]
We introduce a self-regularization framework for prompting called PromptSRC.
PromptSRC guides the prompts to optimize for both task-specific and task-agnostic general representations.
arXiv Detail & Related papers (2023-07-13T17:59:35Z) - Toward Certified Robustness Against Real-World Distribution Shifts [65.66374339500025]
We train a generative model to learn perturbations from data and define specifications with respect to the output of the learned model.
A unique challenge arising from this setting is that existing verifiers cannot tightly approximate sigmoid activations.
We propose a general meta-algorithm for handling sigmoid activations which leverages classical notions of counter-example-guided abstraction refinement.
arXiv Detail & Related papers (2022-06-08T04:09:13Z) - On Continual Model Refinement in Out-of-Distribution Data Streams [64.62569873799096]
Real-world natural language processing (NLP) models need to be continually updated to fix the prediction errors in out-of-distribution (OOD) data streams.
Existing continual learning (CL) problem setups cannot cover such a realistic and complex scenario.
We propose a new CL problem formulation dubbed continual model refinement (CMR).
arXiv Detail & Related papers (2022-05-04T11:54:44Z) - Semi-Supervised Learning with Variational Bayesian Inference and Maximum Uncertainty Regularization [62.21716612888669]
We propose two generic methods for improving semi-supervised learning (SSL).
The first integrates weight perturbation (WP) into existing "consistency regularization" (CR) based methods.
The second method proposes a novel consistency loss called "maximum uncertainty regularization" (MUR).
arXiv Detail & Related papers (2020-12-03T09:49:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.