Related papers: Adapting Large Multimodal Models to Distribution Shifts: The Role of In-Context Learning

Adapting Large Multimodal Models to Distribution Shifts: The Role of In-Context Learning

URL: http://arxiv.org/abs/2405.12217v1
Date: Mon, 20 May 2024 17:59:21 GMT
Title: Adapting Large Multimodal Models to Distribution Shifts: The Role of In-Context Learning
Authors: Guanglin Zhou, Zhongyi Han, Shiming Chen, Biwei Huang, Liming Zhu, Salman Khan, Xin Gao, Lina Yao,
Abstract summary: Large multimodal models (LMMs) are highly robust against natural distribution shifts. Despite this, domain-specific adaptation is still necessary, particularly in specialized areas like healthcare. This work investigates in-context learning (ICL) as an effective alternative for enhancing LMMs' adaptability.
Score: 41.59855801010565
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Recent studies indicate that large multimodal models (LMMs) are highly robust against natural distribution shifts, often surpassing previous baselines. Despite this, domain-specific adaptation is still necessary, particularly in specialized areas like healthcare. Due to the impracticality of fine-tuning LMMs given their vast parameter space, this work investigates in-context learning (ICL) as an effective alternative for enhancing LMMs' adaptability. We find that the success of ICL heavily relies on the choice of demonstration, mirroring challenges seen in large language models but introducing unique complexities for LMMs facing distribution shifts. Our study addresses this by evaluating an unsupervised ICL method, TopKNearestPR, which selects in-context examples through a nearest example search based on feature similarity. We uncover that its effectiveness is limited by the deficiencies of pre-trained vision encoders under distribution shift scenarios. To address these challenges, we propose InvariantSelectPR, a novel method leveraging Class-conditioned Contrastive Invariance (CCI) for more robust demonstration selection. Specifically, CCI enhances pre-trained vision encoders by improving their discriminative capabilities across different classes and ensuring invariance to domain-specific variations. This enhancement allows the encoders to effectively identify and retrieve the most informative examples, which are then used to guide LMMs in adapting to new query samples under varying distributions. Our experiments show that InvariantSelectPR substantially improves the adaptability of LMMs, achieving significant performance gains on benchmark datasets, with a 34.2%$\uparrow$ accuracy increase in 7-shot on Camelyon17 and 16.9%$\uparrow$ increase in 7-shot on HAM10000 compared to the baseline zero-shot performance.

Related papers

Unified modality separation: A vision-language framework for unsupervised domain adaptation [60.8391821117794]
Unsupervised domain adaptation (UDA) enables models trained on a labeled source domain to handle new unlabeled domains.<n>We propose a unified modality separation framework that accommodates both modality-specific and modality-invariant components.<n>Our methods achieve up to 9% performance gain with 9 times of computational efficiencies.
arXiv Detail & Related papers (2025-08-07T02:51:10Z)
Group Relative Augmentation for Data Efficient Action Detection [11.169883977958454]
Adapting large Video-Language Models (VLMs) for action detection using only a few examples poses challenges.<n>We propose an efficient adaptation strategy combining parameter-efficient tuning (LoRA) with a novel learnable internal feature augmentation.<n>We demonstrate our method's effectiveness on complex multi-label, multi-person action detection datasets.
arXiv Detail & Related papers (2025-07-28T21:46:05Z)
Modality-Balancing Preference Optimization of Large Multimodal Models by Adversarial Negative Mining [66.54211199959298]
We propose a novel preference learning framework, Modality-Balancing Preference Optimization (MBPO), to address the modality imbalance in LMMs.<n>MBPO constructs a more effective offline preference dataset by generating hard negatives, i.e., rejected responses misled by LLM biases.<n>It can enhance LMM performance on challenging vision-language tasks and effectively reduce hallucinations.
arXiv Detail & Related papers (2025-05-20T03:59:05Z)
RAAD-LLM: Adaptive Anomaly Detection Using LLMs and RAG Integration [2.879328762187361]
We present RAAD-LLM, a novel framework for adaptive anomaly detection. By effectively utilizing domain-specific knowledge, RAAD-LLM enhances the detection of anomalies in time series data. Results show significant improvements over our previous model with an accuracy increase from 70.7% to 88.6% on the real-world dataset.
arXiv Detail & Related papers (2025-03-04T17:20:43Z)
Preserving Diversity in Supervised Fine-Tuning of Large Language Models [29.02934952075354]
This paper introduces a new game-theoretic formulation forSupervised Fine-Tuning (SFT) In this framework, an auxiliary variable is introduced to regulate the learning process. We prove that the proposed game-theoretic approach connects to the problem of reverse KL minimization with entropy regularization.
arXiv Detail & Related papers (2024-08-29T16:21:00Z)
Multi-scale Contrastive Adaptor Learning for Segmenting Anything in Underperformed Scenes [12.36950265154199]
We introduce a novel Multi-scale Contrastive Adaptor learning method named MCA-SAM. MCA-SAM enhances adaptor performance through a meticulously designed contrastive learning framework at both token and sample levels. Empirical results demonstrate that MCA-SAM sets new benchmarks, outperforming existing methods in three challenging domains.
arXiv Detail & Related papers (2024-08-12T06:23:10Z)
Contrastive Learning Via Equivariant Representation [19.112460889771423]
We propose CLeVER, a novel equivariant contrastive learning framework compatible with augmentation strategies of arbitrary complexity. Experimental results demonstrate that CLeVER effectively extracts and incorporates equivariant information from practical natural images.
arXiv Detail & Related papers (2024-06-01T01:53:51Z)
Chain-of-Thought Prompting for Demographic Inference with Large Multimodal Models [58.58594658683919]
Large multimodal models (LMMs) have shown transformative potential across various research tasks. Our findings indicate LMMs possess advantages in zero-shot learning, interpretability, and handling uncurated 'in-the-wild' inputs. We propose a Chain-of-Thought augmented prompting approach, which effectively mitigates the off-target prediction issue.
arXiv Detail & Related papers (2024-05-24T16:26:56Z)
Which Examples to Annotate for In-Context Learning? Towards Effective and Efficient Selection [35.924633625147365]
Large Language Models (LLMs) can adapt to new tasks via in-context learning (ICL) In this work, we investigate an active learning approach for ICL, where there is a limited budget for annotating examples. We propose a model-adaptive optimization-free algorithm, termed AdaICL, which identifies examples that the model is uncertain about.
arXiv Detail & Related papers (2023-10-30T22:03:55Z)
Winning Prize Comes from Losing Tickets: Improve Invariant Learning by Exploring Variant Parameters for Out-of-Distribution Generalization [76.27711056914168]
Out-of-Distribution (OOD) Generalization aims to learn robust models that generalize well to various environments without fitting to distribution-specific features. Recent studies based on Lottery Ticket Hypothesis (LTH) address this problem by minimizing the learning target to find some of the parameters that are critical to the task. We propose Exploring Variant parameters for Invariant Learning (EVIL) which also leverages the distribution knowledge to find the parameters that are sensitive to distribution shift.
arXiv Detail & Related papers (2023-10-25T06:10:57Z)
Learning Optimal Features via Partial Invariance [18.552839725370383]
Invariant Risk Minimization (IRM) is a popular framework that aims to learn robust models from multiple environments. We show that IRM can over-constrain the predictor and to remedy this, we propose a relaxation via $textitpartial invariance$. Several experiments, conducted both in linear settings as well as with deep neural networks on tasks over both language and image data, allow us to verify our conclusions.
arXiv Detail & Related papers (2023-01-28T02:48:14Z)
Meta-Causal Feature Learning for Out-of-Distribution Generalization [71.38239243414091]
This paper presents a balanced meta-causal learner (BMCL), which includes a balanced task generation module (BTG) and a meta-causal feature learning module (MCFL) BMCL effectively identifies the class-invariant visual regions for classification and may serve as a general framework to improve the performance of the state-of-the-art methods.
arXiv Detail & Related papers (2022-08-22T09:07:02Z)
Exploring Complementary Strengths of Invariant and Equivariant Representations for Few-Shot Learning [96.75889543560497]
In many real-world problems, collecting a large number of labeled samples is infeasible. Few-shot learning is the dominant approach to address this issue, where the objective is to quickly adapt to novel categories in presence of a limited number of samples. We propose a novel training mechanism that simultaneously enforces equivariance and invariance to a general set of geometric transformations.
arXiv Detail & Related papers (2021-03-01T21:14:33Z)
Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAE) are a powerful and widely-used class of generative models. We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs. Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
arXiv Detail & Related papers (2021-01-06T17:36:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.