On the Out-Of-Distribution Generalization of Multimodal Large Language
Models
- URL: http://arxiv.org/abs/2402.06599v1
- Date: Fri, 9 Feb 2024 18:21:51 GMT
- Title: On the Out-Of-Distribution Generalization of Multimodal Large Language
Models
- Authors: Xingxuan Zhang, Jiansheng Li, Wenjing Chu, Junjia Hai, Renzhe Xu,
Yuqing Yang, Shikai Guan, Jiazheng Xu, and Peng Cui
- Abstract summary: We investigate the generalization boundaries of current Multimodal Large Language Models (MLLMs)
We evaluate their zero-shot generalization across synthetic images, real-world distributional shifts, and specialized datasets like medical and molecular imagery.
We show that in-context learning can significantly enhance MLLMs' generalization, opening new avenues for overcoming generalization barriers.
- Score: 24.431960338495184
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the generalization boundaries of current Multimodal Large
Language Models (MLLMs) via comprehensive evaluation under out-of-distribution
scenarios and domain-specific tasks. We evaluate their zero-shot generalization
across synthetic images, real-world distributional shifts, and specialized
datasets like medical and molecular imagery. Empirical results indicate that
MLLMs struggle with generalization beyond common training domains, limiting
their direct application without adaptation. To understand the cause of
unreliable performance, we analyze three hypotheses: semantic
misinterpretation, visual feature extraction insufficiency, and mapping
deficiency. Results identify mapping deficiency as the primary hurdle. To
address this problem, we show that in-context learning (ICL) can significantly
enhance MLLMs' generalization, opening new avenues for overcoming
generalization barriers. We further explore the robustness of ICL under
distribution shifts and show its vulnerability to domain shifts, label shifts,
and spurious correlation shifts between in-context examples and test data.
Related papers
- On the Universal Truthfulness Hyperplane Inside LLMs [27.007142483859162]
We investigate whether a universal truthfulness hyperplane that distinguishes the model's factually correct and incorrect outputs exists within the model.
Our results indicate that increasing the diversity of the training datasets significantly enhances the performance in all scenarios.
arXiv Detail & Related papers (2024-07-11T15:07:26Z) - Learning Divergence Fields for Shift-Robust Graph Representations [73.11818515795761]
In this work, we propose a geometric diffusion model with learnable divergence fields for the challenging problem with interdependent data.
We derive a new learning objective through causal inference, which can guide the model to learn generalizable patterns of interdependence that are insensitive across domains.
arXiv Detail & Related papers (2024-06-07T14:29:21Z) - Unveiling the Generalization Power of Fine-Tuned Large Language Models [81.70754292058258]
We investigate whether fine-tuning affects the intrinsic generalization ability intrinsic to Large Language Models (LLMs)
Our main findings reveal that models fine-tuned on generation and classification tasks exhibit dissimilar behaviors in generalizing to different domains and tasks.
We observe that integrating the in-context learning strategy during fine-tuning on generation tasks can enhance the model's generalization ability.
arXiv Detail & Related papers (2024-03-14T08:18:59Z) - Dive into the Chasm: Probing the Gap between In- and Cross-Topic
Generalization [66.4659448305396]
This study analyzes various LMs with three probing-based experiments to shed light on the reasons behind the In- vs. Cross-Topic generalization gap.
We demonstrate, for the first time, that generalization gaps and the robustness of the embedding space vary significantly across LMs.
arXiv Detail & Related papers (2024-02-02T12:59:27Z) - Sparsity-Guided Holistic Explanation for LLMs with Interpretable
Inference-Time Intervention [53.896974148579346]
Large Language Models (LLMs) have achieved unprecedented breakthroughs in various natural language processing domains.
The enigmatic black-box'' nature of LLMs remains a significant challenge for interpretability, hampering transparent and accountable applications.
We propose a novel methodology anchored in sparsity-guided techniques, aiming to provide a holistic interpretation of LLMs.
arXiv Detail & Related papers (2023-12-22T19:55:58Z) - LLMs Understand Glass-Box Models, Discover Surprises, and Suggest
Repairs [10.222281712562705]
We show that large language models (LLMs) are remarkably good at working with interpretable models.
By adopting a hierarchical approach to reasoning, LLMs can provide comprehensive model-level summaries.
We present the package $textttTalkToEBM$ as an open-source LLM-GAM interface.
arXiv Detail & Related papers (2023-08-02T13:59:35Z) - Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z) - Invariant Causal Prediction for Block MDPs [106.63346115341862]
Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges.
We propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting.
arXiv Detail & Related papers (2020-03-12T21:03:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.