Related papers: Learning Invariant Causal Mechanism from Vision-Language Models

URL: http://arxiv.org/abs/2405.15289v4
Date: Tue, 17 Jun 2025 11:43:46 GMT
Title: Learning Invariant Causal Mechanism from Vision-Language Models
Authors: Zeen Song, Siyu Zhao, Xingyu Zhang, Jiangmeng Li, Changwen Zheng, Wenwen Qiang
Abstract summary: We show that the causal mechanism involving both invariant and variant factors in training environments differs from that in test environments. We propose the Invariant Causal Mechanism of CLIP (CLIP-ICM) framework. Our method offers a simple but powerful enhancement, boosting the reliability of CLIP in real-world applications.
Score: 14.0158707862717
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Contrastive Language-Image Pretraining (CLIP) has achieved remarkable success, but its performance can degrade when fine-tuned in out-of-distribution (OOD) scenarios. We model the prediction process using a Structural Causal Model (SCM) and show that the causal mechanism involving both invariant and variant factors in training environments differs from that in test environments. In contrast, the causal mechanism with solely invariant factors remains consistent across environments. We theoretically prove the existence of a linear mapping from CLIP embeddings to invariant factors, which can be estimated using interventional data. Additionally, we provide a condition to guarantee low OOD risk of the invariant predictor. Based on these insights, we propose the Invariant Causal Mechanism of CLIP (CLIP-ICM) framework. CLIP-ICM involves collecting interventional data, estimating a linear projection matrix, and making predictions within the invariant subspace. Experiments on several OOD datasets show that CLIP-ICM significantly improves the performance of CLIP. Our method offers a simple but powerful enhancement, boosting the reliability of CLIP in real-world applications.
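
As a rough illustration of the three steps named in the abstract (collect interventional data, estimate a linear projection, predict within the invariant subspace), the following Python sketch may help. It is not the authors' estimator. It assumes, hypothetically, that an intervention changes only the variant factors of an image, so the differences between original and intervened embeddings span the variant directions, which an SVD-based projector then removes. The function names `estimate_invariant_projection` and `predict_in_invariant_subspace` and the parameter `n_variant` are invented for this example.

```python
import numpy as np


def estimate_invariant_projection(z_orig, z_interv, n_variant):
    """Estimate a d x d projector onto an assumed invariant subspace.

    z_orig, z_interv: (n, d) embeddings of the same images before and after
    an intervention assumed to change only variant factors.
    n_variant: assumed number of variant directions to remove.
    """
    diffs = z_orig - z_interv
    # Right singular vectors with the largest singular values span the
    # directions along which embeddings moved under the intervention.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    variant_basis = vt[:n_variant].T  # shape (d, n_variant)
    # Orthogonal projector onto the complement of the variant directions.
    return np.eye(z_orig.shape[1]) - variant_basis @ variant_basis.T


def predict_in_invariant_subspace(image_emb, text_emb, projection):
    """Zero-shot style prediction by cosine similarity after projection."""
    img = image_emb @ projection
    txt = text_emb @ projection
    # Guard against zero-norm vectors after projection.
    img = img / np.maximum(np.linalg.norm(img, axis=1, keepdims=True), 1e-12)
    txt = txt / np.maximum(np.linalg.norm(txt, axis=1, keepdims=True), 1e-12)
    # Each image is assigned the class whose text embedding is most similar.
    return np.argmax(img @ txt.T, axis=1)
```

In this sketch, `image_emb` and `text_emb` stand for precomputed CLIP image and class-prompt embeddings; how the interventional pairs are collected and how the projection is actually estimated are specified in the paper, not here.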

Related papers

Causal Disentanglement and Cross-Modal Alignment for Enhanced Few-Shot Learning [11.752632557524969]
Causal CLIP Adapter (CCA) is a novel framework that explicitly disentangles visual features extracted from CLIP. Our method consistently outperforms state-of-the-art approaches in terms of few-shot performance and robustness to distributional shifts.
arXiv Detail & Related papers (2025-08-05T05:30:42Z)
ICLShield: Exploring and Mitigating In-Context Learning Backdoor Attacks [61.06621533874629]
In-context learning (ICL) has demonstrated remarkable success in large language models (LLMs). In this paper, we propose, for the first time, the dual-learning hypothesis, which posits that LLMs simultaneously learn both the task-relevant latent concepts and backdoor latent concepts. Motivated by these findings, we propose ICLShield, a defense mechanism that dynamically adjusts the concept preference ratio.
arXiv Detail & Related papers (2025-07-02T03:09:20Z)
From predictions to confidence intervals: an empirical study of conformal prediction methods for in-context learning [4.758643223243787]
We propose a method based on conformal prediction to construct prediction intervals with guaranteed coverage. While traditional conformal methods are computationally expensive due to repeated model fitting, we exploit ICL to efficiently generate confidence intervals in a single forward pass. Our empirical analysis compares this approach against ridge regression-based conformal methods, showing that conformal prediction with in-context learning (CP with ICL) achieves robust and scalable uncertainty estimates.
arXiv Detail & Related papers (2025-04-22T09:11:48Z)
Model Hemorrhage and the Robustness Limits of Large Language Models [119.46442117681147]
Large language models (LLMs) demonstrate strong performance across natural language processing tasks, yet undergo significant performance degradation when modified for deployment. We define this phenomenon as model hemorrhage - performance decline caused by parameter alterations and architectural changes.
arXiv Detail & Related papers (2025-03-31T10:16:03Z)
Self-Healing Machine Learning: A Framework for Autonomous Adaptation in Real-World Environments [50.310636905746975]
Real-world machine learning systems often encounter model performance degradation due to distributional shifts in the underlying data generating process. Existing approaches to addressing shifts, such as concept drift adaptation, are limited by their reason-agnostic nature. We propose self-healing machine learning (SHML) to overcome these limitations.
arXiv Detail & Related papers (2024-10-31T20:05:51Z)
Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode. We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
Contrastive Learning Via Equivariant Representation [19.112460889771423]
We propose CLeVER, a novel equivariant contrastive learning framework compatible with augmentation strategies of arbitrary complexity. Experimental results demonstrate that CLeVER effectively extracts and incorporates equivariant information from practical natural images.
arXiv Detail & Related papers (2024-06-01T01:53:51Z)
Bayesian Exploration of Pre-trained Models for Low-shot Image Classification [14.211305168954594]
This work proposes a simple and effective probabilistic model ensemble framework based on Gaussian processes. We achieve the integration of prior knowledge by specifying the mean function with CLIP and the kernel function. We demonstrate that our method consistently outperforms competitive ensemble baselines regarding predictive performance.
arXiv Detail & Related papers (2024-03-30T10:25:28Z)
Diagnosing and Rectifying Fake OOD Invariance: A Restructured Causal Approach [51.012396632595554]
Invariant representation learning (IRL) encourages the prediction from invariant causal features to labels de-confounded from the environments. Recent theoretical results verified that some causal features recovered by IRLs merely appear domain-invariant in the training environments but fail in unseen domains. We develop an approach based on conditional mutual information with respect to RS-SCM, then rigorously rectify the spurious and fake invariant effects.
arXiv Detail & Related papers (2023-12-15T12:58:05Z)
Variance of ML-based software fault predictors: are we really improving fault prediction? [0.3222802562733786]
We experimentally analyze the variance of a state-of-the-art fault prediction approach. We observed a maximum variance of 10.10% in terms of the per-class accuracy metric.
arXiv Detail & Related papers (2023-10-26T09:31:32Z)
CLIPood: Generalizing CLIP to Out-of-Distributions [73.86353105017076]
Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but the further adaptation of CLIP on downstream tasks undesirably degrades OOD performances. We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on unseen test data. Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
arXiv Detail & Related papers (2023-02-02T04:27:54Z)
Learning Counterfactually Invariant Predictors [11.682403472580162]
We propose a model-agnostic framework, called Counterfactually Invariant Prediction (CIP). Our experimental results demonstrate the effectiveness of CIP in enforcing counterfactual invariance across various simulated and real-world datasets.
arXiv Detail & Related papers (2022-07-20T09:23:35Z)
Out-of-distribution Generalization with Causal Invariant Transformations [17.18953986654873]
In this work, we tackle the OOD problem without explicitly recovering the causal feature. Under the setting of invariant causal mechanism, we theoretically show that if all such transformations are available, then we can learn a minimax optimal model. Noticing that knowing a complete set of these causal invariant transformations may be impractical, we further show that it suffices to know only a subset of these transformations.
arXiv Detail & Related papers (2022-03-22T08:04:38Z)
Variance Minimization in the Wasserstein Space for Invariant Causal Prediction [72.13445677280792]
In this work, we show that the approach taken in ICP may be reformulated as a series of nonparametric tests that scales linearly in the number of predictors. Each of these tests relies on the minimization of a novel loss function that is derived from tools in optimal transport theory. We prove under mild assumptions that our method is able to recover the set of identifiable direct causes, and we demonstrate in our experiments that it is competitive with other benchmark causal discovery algorithms.
arXiv Detail & Related papers (2021-10-13T22:30:47Z)
Discovering Latent Causal Variables via Mechanism Sparsity: A New Principle for Nonlinear ICA [81.4991350761909]
Independent component analysis (ICA) refers to an ensemble of methods which formalize this goal and provide estimation procedures for practical application. We show that the latent variables can be recovered up to a permutation if one regularizes the latent mechanisms to be sparse.
arXiv Detail & Related papers (2021-07-21T14:22:14Z)
Nonlinear Invariant Risk Minimization: A Causal Approach [5.63479133344366]
We propose a learning paradigm that enables out-of-distribution generalization in the nonlinear setting. We show identifiability of the data representation up to very simple transformations. Extensive experiments on both synthetic and real-world datasets show that our approach significantly outperforms a variety of baseline methods.
arXiv Detail & Related papers (2021-02-24T15:38:41Z)
Learning Causal Semantic Representation for Out-of-Distribution Prediction [125.38836464226092]
We propose a Causal Semantic Generative model (CSG) based on a causal reasoning so that the two factors are modeled separately. We show that CSG can identify the semantic factor by fitting training data, and this semantic-identification guarantees the boundedness of OOD generalization error.
arXiv Detail & Related papers (2020-11-03T13:16:05Z)

This list is automatically generated from the titles and abstracts of the papers on this site.