Quantifying Structure in CLIP Embeddings: A Statistical Framework for Concept Interpretation
- URL: http://arxiv.org/abs/2506.13831v1
- Date: Mon, 16 Jun 2025 02:43:11 GMT
- Title: Quantifying Structure in CLIP Embeddings: A Statistical Framework for Concept Interpretation
- Authors: Jitian Zhao, Chenghui Li, Frederic Sala, Karl Rohe
- Abstract summary: Concept-based approaches aim to identify human-understandable concepts within a model's internal representations. Current methods lack statistical rigor, making it challenging to validate identified concepts and compare different techniques. We propose a hypothesis testing framework that quantifies rotation-sensitive structures within the CLIP embedding space. Unlike existing approaches, it offers theoretical guarantees that discovered concepts represent robust, reproducible patterns.
- Score: 13.206499575700219
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Concept-based approaches, which aim to identify human-understandable concepts within a model's internal representations, are a promising method for interpreting embeddings from deep neural network models, such as CLIP. While these approaches help explain model behavior, current methods lack statistical rigor, making it challenging to validate identified concepts and compare different techniques. To address this challenge, we introduce a hypothesis testing framework that quantifies rotation-sensitive structures within the CLIP embedding space. Once such structures are identified, we propose a post-hoc concept decomposition method. Unlike existing approaches, it offers theoretical guarantees that discovered concepts represent robust, reproducible patterns (rather than method-specific artifacts) and outperforms other techniques in terms of reconstruction error. Empirically, we demonstrate that our concept-based decomposition algorithm effectively balances reconstruction accuracy with concept interpretability and helps mitigate spurious cues in data. Applied to a popular spurious correlation dataset, our method yields a 22.6% increase in worst-group accuracy after removing spurious background concepts.
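The abstract describes two technical steps: a test for rotation-sensitive structure in the CLIP embedding cloud, and a post-hoc decomposition that strips a spurious concept direction before downstream use. The NumPy sketch below is a minimal illustration of those two ideas, not the paper's algorithm: it builds a null distribution by applying random orthogonal rotations to the centered embeddings, uses per-coordinate excess kurtosis as a stand-in statistic for axis-aligned structure, and removes a hypothetical spurious concept by projecting embeddings onto its orthogonal complement. All function names, the choice of statistic, and the toy data are assumptions made for the example.

```python
"""Illustrative sketch only (assumptions, not the authors' method):
(1) a rotation-sensitivity test: compare a statistic on the embeddings
    against its distribution under random orthogonal rotations;
(2) spurious-concept removal: project embeddings off a concept direction."""
import numpy as np

rng = np.random.default_rng(0)


def rotation_sensitivity_test(X, n_null=200, rng=rng):
    """Return (observed statistic, p-value) for axis-aligned structure in X.

    Statistic: mean absolute excess kurtosis across coordinates. Under a
    'no preferred axes' null, randomly rotating the centered embeddings
    should not systematically change this statistic.
    """
    Xc = X - X.mean(axis=0)

    def stat(Z):
        z = (Z - Z.mean(axis=0)) / (Z.std(axis=0) + 1e-12)
        kurt = (z ** 4).mean(axis=0) - 3.0        # excess kurtosis per axis
        return np.abs(kurt).mean()

    t_obs = stat(Xc)
    d = X.shape[1]
    t_null = np.empty(n_null)
    for b in range(n_null):
        Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random rotation
        t_null[b] = stat(Xc @ Q)
    p_value = (1 + np.sum(t_null >= t_obs)) / (1 + n_null)
    return t_obs, p_value


def remove_concept(X, concept_direction):
    """Project embeddings onto the orthogonal complement of one concept."""
    c = concept_direction / np.linalg.norm(concept_direction)
    return X - np.outer(X @ c, c)


# Toy usage: embeddings with one heavy-tailed (axis-aligned) coordinate.
n, d = 2000, 32
X = rng.standard_normal((n, d))
X[:, 0] += rng.laplace(scale=3.0, size=n)         # rotation-sensitive axis
t, p = rotation_sensitivity_test(X)
print(f"statistic={t:.3f}, p={p:.3f}")            # small p => structure found

# Treat that axis as a hypothetical spurious "background" concept and remove it.
spurious = np.eye(d)[0]
X_clean = remove_concept(X, spurious)
```

On real data, the embedding matrix would come from a CLIP image encoder and the concept direction from a decomposition or a text prompt; a small p-value flags structure that random rotations would destroy, which is the kind of pattern the paper's framework aims to certify before concepts are interpreted or removed.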
Related papers
- Discrete Markov Bridge [93.64996843697278]
We propose a novel framework specifically designed for discrete representation learning, called Discrete Markov Bridge. Our approach is built upon two key components: Matrix Learning and Score Learning.
arXiv Detail & Related papers (2025-05-26T09:32:12Z)
- Concept Layers: Enhancing Interpretability and Intervenability via LLM Conceptualization [2.163881720692685]
We introduce a new methodology for incorporating interpretability and intervenability into an existing model by integrating Concept Layers into its architecture. Our approach projects the model's internal vector representations into a conceptual, explainable vector space before reconstructing and feeding them back into the model. We evaluate CLs across multiple tasks, demonstrating that they maintain the original model's performance and agreement while enabling meaningful interventions.
arXiv Detail & Related papers (2025-02-19T11:10:19Z)
- Enhancing Performance of Explainable AI Models with Constrained Concept Refinement [10.241134756773228]
The trade-off between accuracy and interpretability has long been a challenge in machine learning (ML). In this paper, we investigate the impact of deviations in concept representations and propose a novel framework to mitigate these effects. Compared to existing explainable methods, our approach not only improves prediction accuracy while preserving model interpretability across various large-scale benchmarks but also achieves this with significantly lower computational cost.
arXiv Detail & Related papers (2025-02-10T18:53:15Z)
- Sparks of Explainability: Recent Advancements in Explaining Large Vision Models [6.1642231492615345]
This thesis explores advanced approaches to improve explainability in computer vision by analyzing and modeling the features exploited by deep neural networks. It evaluates attribution methods, notably saliency maps, by introducing a metric based on algorithmic stability and an approach utilizing Sobol indices. Two hypotheses are examined: aligning models with human reasoning and adopting a conceptual explainability approach.
arXiv Detail & Related papers (2025-02-03T04:49:32Z)
- Fast and Reliable Probabilistic Reflectometry Inversion with Prior-Amortized Neural Posterior Estimation [73.81105275628751]
Finding all structures compatible with reflectometry data is computationally prohibitive for standard algorithms.
We address this lack of reliability with a probabilistic deep learning method that identifies all realistic structures in seconds.
Our method, Prior-Amortized Neural Posterior Estimation (PANPE), combines simulation-based inference with novel adaptive priors.
arXiv Detail & Related papers (2024-07-26T10:29:16Z)
- Rigorous Probabilistic Guarantees for Robust Counterfactual Explanations [80.86128012438834]
We show for the first time that computing the robustness of counterfactuals with respect to plausible model shifts is NP-complete.
We propose a novel probabilistic approach which is able to provide tight estimates of robustness with strong guarantees.
arXiv Detail & Related papers (2024-07-10T09:13:11Z)
- ECATS: Explainable-by-design concept-based anomaly detection for time series [0.5956301166481089]
We propose ECATS, a concept-based neuro-symbolic architecture where concepts are represented as Signal Temporal Logic (STL) formulae.
We show that our model is able to achieve great classification performance while ensuring local interpretability.
arXiv Detail & Related papers (2024-05-17T08:12:53Z)
- Implicit Concept Removal of Diffusion Models [92.55152501707995]
Text-to-image (T2I) diffusion models often inadvertently generate unwanted concepts such as watermarks and unsafe images.
We present Geom-Erasing, a novel concept removal method based on geometric-driven control.
arXiv Detail & Related papers (2023-10-09T17:13:10Z)
- Uncovering Unique Concept Vectors through Latent Space Decomposition [0.0]
Concept-based explanations have emerged as a more interpretable alternative to feature attribution estimates.
We propose a novel post-hoc unsupervised method that automatically uncovers the concepts learned by deep models during training.
Our experiments reveal that the majority of our concepts are readily understandable to humans, exhibit coherency, and bear relevance to the task at hand.
arXiv Detail & Related papers (2023-07-13T17:21:54Z)
- Deblurring via Stochastic Refinement [85.42730934561101]
We present an alternative framework for blind deblurring based on conditional diffusion models.
Our method is competitive in terms of distortion metrics such as PSNR.
arXiv Detail & Related papers (2021-12-05T04:36:09Z)
- Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)
- Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)