Explaining Domain Shifts in Language: Concept erasing for Interpretable Image Classification
- URL: http://arxiv.org/abs/2503.18483v1
- Date: Mon, 24 Mar 2025 09:35:28 GMT
- Title: Explaining Domain Shifts in Language: Concept erasing for Interpretable Image Classification
- Authors: Zequn Zeng, Yudi Su, Jianqiao Sun, Tiansheng Wen, Hao Zhang, Zhengjue Wang, Bo Chen, Hongwei Liu, Jiawei Ma,
- Abstract summary: Concept-based models can map black-box representations to human-understandable concepts. However, domain-specific concepts often impact the final predictions. We propose a novel Language-guided Concept-Erasing framework.
- Score: 20.513553071049557
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Concept-based models can map black-box representations to human-understandable concepts, which makes the decision-making process more transparent and allows users to understand the reasons behind predictions. However, domain-specific concepts often impact the final predictions, which undermines the model's generalization capabilities and prevents it from being used in high-stakes applications. In this paper, we propose a novel Language-guided Concept-Erasing (LanCE) framework. In particular, we empirically demonstrate that pre-trained vision-language models (VLMs) can approximate distinct visual domain shifts via domain descriptors, while prompting large language models (LLMs) can easily simulate a wide range of descriptors for unseen visual domains. We then introduce a novel plug-in domain descriptor orthogonality (DDO) regularizer to mitigate the impact of these domain-specific concepts on the final predictions. Notably, the DDO regularizer is agnostic to the design of concept-based models, and we integrate it into several prevailing models. Through evaluation of domain generalization on four standard benchmarks and three newly introduced benchmarks, we demonstrate that DDO can significantly improve out-of-distribution (OOD) generalization over previous state-of-the-art concept-based models. Our code is available at https://github.com/joeyz0z/LanCE.
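The abstract names the DDO regularizer but does not spell out its form. A minimal sketch of one plausible instantiation, assuming domain shifts are represented as direction vectors (e.g., differences of VLM text embeddings between "a sketch of a dog" and "a photo of a dog") and concepts as linear directions in the same feature space (all names here are illustrative, not the paper's API):

```python
import numpy as np

def ddo_regularizer(concept_weights, domain_dirs):
    """Hypothetical sketch of a domain-descriptor orthogonality (DDO) penalty.

    concept_weights: (C, d) array whose rows map image features to concept scores.
    domain_dirs: (K, d) array of vectors approximating visual domain shifts,
        e.g. differences of paired text embeddings across domain descriptors.

    Penalizes squared cosine alignment between every concept direction and
    every domain-shift direction, pushing concept predictions toward being
    invariant to domain style. Returns 0 when all pairs are orthogonal.
    """
    W = concept_weights / np.linalg.norm(concept_weights, axis=1, keepdims=True)
    D = domain_dirs / np.linalg.norm(domain_dirs, axis=1, keepdims=True)
    cos = W @ D.T                      # (C, K) cosine similarities
    return float(np.mean(cos ** 2))   # squared alignment, averaged over pairs
```

In this reading, the penalty is added to the usual classification loss; since it only touches the concept-projection weights, it is plug-in with respect to the rest of the concept-based model, matching the "agnostic to the design" claim.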
Related papers
- Concept Layers: Enhancing Interpretability and Intervenability via LLM Conceptualization [2.163881720692685]
We introduce a new methodology for incorporating interpretability and intervenability into an existing model by integrating Concept Layers into its architecture.
Our approach projects the model's internal vector representations into a conceptual, explainable vector space before reconstructing and feeding them back into the model.
We evaluate CLs across multiple tasks, demonstrating that they maintain the original model's performance and agreement while enabling meaningful interventions.
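The project-then-reconstruct idea behind Concept Layers can be sketched in a few lines; the names, shapes, and least-squares reconstruction below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def concept_layer(h, concept_basis):
    """Toy sketch of a Concept Layer inserted into a model.

    h: (d,) internal representation of the host model.
    concept_basis: (k, d) array whose rows are concept direction vectors.

    Projects h into the conceptual space (yielding interpretable concept
    scores), then reconstructs an approximation of h that is fed back into
    the model. Editing `scores` before reconstruction is the intervention
    mechanism: it changes downstream behavior in concept terms.
    """
    scores = concept_basis @ h                               # concept activations
    # Min-norm reconstruction consistent with the observed concept scores.
    h_rec, *_ = np.linalg.lstsq(concept_basis, scores, rcond=None)
    return scores, h_rec
```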
arXiv Detail & Related papers (2025-02-19T11:10:19Z)
- Analyzing Fine-tuning Representation Shift for Multimodal LLMs Steering alignment [53.90425382758605]
We show how fine-tuning alters the internal structure of a model to specialize in new multimodal tasks. Our work sheds light on how multimodal representations evolve through fine-tuning and offers a new perspective for interpreting model adaptation in multimodal tasks.
arXiv Detail & Related papers (2025-01-06T13:37:13Z)
- TIDE: Training Locally Interpretable Domain Generalization Models Enables Test-time Correction [14.396966854171273]
We consider the problem of single-source domain generalization.
Existing methods typically rely on extensive augmentations to synthetically cover diverse domains during training.
We propose an approach that compels models to leverage such local concepts during prediction.
arXiv Detail & Related papers (2024-11-25T08:46:37Z)
- FUSE-ing Language Models: Zero-Shot Adapter Discovery for Prompt Optimization Across Tokenizers [55.2480439325792]
We propose FUSE, an approach to approximating an adapter layer that maps from one model's textual embedding space to another, even across different tokenizers.
We show the efficacy of our approach via multi-objective optimization over vision-language and causal language models for image captioning and sentiment-based image captioning.
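The blurb above describes an adapter mapping one model's embedding space to another's. As a hedged illustration (not the actual FUSE procedure, which approximates the adapter differently), a linear adapter fit by least squares over shared anchor strings embedded under both models might look like:

```python
import numpy as np

def fit_adapter(src_emb, tgt_emb):
    """Illustrative linear adapter between two textual embedding spaces.

    src_emb: (n, d_src) embeddings of n anchor strings under the source model.
    tgt_emb: (n, d_tgt) embeddings of the same strings under the target model.

    Returns W of shape (d_src, d_tgt) minimizing ||src_emb @ W - tgt_emb||,
    so that src_emb @ W approximates the target model's embeddings even when
    the two models use different tokenizers.
    """
    W, *_ = np.linalg.lstsq(src_emb, tgt_emb, rcond=None)
    return W
```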
arXiv Detail & Related papers (2024-08-09T02:16:37Z) - Self-supervised Interpretable Concept-based Models for Text Classification [9.340843984411137]
This paper proposes self-supervised Interpretable Concept Embedding Models (ICEMs).
We leverage the generalization abilities of Large Language Models to predict the concept labels in a self-supervised way.
ICEMs can be trained in a self-supervised way, achieving performance similar to fully supervised concept-based models and end-to-end black-box ones.
arXiv Detail & Related papers (2024-06-20T14:04:53Z) - Locally Testing Model Detections for Semantic Global Concepts [3.112979958793927]
We propose a framework for linking global concept encodings to the local processing of single network inputs.
Our approach has the advantage of fully covering the model-internal encoding of the semantic concept.
The results show major differences in the local perception and usage of individual global concept encodings.
arXiv Detail & Related papers (2024-05-27T12:52:45Z) - Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models [57.86303579812877]
Concept Bottleneck Models (CBMs) ground image classification on human-understandable concepts to allow for interpretable model decisions.
Existing approaches often require numerous human interventions per image to achieve strong performances.
We introduce a trainable concept realignment intervention module, which leverages concept relations to realign concept assignments post-intervention.
arXiv Detail & Related papers (2024-05-02T17:59:01Z) - MCPNet: An Interpretable Classifier via Multi-Level Concept Prototypes [24.28807025839685]
We argue that explanations lacking insights into the decision processes of low and mid-level features are neither fully faithful nor useful.
We propose a novel paradigm that learns and aligns multi-level concept prototype distributions for classification purposes via Class-aware Concept Distribution (CCD) loss.
arXiv Detail & Related papers (2024-04-13T11:13:56Z) - Learning Transferable Conceptual Prototypes for Interpretable
Unsupervised Domain Adaptation [79.22678026708134]
In this paper, we propose an inherently interpretable method, named Transferable Conceptual Prototype Learning (TCPL).
To achieve this goal, we design a hierarchically prototypical module that transfers categorical basic concepts from the source domain to the target domain and learns domain-shared prototypes for explaining the underlying reasoning process.
Comprehensive experiments show that the proposed method can not only provide effective and intuitive explanations but also outperform the previous state of the art.
arXiv Detail & Related papers (2023-10-12T06:36:41Z) - Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image
Models [59.094601993993535]
Text-to-image (T2I) personalization allows users to combine their own visual concepts in natural language prompts.
Most existing encoders are limited to a single-class domain, which hinders their ability to handle diverse concepts.
We propose a domain-agnostic method that does not require any specialized dataset or prior information about the personalized concepts.
arXiv Detail & Related papers (2023-07-13T17:46:42Z)
- Autoregressive Structured Prediction with Language Models [73.11519625765301]
We describe an approach to model structures as sequences of actions in an autoregressive manner with PLMs.
Our approach achieves new state-of-the-art results on all the structured prediction tasks we examined.
arXiv Detail & Related papers (2022-10-26T13:27:26Z)
- Bayesian Prompt Learning for Image-Language Model Generalization [64.50204877434878]
We use the regularization ability of Bayesian methods to frame prompt learning as a variational inference problem.
Our approach regularizes the prompt space, reduces overfitting to the seen prompts and improves the prompt generalization on unseen prompts.
We demonstrate empirically on 15 benchmarks that Bayesian prompt learning provides an appropriate coverage of the prompt space.
arXiv Detail & Related papers (2022-10-05T17:05:56Z)
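The variational framing in the entry above can be illustrated with a minimal reparameterization-plus-KL sketch, where a prompt embedding is sampled from a learned diagonal Gaussian and a KL term regularizes the prompt space (all names and shapes here are assumptions for illustration, not the paper's implementation):

```python
import numpy as np

def sample_prompt(mu, log_var, rng):
    """Reparameterized sample of a prompt embedding: p = mu + sigma * eps.

    mu, log_var: (d,) mean and log-variance of the variational prompt
    distribution; rng: a numpy Generator. Sampling (rather than using a
    single point estimate) covers a region of the prompt space.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL(q || N(0, I)) for a diagonal Gaussian.

    Added to the training loss, this term keeps the learned prompt
    distribution close to the prior, reducing overfitting to seen prompts.
    """
    return float(0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var))
```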
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.