Related papers: Interpreting and Controlling Vision Foundation Models via Text Explanations

URL: http://arxiv.org/abs/2310.10591v1
Date: Mon, 16 Oct 2023 17:12:06 GMT
Title: Interpreting and Controlling Vision Foundation Models via Text Explanations
Authors: Haozhe Chen, Junfeng Yang, Carl Vondrick, Chengzhi Mao
Abstract summary: We present a framework for interpreting vision transformer's latent tokens with natural language. Our approach enables understanding of model visual reasoning procedure without needing additional model training or data collection.
Score: 45.30541722925515
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large-scale pre-trained vision foundation models, such as CLIP, have become de facto backbones for various vision tasks. However, due to their black-box nature, understanding the underlying rules behind these models' predictions and controlling model behaviors have remained open challenges. We present a framework for interpreting vision transformer's latent tokens with natural language. Given a latent token, our framework retains its semantic information to the final layer using transformer's local operations and retrieves the closest text for explanation. Our approach enables understanding of model visual reasoning procedure without needing additional model training or data collection. Based on the obtained interpretations, our framework allows for model editing that controls model reasoning behaviors and improves model robustness against biases and spurious correlations.
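
The retrieval step mentioned in the abstract ("retrieves the closest text for explanation") can be illustrated with a minimal sketch. It assumes the latent token has already been mapped into the same embedding space as the candidate texts, and it uses cosine similarity as the closeness measure; both are assumptions, not details confirmed by the abstract. The function names (`cosine`, `closest_text`) and the 3-d toy vectors are hypothetical and are not CLIP outputs. The sketch does not cover how the framework carries a token to the final layer.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def closest_text(token_embedding, text_embeddings):
    """Return the candidate text whose embedding is most similar to the token.

    text_embeddings maps each candidate description to its embedding.
    """
    return max(
        text_embeddings,
        key=lambda text: cosine(token_embedding, text_embeddings[text]),
    )


# Toy embeddings, made up for illustration only.
candidates = {
    "a photo of fur": [0.9, 0.1, 0.0],
    "a photo of water": [0.0, 0.2, 0.9],
    "a photo of grass": [0.1, 0.9, 0.1],
}
token = [0.8, 0.3, 0.1]

best = closest_text(token, candidates)
print(best)
```

With these toy vectors the token lies closest to the "fur" description. In practice the candidate set would be a large vocabulary of text embeddings rather than three hand-written entries.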

Related papers

Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models [27.806966289284528]
We present a unified framework using sparse autoencoders (SAEs) to discover human-interpretable visual features. We show that SAEs can reliably identify and manipulate interpretable visual features without model re-training.
arXiv Detail & Related papers (2025-02-10T18:32:41Z)
Language Model as Visual Explainer [72.88137795439407]
We present a systematic approach for interpreting vision models using a tree-structured linguistic explanation. Our method provides human-understandable explanations in the form of attribute-laden trees. To assess the effectiveness of our approach, we introduce new benchmarks and conduct rigorous evaluations.
arXiv Detail & Related papers (2024-12-08T20:46:23Z)
Interpret the Internal States of Recommendation Model with Sparse Autoencoder [28.234859617081295]
RecSAE is an automated and generalizable probing framework that interprets Recommenders with Sparse AutoEncoder. It extracts interpretable latents from the internal states of recommendation models and links them to semantic concepts for interpretation. RecSAE does not alter original models during interpretation and also enables targeted de-biasing to models based on interpreted results.
arXiv Detail & Related papers (2024-11-09T08:22:31Z)
Enforcing Interpretability in Time Series Transformers: A Concept Bottleneck Framework [2.8470354623829577]
We develop a framework based on Concept Bottleneck Models to enforce interpretability of time series Transformers. We modify the training objective to encourage a model to develop representations similar to predefined interpretable concepts. We find that the model performance remains mostly unaffected, while the model shows much improved interpretability.
arXiv Detail & Related papers (2024-10-08T14:22:40Z)
Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139]
We propose a novel approach to explain the behavior of a black-box model under feature shifts. We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation.
arXiv Detail & Related papers (2024-08-24T18:28:19Z)
Self-supervised Interpretable Concept-based Models for Text Classification [9.340843984411137]
This paper proposes self-supervised Interpretable Concept Embedding Models (ICEMs). We leverage the generalization abilities of Large-Language Models to predict the concept labels in a self-supervised way. ICEMs can be trained in a self-supervised way, achieving similar performance to fully supervised concept-based models and end-to-end black-box ones.
arXiv Detail & Related papers (2024-06-20T14:04:53Z)
Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset. We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding. Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z)
Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing. This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z)
Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems that see and reason about the compositional nature of visual scenes are fundamental to understanding our world. The models learned to bridge the gap between such modalities, coupled with large-scale training data, facilitate contextual reasoning, generalization, and prompt capabilities at test time. The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, having interactive dialogues by asking questions about an image or video scene, or manipulating a robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z)
Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP). What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining. How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
Influence Tuning: Demoting Spurious Correlations via Instance Attribution and Instance-Driven Updates [26.527311287924995]
We show that, in a controlled setup, influence tuning can help deconfound the model from spurious patterns in data.
arXiv Detail & Related papers (2021-10-07T06:59:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.