TextCAVs: Debugging vision models using text
- URL: http://arxiv.org/abs/2408.08652v1
- Date: Fri, 16 Aug 2024 10:36:08 GMT
- Title: TextCAVs: Debugging vision models using text
- Authors: Angus Nicolson, Yarin Gal, J. Alison Noble
- Abstract summary: We introduce TextCAVs: a novel method which creates concept activation vectors (CAVs) using text descriptions of the concept.
In early experimental results, we demonstrate that TextCAVs produces reasonable explanations for a chest x-ray dataset (MIMIC-CXR) and natural images (ImageNet), and that these explanations can be used to debug deep learning-based models.
- Score: 37.4673705484723
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Concept-based interpretability methods are a popular form of explanation for deep learning models, providing explanations in the form of high-level, human-interpretable concepts. These methods typically find concept activation vectors (CAVs) using a probe dataset of concept examples. This requires labelled data for these concepts -- an expensive task in the medical domain. We introduce TextCAVs: a novel method which creates CAVs using vision-language models such as CLIP, allowing explanations to be created solely from text descriptions of the concept, as opposed to image exemplars. This reduced cost of testing concepts allows many concepts to be tested and lets users interact with the model, testing new ideas as they are thought of rather than waiting for image collection and annotation. In early experimental results, we demonstrate that TextCAVs produces reasonable explanations for a chest x-ray dataset (MIMIC-CXR) and natural images (ImageNet), and that these explanations can be used to debug deep learning-based models.
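The sketch below is only a rough illustration of the TextCAVs idea, not the authors' exact pipeline: it assumes a CLIP-style text encoder (stubbed here with random embeddings), a hypothetical pre-trained linear map `text_to_model` from CLIP text space into the probed vision model's feature space, and a TCAV-style directional derivative to score how sensitive a class logit is to the text-derived concept direction.

```python
# Minimal sketch of the TextCAVs idea, NOT the authors' exact implementation.
# Assumptions (placeholders): a CLIP-style text encoder stubbed with random
# embeddings, and a hypothetical pre-trained linear map `text_to_model` that
# aligns CLIP text space with the probed vision model's feature space.
import torch
import torch.nn as nn

clip_dim, model_dim, num_classes = 512, 2048, 14

def encode_text(prompt: str) -> torch.Tensor:
    """Stand-in for a CLIP text encoder returning an L2-normalised embedding."""
    torch.manual_seed(abs(hash(prompt)) % (2 ** 31))
    e = torch.randn(clip_dim)
    return e / e.norm()

text_to_model = nn.Linear(clip_dim, model_dim, bias=False)  # assumed pre-trained
classifier_head = nn.Linear(model_dim, num_classes)         # the probed model's final layer

def text_cav(prompt: str) -> torch.Tensor:
    """Map a text description into the model's feature space to act as a CAV."""
    with torch.no_grad():
        v = text_to_model(encode_text(prompt))
    return v / v.norm()

def concept_sensitivity(features: torch.Tensor, cav: torch.Tensor, class_idx: int) -> torch.Tensor:
    """TCAV-style directional derivative: change in a class logit when the
    features move in the concept direction."""
    features = features.clone().requires_grad_(True)
    logit = classifier_head(features)[:, class_idx].sum()
    (grad,) = torch.autograd.grad(logit, features)
    return (grad * cav).sum(dim=1)  # one sensitivity score per example

cav = text_cav("an x-ray showing an enlarged heart")
scores = concept_sensitivity(torch.randn(8, model_dim), cav, class_idx=3)
print(scores)
```

In the real method the linear map would be trained so that text and image features agree in the model's feature space; here it is untrained and only illustrates the data flow.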
Related papers
- Exploiting Text-Image Latent Spaces for the Description of Visual Concepts [13.287533148600248]
Concept Activation Vectors (CAVs) offer insights into neural network decision-making by linking human-friendly concepts to the model's internal feature extraction process.
When a new set of CAVs is discovered, they must still be translated into a human-understandable description.
We propose an approach that aids the interpretation of newly discovered concept sets by suggesting textual descriptions for each CAV (a minimal sketch of this kind of CAV-to-text matching appears after this list).
arXiv Detail & Related papers (2024-10-23T12:51:07Z) - Explainable Concept Generation through Vision-Language Preference Learning [7.736445799116692]
Concept-based explanations have become a popular choice for explaining deep neural networks post-hoc.
We devise a reinforcement learning-based preference optimization algorithm that fine-tunes the vision-language generative model.
In addition to showing the efficacy and reliability of our method, we show how our method can be used as a diagnostic tool for analyzing neural networks.
arXiv Detail & Related papers (2024-08-24T02:26:42Z) - Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery [52.498055901649025]
Concept Bottleneck Models (CBMs) have been proposed to address the 'black-box' problem of deep neural networks.
We propose a novel CBM approach -- called Discover-then-Name-CBM (DN-CBM) -- that inverts the typical paradigm.
Our concept extraction strategy is efficient, since it is agnostic to the downstream task, and uses concepts already known to the model.
arXiv Detail & Related papers (2024-07-19T17:50:11Z) - Explainable Image Captioning using CNN- CNN architecture and Hierarchical Attention [0.0]
Explainable AI is an approach in which a model or algorithm is designed so that its predictions can be explained and justified.
A newer architecture with a CNN decoder and a hierarchical attention mechanism is used to increase the speed and accuracy of caption generation.
arXiv Detail & Related papers (2024-06-28T16:27:47Z) - Explaining Explainability: Understanding Concept Activation Vectors [35.37586279472797]
Recent interpretability methods propose using concept-based explanations to translate internal representations of deep learning models into a language that humans are familiar with: concepts.
This requires understanding which concepts are present in the representation space of a neural network.
In this work, we investigate three properties of Concept Activation Vectors (CAVs), which are learnt using a probe dataset of concept exemplars (a minimal example of this standard probe-based CAV fit appears after this list).
We introduce tools designed to detect the presence of these properties, provide insight into how they affect the derived explanations, and provide recommendations to minimise their impact.
arXiv Detail & Related papers (2024-04-04T17:46:20Z) - Advancing Visual Grounding with Scene Knowledge: Benchmark and Method [74.72663425217522]
Visual grounding (VG) aims to establish fine-grained alignment between vision and language.
Most existing VG datasets are constructed using simple description texts.
We propose a novel benchmark of Scene Knowledge-guided Visual Grounding (SK-VG).
arXiv Detail & Related papers (2023-07-21T13:06:02Z) - DisCLIP: Open-Vocabulary Referring Expression Generation [37.789850573203694]
We build on CLIP, a large-scale visual-semantic model, to guide an LLM to generate a contextual description of a target concept in an image.
We measure the quality of the generated text by evaluating the capability of a receiver model to accurately identify the described object within the scene (a minimal sketch of such a receiver-style check appears after this list).
Our results highlight the potential of using pre-trained visual-semantic models for generating high-quality contextual descriptions.
arXiv Detail & Related papers (2023-05-30T15:13:17Z) - Text-To-Concept (and Back) via Cross-Model Alignment [48.133333356834186]
We show that the mapping from an image's representation in one model to its representation in another can be learned surprisingly well with just a linear layer (a minimal sketch of such a linear alignment appears after this list).
We convert fixed off-the-shelf vision encoders to surprisingly strong zero-shot classifiers for free.
We show other immediate use-cases of text-to-concept, like building concept bottleneck models with no concept supervision.
arXiv Detail & Related papers (2023-05-10T18:01:06Z) - Unleashing Text-to-Image Diffusion Models for Visual Perception [84.41514649568094]
VPD (Visual Perception with a pre-trained diffusion model) is a new framework that exploits the semantic information of a pre-trained text-to-image diffusion model in visual perception tasks.
We show that models can be adapted faster to downstream visual perception tasks using the proposed VPD framework.
arXiv Detail & Related papers (2023-03-03T18:59:47Z) - DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting [91.56988987393483]
We present a new framework for dense prediction by implicitly and explicitly leveraging the pre-trained knowledge from CLIP.
Specifically, we convert the original image-text matching problem in CLIP into a pixel-text matching problem and use the pixel-text score maps to guide the learning of dense prediction models (a minimal sketch of such score maps appears after this list).
Our method is model-agnostic and can be applied to arbitrary dense prediction systems and various pre-trained visual backbones.
arXiv Detail & Related papers (2021-12-02T18:59:32Z)
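For "Exploiting Text-Image Latent Spaces for the Description of Visual Concepts" (listed above), one simple way to suggest names for discovered CAVs is to rank candidate descriptions by cosine similarity. This hedged sketch assumes the CAVs and candidate texts can already be embedded in a shared space, which is itself a non-trivial assumption; the paper's actual method may differ.

```python
# Hedged sketch for suggesting textual descriptions for CAVs: rank candidate
# descriptions by cosine similarity in a shared embedding space. The alignment
# of CAVs into that space is assumed to exist; the paper's method may differ.
import torch
import torch.nn.functional as F

def suggest_descriptions(cavs, text_embs, descriptions, top_k=3):
    """cavs: (n_cavs, d) and text_embs: (n_texts, d), both in the same space."""
    sims = F.normalize(cavs, dim=1) @ F.normalize(text_embs, dim=1).T
    top = sims.topk(top_k, dim=1).indices
    return [[descriptions[j] for j in row] for row in top.tolist()]

descriptions = ["striped texture", "water", "dog face", "metal surface"]
cavs = torch.randn(2, 512)        # discovered concept directions (placeholder)
text_embs = torch.randn(4, 512)   # embeddings of the candidate texts (placeholder)
print(suggest_descriptions(cavs, text_embs, descriptions))
```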
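As background for "Explaining Explainability: Understanding Concept Activation Vectors" (listed above): CAVs of this kind are conventionally obtained by fitting a linear classifier that separates concept-exemplar activations from random-image activations and taking the normalised weight vector. A minimal version with placeholder activations:

```python
# Conventional probe-based CAV fit: a linear classifier separates activations
# of concept exemplars from activations of random images, and the (normalised)
# weight vector is the CAV. Activations here are random placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
concept_acts = rng.normal(0.5, 1.0, size=(50, 256))  # layer activations for concept images
random_acts = rng.normal(0.0, 1.0, size=(50, 256))   # layer activations for random images

X = np.vstack([concept_acts, random_acts])
y = np.array([1] * 50 + [0] * 50)
clf = LogisticRegression(max_iter=1000).fit(X, y)

cav = clf.coef_[0]
cav = cav / np.linalg.norm(cav)   # unit-norm concept direction
print(cav.shape, clf.score(X, y))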
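For the DisCLIP entry above, the receiver-style evaluation can be sketched as follows (all embeddings are placeholders, and the real receiver may be a dedicated grounding model rather than a similarity check): a generated description counts as successful if it is most similar to the intended target region among the candidates.

```python
# Sketch of a receiver-style check (placeholder embeddings; the actual receiver
# may be a dedicated grounding model): a generated description "succeeds" when
# it is most similar to the intended target region among the candidates.
import torch
import torch.nn.functional as F

def receiver_accuracy(text_embs, region_embs, target_idx):
    """text_embs: (n, d) one per description; region_embs: (n, k, d) candidate
    regions per image; target_idx: (n,) index of the true target region."""
    t = F.normalize(text_embs, dim=-1).unsqueeze(1)   # (n, 1, d)
    r = F.normalize(region_embs, dim=-1)              # (n, k, d)
    picked = (t * r).sum(-1).argmax(dim=1)            # best-matching region per image
    return (picked == target_idx).float().mean().item()

acc = receiver_accuracy(torch.randn(16, 512), torch.randn(16, 5, 512),
                        torch.randint(0, 5, (16,)))
print(acc)
```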
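For the Text-To-Concept entry above, the core mechanism of learning a single linear map between two feature spaces can be sketched as a plain least-squares fit on paired features. All features below are random placeholders for the real encoders; the paper's actual training recipe is not reproduced.

```python
# Sketch: fit a linear map from a fixed vision encoder's features to a CLIP-like
# space with ordinary least squares on paired features, then classify zero-shot
# by comparing mapped features to class-name text embeddings. All features are
# random placeholders for the real encoders; not the paper's training recipe.
import torch
import torch.nn.functional as F

n_pairs, d_model, d_clip = 4096, 1024, 512
feats_model = torch.randn(n_pairs, d_model)   # features from the fixed encoder
feats_clip = torch.randn(n_pairs, d_clip)     # CLIP image features for the same images

# Closed-form least-squares fit of W such that feats_model @ W approximates feats_clip.
W = torch.linalg.lstsq(feats_model, feats_clip).solution   # (d_model, d_clip)

def zero_shot_logits(x, class_text_embs):
    """x: (batch, d_model) features; class_text_embs: (n_classes, d_clip)."""
    mapped = F.normalize(x @ W, dim=1)
    classes = F.normalize(class_text_embs, dim=1)
    return mapped @ classes.T

logits = zero_shot_logits(torch.randn(4, d_model), torch.randn(10, d_clip))
print(logits.argmax(dim=1))
```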
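For the DenseCLIP entry above, the pixel-text matching step can be sketched as cosine similarity between every spatial feature vector and each class text embedding, producing one score map per class; the embeddings are placeholders and the prompting and training scheme is omitted.

```python
# Sketch of pixel-text score maps: cosine similarity between dense image
# features and per-class text embeddings, giving one score map per class.
# Embeddings are placeholders for real CLIP outputs; prompting and training
# are omitted.
import torch
import torch.nn.functional as F

B, C, H, W, n_classes = 2, 512, 14, 14, 20
dense_feats = torch.randn(B, C, H, W)   # spatial features from the image encoder
text_embs = torch.randn(n_classes, C)   # one embedding per class prompt

feats = F.normalize(dense_feats, dim=1)                   # normalise the channel dimension
texts = F.normalize(text_embs, dim=1)
score_maps = torch.einsum("bchw,kc->bkhw", feats, texts)  # (B, n_classes, H, W)
print(score_maps.shape)
```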