Identifying Interpretable Subspaces in Image Representations
- URL: http://arxiv.org/abs/2307.10504v2
- Date: Thu, 7 Sep 2023 18:46:08 GMT
- Title: Identifying Interpretable Subspaces in Image Representations
- Authors: Neha Kalibhat, Shweta Bhardwaj, Bayan Bruss, Hamed Firooz, Maziar
Sanjabi, Soheil Feizi
- Abstract summary: We propose Automatic Feature Explanation using Contrasting Concepts (FALCON), a framework to explain features of image representations.
For a target feature, FALCON captions its highly activating cropped images using a large captioning dataset and a pre-trained vision-language model like CLIP.
Each word among the captions is scored and ranked, leading to a small number of shared, human-understandable concepts.
- Score: 54.821222487956355
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We propose Automatic Feature Explanation using Contrasting Concepts (FALCON),
an interpretability framework to explain features of image representations. For
a target feature, FALCON captions its highly activating cropped images using a
large captioning dataset (like LAION-400m) and a pre-trained vision-language
model like CLIP. Each word among the captions is scored and ranked, leading to a
small number of shared, human-understandable concepts that closely describe the
target feature. FALCON also applies contrastive interpretation using lowly
activating (counterfactual) images to eliminate spurious concepts. Although
many existing approaches interpret features independently, we observe that, in
state-of-the-art self-supervised and supervised models, less than 20% of the
representation space can be explained by individual features. We show that
features in larger spaces become more interpretable when studied in groups and
can be explained with high-order scoring concepts through FALCON. We discuss
how extracted concepts can be used to explain and debug failures in downstream
tasks. Finally, we present a technique to transfer concepts from one
(explainable) representation space to another unseen representation space by
learning a simple linear transformation. Code available at
https://github.com/NehaKalibhat/falcon-explain.
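
To make the captioning-and-scoring step concrete, here is a minimal sketch of the contrastive word-scoring idea in Python. It assumes CLIP embeddings for the crops and the caption pool are precomputed and L2-normalized; the function names, parameters, and the simple count-based contrastive score are illustrative assumptions, not FALCON's actual implementation.

```python
# Minimal sketch of FALCON-style concept extraction (assumption: CLIP
# embeddings are precomputed and L2-normalized; score is illustrative).
import numpy as np
from collections import Counter

def extract_concepts(crop_emb, counter_emb, caption_emb, captions,
                     k=5, top_words=10):
    """Score caption words by contrasting highly vs. lowly activating crops.

    crop_emb:    (n_high, d) CLIP image embeddings of highly activating crops
    counter_emb: (n_low, d)  CLIP image embeddings of lowly activating crops
    caption_emb: (m, d)      CLIP text embeddings of a large caption pool
    captions:    list of m caption strings
    """
    def top_caption_words(img_emb):
        words = Counter()
        for v in img_emb:
            sims = caption_emb @ v              # cosine similarity to pool
            for i in np.argsort(-sims)[:k]:     # k nearest captions
                words.update(captions[i].lower().split())
        return words

    high = top_caption_words(crop_emb)
    low = top_caption_words(counter_emb)
    # Contrastive score: frequent for the feature, rare for counterfactuals.
    scores = {w: c - low.get(w, 0) for w, c in high.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_words]
```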
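
The concept-transfer step can likewise be sketched as an ordinary least-squares fit: learn a linear map on paired features of the same images, then reuse the explainable space's concepts for the mapped features. Variable and function names here are illustrative, not the paper's API.

```python
# Hedged sketch of concept transfer via a simple linear transformation.
import numpy as np

def fit_transfer(src, tgt):
    """Least-squares linear map W such that src @ W ~ tgt.

    src: (n, d_src) features of the same images in the unseen space
    tgt: (n, d_tgt) features in the explainable space
    """
    W, *_ = np.linalg.lstsq(src, tgt, rcond=None)
    return W

# Usage: map unseen features, then explain them with existing concepts.
# W = fit_transfer(unseen_feats, explained_feats)
# mapped = new_unseen_feats @ W
```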
Related papers
- Compositional Entailment Learning for Hyperbolic Vision-Language Models [54.41927525264365]
We show how to fully leverage the innate hierarchical nature of hyperbolic embeddings by looking beyond individual image-text pairs.
We propose Compositional Entailment Learning for hyperbolic vision-language models.
Empirical evaluation on a hyperbolic vision-language model trained with millions of image-text pairs shows that the proposed compositional learning approach outperforms conventional Euclidean CLIP learning.
arXiv Detail & Related papers (2024-10-09T14:12:50Z)
- Disentangling Dense Embeddings with Sparse Autoencoders [0.0]
Sparse autoencoders (SAEs) have shown promise in extracting interpretable features from complex neural networks.
We present one of the first applications of SAEs to dense text embeddings from large language models.
We show that the resulting sparse representations maintain semantic fidelity while offering interpretability.
arXiv Detail & Related papers (2024-08-01T15:46:22Z)
- Rewrite Caption Semantics: Bridging Semantic Gaps for Language-Supervised Semantic Segmentation [100.81837601210597]
We propose Concept Curation (CoCu) to bridge the gap between visual and textual semantics in pre-training data.
CoCu achieves superb zero-shot transfer performance and boosts the language-supervised segmentation baseline by a large margin.
arXiv Detail & Related papers (2023-09-24T00:05:39Z)
- Cross-Modal Concept Learning and Inference for Vision-Language Models [31.463771883036607]
In existing fine-tuning methods, the class-specific text description is matched against the whole image.
We develop a new method called cross-modal concept learning and inference (CCLI).
Our method automatically learns a large set of distinctive visual concepts from images using a set of semantic text concepts.
arXiv Detail & Related papers (2023-07-28T10:26:28Z)
- Visual Superordinate Abstraction for Robust Concept Learning [80.15940996821541]
Concept learning constructs visual representations that are connected to linguistic semantics.
We ascribe the bottleneck to a failure of exploring the intrinsic semantic hierarchy of visual concepts.
We propose a visual superordinate abstraction framework for explicitly modeling semantic-aware visual subspaces.
arXiv Detail & Related papers (2022-05-28T14:27:38Z)
- FALCON: Fast Visual Concept Learning by Integrating Images, Linguistic descriptions, and Conceptual Relations [99.54048050189971]
We present a framework for learning new visual concepts quickly, guided by multiple naturally occurring data streams.
The learned concepts support downstream applications, such as answering questions by reasoning about unseen images.
We demonstrate the effectiveness of our model on both synthetic and real-world datasets.
arXiv Detail & Related papers (2022-03-30T19:45:00Z)
- Learning Representations by Predicting Bags of Visual Words [55.332200948110895]
Self-supervised representation learning aims to learn convnet-based image representations from unlabeled data.
Inspired by the success of NLP methods in this area, in this work we propose a self-supervised approach based on spatially dense image descriptions.
arXiv Detail & Related papers (2020-02-27T16:45:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.