CEIR: Concept-based Explainable Image Representation Learning
- URL: http://arxiv.org/abs/2312.10747v1
- Date: Sun, 17 Dec 2023 15:37:41 GMT
- Title: CEIR: Concept-based Explainable Image Representation Learning
- Authors: Yan Cui, Shuhong Liu, Liuzhuozheng Li, Zhiyuan Yuan
- Abstract summary: We introduce Concept-based Explainable Image Representation (CEIR) to derive high-quality representations without label dependency.
Our method exhibits state-of-the-art unsupervised clustering performance on benchmarks such as CIFAR10, CIFAR100, and STL10.
CEIR can seamlessly extract the related concept from open-world images without fine-tuning.
- Score: 0.4198865250277024
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In modern machine learning, the trend of harnessing self-supervised learning
to derive high-quality representations without label dependency has garnered
significant attention. However, the absence of label information, coupled with
the inherently high-dimensional nature of these representations, makes them
harder to interpret. Consequently, indirect evaluations
become the popular metric for evaluating the quality of these features, leading
to a biased validation of the learned representation rationale. To address
these challenges, we introduce a novel approach termed Concept-based
Explainable Image Representation (CEIR). Initially, using the Concept-based
Model (CBM) incorporated with pretrained CLIP and concepts generated by GPT-4,
we project input images into a concept vector space. Subsequently, a
Variational Autoencoder (VAE) learns the latent representation from these
projected concepts, which serves as the final image representation. Due to the
capability of the representation to encapsulate high-level, semantically
relevant concepts, the model allows for attributions to a human-comprehensible
concept space. This not only enhances interpretability but also preserves the
robustness essential for downstream tasks. For instance, our method exhibits
state-of-the-art unsupervised clustering performance on benchmarks such as
CIFAR10, CIFAR100, and STL10. Furthermore, capitalizing on the universality of
human conceptual understanding, CEIR can seamlessly extract the related concept
from open-world images without fine-tuning. This offers a fresh approach to
automatic label generation and label manipulation.
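The two stages described in the abstract (projecting an image into a concept space, then learning a latent representation of the concept vector with a VAE) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the random vectors stand in for CLIP image and text embeddings, cosine-similarity scoring is an assumed simplification of the concept projection, the layers are untrained, and every function name here is hypothetical.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def project_to_concepts(image_emb, concept_embs):
    """Stage 1 (assumed form): score an image embedding against each concept."""
    return [cosine(image_emb, c) for c in concept_embs]

def linear(x, weights, bias):
    """Dense layer; weights holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def random_layer(rng, n_out, n_in):
    """Hypothetical untrained layer with small random weights and zero bias."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def vae_forward(x, enc_mu, enc_logvar, dec, rng):
    """Stage 2: encode to (mu, logvar), sample z, decode, compute the loss."""
    mu = linear(x, *enc_mu)
    logvar = linear(x, *enc_logvar)
    # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1).
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
         for m, lv in zip(mu, logvar)]
    recon = linear(z, *dec)
    recon_loss = sum((a - b) ** 2 for a, b in zip(x, recon))
    # KL divergence between N(mu, sigma^2) and the standard normal prior.
    kl = -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                    for m, lv in zip(mu, logvar))
    return mu, recon, recon_loss + kl

rng = random.Random(0)
n_concepts, emb_dim, latent_dim = 6, 8, 2
# Stand-ins for CLIP text embeddings of the concepts and one image embedding.
concept_embs = [[rng.gauss(0.0, 1.0) for _ in range(emb_dim)]
                for _ in range(n_concepts)]
image_emb = [rng.gauss(0.0, 1.0) for _ in range(emb_dim)]

concept_vec = project_to_concepts(image_emb, concept_embs)
enc_mu = random_layer(rng, latent_dim, n_concepts)
enc_logvar = random_layer(rng, latent_dim, n_concepts)
dec = random_layer(rng, n_concepts, latent_dim)
# The posterior mean is used as the image representation (an assumption).
representation, recon, loss = vae_forward(concept_vec, enc_mu, enc_logvar, dec, rng)
print(len(concept_vec), len(representation))
```

Because each entry of `concept_vec` corresponds to a named concept, a latent dimension can in principle be traced back to the concepts that drive it, which is the interpretability property the abstract emphasizes.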
Related papers
- Explainable Concept Generation through Vision-Language Preference Learning [7.736445799116692]
Concept-based explanations have become a popular choice for explaining deep neural networks post-hoc.
We devise a reinforcement learning-based preference optimization algorithm that fine-tunes the vision-language generative model.
In addition to showing the efficacy and reliability of our method, we show how our method can be used as a diagnostic tool for analyzing neural networks.
arXiv Detail & Related papers (2024-08-24T02:26:42Z)
- Knowledge graphs for empirical concept retrieval [1.06378109904813]
Concept-based explainable AI is promising as a tool to improve the understanding of complex models at the premises of a given user.
Here, we present a workflow for user-driven data collection in both text and image domains.
We test the retrieved concept datasets on two concept-based explainability methods, namely concept activation vectors (CAVs) and concept activation regions (CARs).
arXiv Detail & Related papers (2024-04-10T13:47:22Z)
- Visual Concept-driven Image Generation with Text-to-Image Diffusion Model [65.96212844602866]
Text-to-image (TTI) models have demonstrated impressive results in generating high-resolution images of complex scenes.
Recent approaches have extended these methods with personalization techniques that allow them to integrate user-illustrated concepts.
However, the ability to generate images with multiple interacting concepts, such as human subjects, as well as concepts that may be entangled in one, or across multiple, image illustrations remains elusive.
We propose a concept-driven TTI personalization framework that addresses these core challenges.
arXiv Detail & Related papers (2024-02-18T07:28:37Z)
- Advancing Ante-Hoc Explainable Models through Generative Adversarial Networks [24.45212348373868]
This paper presents a novel concept learning framework for enhancing model interpretability and performance in visual classification tasks.
Our approach appends an unsupervised explanation generator to the primary classifier network and makes use of adversarial training.
This work presents a significant step towards building inherently interpretable deep vision models with task-aligned concept representations.
arXiv Detail & Related papers (2024-01-09T16:16:16Z)
- ACSeg: Adaptive Conceptualization for Unsupervised Semantic Segmentation [17.019848796027485]
Self-supervised visual pre-training models have shown great promise in representing pixel-level semantic relationships.
In this work, we investigate the pixel-level semantic aggregation in self-trained models as image encodes and design concepts.
We propose the Adaptive Concept Generator (ACG) which adaptively maps these prototypes to informative concepts for each image.
arXiv Detail & Related papers (2022-10-12T06:16:34Z)
- Visual Concepts Tokenization [65.61987357146997]
We propose an unsupervised transformer-based Visual Concepts Tokenization framework, dubbed VCT, to perceive an image into a set of disentangled visual concept tokens.
To obtain these concept tokens, we only use cross-attention to extract visual information from the image tokens layer by layer without self-attention between concept tokens.
We further propose a Concept Disentangling Loss to facilitate that different concept tokens represent independent visual concepts.
arXiv Detail & Related papers (2022-05-20T11:25:31Z)
- Cross-modal Representation Learning for Zero-shot Action Recognition [67.57406812235767]
We present a cross-modal Transformer-based framework, which jointly encodes video data and text labels for zero-shot action recognition (ZSAR).
Our model employs a conceptually new pipeline by which visual representations are learned in conjunction with visual-semantic associations in an end-to-end manner.
Experimental results show that our model considerably improves upon the state of the art in ZSAR, reaching encouraging top-1 accuracy on the UCF101, HMDB51, and ActivityNet benchmark datasets.
arXiv Detail & Related papers (2022-05-03T17:39:27Z)
- Translational Concept Embedding for Generalized Compositional Zero-shot Learning [73.60639796305415]
Generalized compositional zero-shot learning means to learn composed concepts of attribute-object pairs in a zero-shot fashion.
This paper introduces a new approach, termed translational concept embedding, to solve these two difficulties in a unified framework.
arXiv Detail & Related papers (2021-12-20T21:27:51Z)
- Interactive Disentanglement: Learning Concepts by Interacting with their Prototype Representations [15.284688801788912]
We show the advantages of prototype representations for understanding and revising the latent space of neural concept learners.
For this purpose, we introduce interactive Concept Swapping Networks (iCSNs).
iCSNs learn to bind conceptual information to specific prototype slots by swapping the latent representations of paired images.
arXiv Detail & Related papers (2021-12-04T09:25:40Z)
- Interpretable Visual Reasoning via Induced Symbolic Space [75.95241948390472]
We study the problem of concept induction in visual reasoning, i.e., identifying concepts and their hierarchical relationships from question-answer pairs associated with images.
We first design a new framework named object-centric compositional attention model (OCCAM) to perform the visual reasoning task with object-level visual features.
We then come up with a method to induce concepts of objects and relations using clues from the attention patterns between objects' visual features and question words.
arXiv Detail & Related papers (2020-11-23T18:21:49Z)
- Learning Representations by Predicting Bags of Visual Words [55.332200948110895]
Self-supervised representation learning aims to learn convnet-based image representations from unlabeled data.
Inspired by the success of NLP methods in this area, in this work we propose a self-supervised approach based on spatially dense image descriptions.
arXiv Detail & Related papers (2020-02-27T16:45:25Z)