Cross-Modal Conceptualization in Bottleneck Models
- URL: http://arxiv.org/abs/2310.14805v2
- Date: Sun, 17 Dec 2023 09:40:52 GMT
- Title: Cross-Modal Conceptualization in Bottleneck Models
- Authors: Danis Alukaev, Semen Kiselev, Ilya Pershin, Bulat Ibragimov, Vladimir
Ivanov, Alexey Kornaev, Ivan Titov
- Abstract summary: Concept Bottleneck Models (CBMs) assume that training examples (e.g., x-ray images) are annotated with high-level concepts.
In our approach, we adopt a more moderate assumption and instead use text descriptions, accompanying the images in training, to guide the induction of concepts.
Our cross-modal approach treats concepts as discrete latent variables and promotes concepts that (1) are predictive of the label, and (2) can be predicted reliably from both the image and text.
- Score: 21.2577097041883
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Concept Bottleneck Models (CBMs) assume that training examples (e.g., x-ray
images) are annotated with high-level concepts (e.g., types of abnormalities),
and perform classification by first predicting the concepts, followed by
predicting the label relying on these concepts. The main difficulty in using
CBMs comes from having to choose concepts that are predictive of the label and
then having to label training examples with these concepts. In our approach, we
adopt a more moderate assumption and instead use text descriptions (e.g.,
radiology reports), accompanying the images in training, to guide the induction
of concepts. Our cross-modal approach treats concepts as discrete latent
variables and promotes concepts that (1) are predictive of the label, and (2)
can be predicted reliably from both the image and text. Through experiments
conducted on datasets ranging from synthetic datasets (e.g., synthetic images
with generated descriptions) to realistic medical imaging datasets, we
demonstrate that cross-modal learning encourages the induction of interpretable
concepts while also facilitating disentanglement. Our results also suggest that
this guidance leads to increased robustness by suppressing the reliance on
shortcut features.
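The abstract above describes the method only in words; as a rough, hedged illustration, the PyTorch snippet below mocks up one way such a cross-modal concept bottleneck could look: each modality gets its own head that predicts a small set of binary concepts, the label is predicted from the (relaxed) discrete concepts alone, and a text-side agreement term guides the image-side concept predictor during training. The class and function names, feature dimensions, Gumbel-sigmoid relaxation, and agreement loss are all illustrative assumptions, not the authors' exact formulation.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalCBM(nn.Module):
    """Concept bottleneck with text-guided concept induction (illustrative sketch)."""

    def __init__(self, img_dim=512, txt_dim=768, n_concepts=16, n_classes=2):
        super().__init__()
        # Per-modality heads map pooled encoder features to binary concept logits.
        self.img_concept_head = nn.Linear(img_dim, n_concepts)
        self.txt_concept_head = nn.Linear(txt_dim, n_concepts)
        # The label is predicted from the concepts alone (the bottleneck).
        self.classifier = nn.Linear(n_concepts, n_classes)

    def forward(self, img_feat, txt_feat=None, tau=1.0):
        img_logits = self.img_concept_head(img_feat)
        # Relaxed binary (Gumbel-sigmoid) sample of the discrete latent concepts.
        u = torch.rand_like(img_logits).clamp(1e-6, 1 - 1e-6)
        logistic_noise = torch.log(u) - torch.log(1 - u)
        concepts = torch.sigmoid((img_logits + logistic_noise) / tau)
        label_logits = self.classifier(concepts)
        txt_logits = self.txt_concept_head(txt_feat) if txt_feat is not None else None
        return label_logits, img_logits, txt_logits


def training_loss(model, img_feat, txt_feat, labels, align_weight=1.0):
    """Promote concepts that (1) predict the label and (2) agree across modalities."""
    label_logits, img_logits, txt_logits = model(img_feat, txt_feat)
    task_loss = F.cross_entropy(label_logits, labels)
    # Cross-modal agreement: text-side concept probabilities guide the image side.
    align_loss = F.binary_cross_entropy_with_logits(
        img_logits, torch.sigmoid(txt_logits).detach()
    )
    return task_loss + align_weight * align_loss


if __name__ == "__main__":
    model = CrossModalCBM()
    img_feat = torch.randn(4, 512)      # e.g. pooled x-ray image features
    txt_feat = torch.randn(4, 768)      # e.g. pooled radiology-report features
    labels = torch.randint(0, 2, (4,))  # binary diagnosis labels
    print(training_loss(model, img_feat, txt_feat, labels).item())
```
At inference time only the image branch is needed, so in this sketch the text descriptions act purely as training-time guidance, matching the setup described in the abstract.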
Related papers
- CusConcept: Customized Visual Concept Decomposition with Diffusion Models [13.95568624067449]
We propose a two-stage framework, CusConcept, to extract customized visual concept embedding vectors.
In the first stage, CusConcept employs a vocabulary-guided concept decomposition mechanism.
In the second stage, joint concept refinement is performed to enhance the fidelity and quality of generated images.
arXiv Detail & Related papers (2024-10-01T04:41:44Z)
- Non-confusing Generation of Customized Concepts in Diffusion Models [135.4385383284657]
We tackle the common challenge of inter-concept visual confusion in compositional concept generation using text-guided diffusion models (TGDMs).
Existing customized generation methods only focus on fine-tuning the second stage while overlooking the first one.
We propose a simple yet effective solution called CLIF: contrastive image-language fine-tuning.
arXiv Detail & Related papers (2024-05-11T05:01:53Z)
- Knowledge graphs for empirical concept retrieval [1.06378109904813]
Concept-based explainable AI is promising as a tool to improve the understanding of complex models at the premises of a given user.
Here, we present a workflow for user-driven data collection in both text and image domains.
We test the retrieved concept datasets on two concept-based explainability methods, namely concept activation vectors (CAVs) and concept activation regions (CARs).
arXiv Detail & Related papers (2024-04-10T13:47:22Z)
- Visual Concept-driven Image Generation with Text-to-Image Diffusion Model [65.96212844602866]
Text-to-image (TTI) models have demonstrated impressive results in generating high-resolution images of complex scenes.
Recent approaches have extended these methods with personalization techniques that allow them to integrate user-illustrated concepts.
However, the ability to generate images with multiple interacting concepts, such as human subjects, as well as concepts that may be entangled in one or across multiple image illustrations, remains elusive.
We propose a concept-driven TTI personalization framework that addresses these core challenges.
arXiv Detail & Related papers (2024-02-18T07:28:37Z)
- MICA: Towards Explainable Skin Lesion Diagnosis via Multi-Level Image-Concept Alignment [4.861768967055006]
We propose a multi-modal explainable disease diagnosis framework that meticulously aligns medical images and clinically related concepts semantically at multiple strata.
Our method, while preserving model interpretability, attains high performance and label efficiency for concept detection and disease diagnosis.
arXiv Detail & Related papers (2024-01-16T17:45:01Z)
- CEIR: Concept-based Explainable Image Representation Learning [0.4198865250277024]
We introduce Concept-based Explainable Image Representation (CEIR) to derive high-quality representations without label dependency.
Our method exhibits state-of-the-art unsupervised clustering performance on benchmarks such as CIFAR10, CIFAR100, and STL10.
CEIR can seamlessly extract related concepts from open-world images without fine-tuning.
arXiv Detail & Related papers (2023-12-17T15:37:41Z)
- Improving Image Captioning via Predicting Structured Concepts [46.88858655641866]
We propose a structured concept predictor to predict concepts and their structures, then we integrate them into captioning.
We design weighted graph convolutional networks (W-GCN) to depict concept relations driven by word dependencies.
Our approach captures potential relations among concepts and discriminatively learns different concepts, thereby effectively facilitating image captioning with the inherited information.
arXiv Detail & Related papers (2023-11-14T15:01:58Z)
- Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z)
- Implicit Concept Removal of Diffusion Models [92.55152501707995]
Text-to-image (T2I) diffusion models often inadvertently generate unwanted concepts such as watermarks and unsafe images.
We present Geom-Erasing, a novel concept removal method based on geometric-driven control.
arXiv Detail & Related papers (2023-10-09T17:13:10Z)
- ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models [79.10890337599166]
We introduce ConceptBed, a large-scale dataset that consists of 284 unique visual concepts and 33K composite text prompts.
We evaluate visual concepts that are either objects, attributes, or styles, and also evaluate four dimensions of compositionality: counting, attributes, relations, and actions.
Our results point to a trade-off between learning the concepts and preserving the compositionality which existing approaches struggle to overcome.
arXiv Detail & Related papers (2023-06-07T18:00:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.