CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification
- URL: http://arxiv.org/abs/2501.12266v1
- Date: Tue, 21 Jan 2025 16:38:04 GMT
- Title: CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification
- Authors: Cristiano Patrício, Isabel Rio-Torto, Jaime S. Cardoso, Luís F. Teixeira, João C. Neves
- Abstract summary: Concept Bottleneck Models (CBMs) tackle the lack of interpretability by conditioning the final disease prediction on a set of predefined, human-interpretable concepts.
We propose a simple, yet effective, methodology, CBVLM, which tackles both of the aforementioned challenges.
By grounding the final diagnosis on the predicted concepts, we ensure explainability, and by leveraging the few-shot capabilities of LVLMs, we drastically lower the annotation cost.
- Score: 8.470147509053819
- Abstract: The main challenges limiting the adoption of deep learning-based solutions in medical workflows are the availability of annotated data and the lack of interpretability of such systems. Concept Bottleneck Models (CBMs) tackle the latter by constraining the final disease prediction on a set of predefined and human-interpretable concepts. However, the increased interpretability achieved through these concept-based explanations implies a higher annotation burden. Moreover, if a new concept needs to be added, the whole system needs to be retrained. Inspired by the remarkable performance shown by Large Vision-Language Models (LVLMs) in few-shot settings, we propose a simple, yet effective, methodology, CBVLM, which tackles both of the aforementioned challenges. First, for each concept, we prompt the LVLM to answer if the concept is present in the input image. Then, we ask the LVLM to classify the image based on the previous concept predictions. Moreover, in both stages, we incorporate a retrieval module responsible for selecting the best examples for in-context learning. By grounding the final diagnosis on the predicted concepts, we ensure explainability, and by leveraging the few-shot capabilities of LVLMs, we drastically lower the annotation cost. We validate our approach with extensive experiments across four medical datasets and twelve LVLMs (both generic and medical) and show that CBVLM consistently outperforms CBMs and task-specific supervised methods without requiring any training and using just a few annotated examples. More information on our project page: https://cristianopatricio.github.io/CBVLM/.
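To make the two-stage protocol concrete, below is a minimal sketch of the pipeline as described in the abstract, not the authors' released code. `embed_image` (an image encoder) and `query_lvlm` (an LVLM call that returns text) are hypothetical stand-ins, and the retrieval module is reduced to cosine-similarity nearest neighbours over a small annotated pool.

```python
# Minimal sketch of the two-stage CBVLM protocol (illustration only).
# `embed_image` and `query_lvlm` are hypothetical stand-ins for an image
# encoder and an LVLM call; they are not part of the paper's code.
from typing import Callable, List, Sequence, Tuple
import numpy as np

def retrieve_examples(query_emb: np.ndarray,
                      pool_embs: np.ndarray,
                      k: int) -> List[int]:
    """Retrieval module: indices of the k annotated examples whose
    embeddings are most similar (cosine) to the query image."""
    sims = pool_embs @ query_emb / (
        np.linalg.norm(pool_embs, axis=1) * np.linalg.norm(query_emb) + 1e-8)
    return list(np.argsort(-sims)[:k])

def cbvlm_predict(image,
                  concepts: Sequence[str],
                  classes: Sequence[str],
                  pool: list,              # annotated (image, concepts, label) triples
                  pool_embs: np.ndarray,   # precomputed embeddings of `pool` images
                  embed_image: Callable[..., np.ndarray],
                  query_lvlm: Callable[..., str],
                  k: int = 4) -> Tuple[dict, str]:
    # Select the best few-shot demonstrations for in-context learning.
    demos = [pool[i] for i in retrieve_examples(embed_image(image), pool_embs, k)]

    # Stage 1: ask the LVLM, concept by concept, whether it is present.
    concept_preds = {}
    for concept in concepts:
        prompt = f"Is the concept '{concept}' present in this image? Answer yes or no."
        answer = query_lvlm(image, prompt, demos)
        concept_preds[concept] = answer.strip().lower().startswith("yes")

    # Stage 2: ground the final diagnosis on the predicted concepts.
    present = [c for c, p in concept_preds.items() if p] or ["no listed concept"]
    prompt = (f"The image shows the following concepts: {', '.join(present)}. "
              f"Classify the image as one of: {', '.join(classes)}.")
    return concept_preds, query_lvlm(image, prompt, demos)
```

Because both stages are plain prompting over a frozen LVLM, adding a new concept amounts to adding one more Stage 1 question, with no retraining.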
Related papers
- V2C-CBM: Building Concept Bottlenecks with Vision-to-Concept Tokenizer [19.177297480709512]
Concept Bottleneck Models (CBMs) offer inherent interpretability by translating images into human-comprehensible concepts.
Recent approaches have leveraged the knowledge of large language models to construct concept bottlenecks.
In this study, we investigate how to avoid these issues by constructing CBMs directly from multimodal models.
arXiv Detail & Related papers (2025-01-09T05:12:38Z)
- A Two-Step Concept-Based Approach for Enhanced Interpretability and Trust in Skin Lesion Diagnosis [6.6635650150737815]
Concept Bottleneck Models (CBMs) offer inherent interpretability by conditioning the final disease prediction on a set of human-understandable concepts.
We introduce a novel two-step methodology that addresses both of these challenges.
By simulating the two stages of a CBM, we utilize a pretrained Vision Language Model (VLM) to automatically predict clinical concepts, and a Large Language Model (LLM) to generate disease diagnoses.
arXiv Detail & Related papers (2024-11-08T14:52:42Z)
- Improving Concept Alignment in Vision-Language Concept Bottleneck Models [9.228586820098723]
Concept Bottleneck Models (CBMs) map images to human-interpretable concepts before making class predictions.
Recent approaches automate CBM construction by prompting Large Language Models (LLMs) to generate text concepts.
It is desirable to build CBMs with concepts defined by human experts rather than LLM-generated ones.
arXiv Detail & Related papers (2024-05-03T03:02:00Z)
- XCoOp: Explainable Prompt Learning for Computer-Aided Diagnosis via Concept-guided Context Optimization [4.634780391920529]
We propose a novel explainable prompt learning framework that leverages medical knowledge by aligning the semantics of images, learnable prompts, and clinical concept-driven prompts.
Our framework addresses the lack of valuable concept annotations by eliciting knowledge from large language models.
Our method simultaneously achieves superior diagnostic performance, flexibility, and interpretability, shedding light on the effectiveness of foundation models in facilitating XAI.
arXiv Detail & Related papers (2024-03-14T14:02:01Z)
- Finer: Investigating and Enhancing Fine-Grained Visual Concept Recognition in Large Vision Language Models [57.95366341738857]
In-depth analyses show that instruction-tuned LVLMs exhibit a modality gap: they behave inconsistently when given textual and visual inputs that correspond to the same concept.
We propose a multiple attribute-centric evaluation benchmark, Finer, to evaluate LVLMs' fine-grained visual comprehension ability and provide significantly improved explainability.
arXiv Detail & Related papers (2024-02-26T05:43:51Z)
- Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z)
- Towards Concept-Aware Large Language Models [56.48016300758356]
Concepts play a pivotal role in various human cognitive functions, including learning, reasoning and communication.
However, there is very little work on endowing machines with the ability to form and reason with concepts.
In this work, we analyze how well contemporary large language models (LLMs) capture human concepts and their structure.
arXiv Detail & Related papers (2023-11-03T12:19:22Z)
- See, Think, Confirm: Interactive Prompting Between Vision and Language Models for Knowledge-based Visual Reasoning [60.43585179885355]
We propose a novel framework named Interactive Prompting Visual Reasoner (IPVR) for few-shot knowledge-based visual reasoning.
IPVR consists of three stages: see, think, and confirm.
We conduct experiments on a range of knowledge-based visual reasoning datasets.
arXiv Detail & Related papers (2023-01-12T18:59:50Z)
- Collaboration of Pre-trained Models Makes Better Few-shot Learner [49.89134194181042]
Few-shot classification requires deep neural networks to learn generalized representations only from limited training images.
Recently, CLIP-based methods have shown promising few-shot performance, benefiting from contrastive language-image pre-training.
We propose CoMo, a Collaboration of pre-trained Models that incorporates diverse prior knowledge from various pre-training paradigms for better few-shot learning.
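As a rough illustration of this family of CLIP-based few-shot methods (a generic sketch under simplifying assumptions, not CoMo itself, which combines priors from several pre-training paradigms), one can blend CLIP's zero-shot text scores with prototypes averaged from the few labelled support images:

```python
# Generic sketch of CLIP-based few-shot classification (not CoMo itself).
# Inputs are assumed to be precomputed, L2-normalised CLIP features.
import numpy as np

def few_shot_logits(img_emb: np.ndarray,        # (d,) query image feature
                    text_embs: np.ndarray,      # (C, d) one text feature per class prompt
                    support_embs: np.ndarray,   # (N, d) few labelled image features
                    support_labels: np.ndarray, # (N,) class ids in [0, C)
                    alpha: float = 0.5) -> np.ndarray:
    # Zero-shot branch: language prior from class-name prompts.
    zero_shot = text_embs @ img_emb
    # Few-shot branch: similarity to each class's mean support embedding.
    prototypes = np.zeros_like(text_embs)
    for c in range(text_embs.shape[0]):
        mask = support_labels == c
        if mask.any():
            proto = support_embs[mask].mean(axis=0)
            prototypes[c] = proto / (np.linalg.norm(proto) + 1e-8)
    few_shot = prototypes @ img_emb
    # `alpha` trades the language prior against the visual support evidence.
    return alpha * zero_shot + (1.0 - alpha) * few_shot
```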
arXiv Detail & Related papers (2022-09-25T16:23:12Z)
- A Competence-aware Curriculum for Visual Concepts Learning via Question Answering [95.35905804211698]
We propose a competence-aware curriculum for visual concept learning in a question-answering manner.
We design a neural-symbolic concept learner for learning the visual concepts and a multi-dimensional Item Response Theory (mIRT) model for guiding the learning process.
Experimental results on CLEVR show that, with a competence-aware curriculum, the proposed method achieves state-of-the-art performance.
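For context, Item Response Theory models the probability of a correct response as a function of learner ability and item parameters; in the standard two-parameter logistic form (a textbook sketch; the paper's multi-dimensional mIRT variant is not spelled out in this summary):

```latex
% Standard two-parameter logistic (2PL) IRT model (textbook form; the
% paper's multi-dimensional mIRT variant is not given in this summary).
P(x_{ij} = 1 \mid \theta_i) = \frac{1}{1 + e^{-a_j(\theta_i - b_j)}}
```

where theta_i is the learner's competence, a_j the item's discrimination, and b_j its difficulty; a competence-aware curriculum can then favour questions whose difficulty matches the current competence estimate.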
arXiv Detail & Related papers (2020-07-03T05:08:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.