Knowledge-Driven Vision-Language Model for Plexus Detection in Hirschsprung's Disease
- URL: http://arxiv.org/abs/2510.21083v1
- Date: Fri, 24 Oct 2025 01:42:57 GMT
- Title: Knowledge-Driven Vision-Language Model for Plexus Detection in Hirschsprung's Disease
- Authors: Youssef Megahed, Atallah Madi, Dina El Demellawy, Adrian D. C. Chan,
- Abstract summary: Hirschsprung's disease is a congenital absence of ganglion cells in some segment(s) of the colon. Deep learning approaches, such as Convolutional Neural Networks, have performed very well in this task. We propose a novel framework that integrates expert-derived textual concepts into a vision-language model to guide plexus classification.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hirschsprung's disease is defined as the congenital absence of ganglion cells in some segment(s) of the colon. The muscle in that section cannot make coordinated movements to propel stool, which most commonly leads to obstruction. The diagnosis and treatment of this disease require clear identification of the different region(s) of the myenteric plexus, where ganglion cells should be present, in the microscopic view of the tissue slide. While deep learning approaches, such as Convolutional Neural Networks, have performed very well in this task, they are often treated as black boxes that offer little insight into their decisions and may not conform to how a physician makes decisions. In this study, we propose a novel framework that integrates expert-derived textual concepts into a Contrastive Language-Image Pre-training (CLIP)-based vision-language model to guide plexus classification. Our approach uses prompts derived from expert sources (e.g., medical textbooks and papers), generated by large language models, and reviewed by our team before being encoded with QuiltNet. In this way it aligns clinically relevant semantic cues with visual features. Experimental results show that the proposed model has superior discriminative capability across different classification metrics: it outperformed CNN-based models, including VGG-19, ResNet-18, and ResNet-50, achieving an accuracy of 83.9%, a precision of 86.6%, and a specificity of 87.6%. These findings highlight the potential of multi-modal learning in histopathology and underscore the value of incorporating expert knowledge for more clinically relevant model outputs.
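The abstract does not spell out how the encoded prompts are compared with image features. A common CLIP-style pattern, shown here only as a minimal sketch, works in three steps:

- L2-normalize the text embeddings of each class's prompts.
- Average them into one class prototype (prompt ensembling).
- Score an image embedding by temperature-scaled cosine similarity followed by a softmax.

Several details are assumptions rather than anything stated in the source: the class names `plexus` and `non-plexus`, the three-dimensional toy vectors (standing in for QuiltNet outputs), and the temperature of 0.01. This is not the authors' implementation, and it omits whatever training or fine-tuning the paper performs.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def class_prototypes(prompt_embeddings: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Prompt ensembling: average each class's normalized prompt embeddings, then renormalize."""
    prototypes = {}
    for label, embeddings in prompt_embeddings.items():
        normed = [l2_normalize(e) for e in embeddings]
        mean = [sum(column) / len(normed) for column in zip(*normed)]
        prototypes[label] = l2_normalize(mean)
    return prototypes


def classify(image_embedding: Vector,
             prototypes: Dict[str, List[float]],
             temperature: float = 0.01) -> Dict[str, float]:
    """Softmax over cosine similarities divided by a temperature (0.01 is an assumed value)."""
    image = l2_normalize(image_embedding)
    logits = {label: sum(a * b for a, b in zip(image, proto)) / temperature
              for label, proto in prototypes.items()}
    shift = max(logits.values())  # subtract the max logit for numerical stability
    exps = {label: math.exp(z - shift) for label, z in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical toy embeddings standing in for QuiltNet text/image encoder outputs.
prompt_embeddings = {
    "plexus": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "non-plexus": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}
prototypes = class_prototypes(prompt_embeddings)
probs = classify([0.85, 0.15, 0.05], prototypes)
print(max(probs, key=probs.get))  # → plexus
```

Averaging normalized prompt embeddings keeps any single LLM-generated phrasing from dominating a class. In a real pipeline the vectors would come from the text and image encoders rather than being written by hand.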
Related papers
- From Classification to Cross-Modal Understanding: Leveraging Vision-Language Models for Fine-Grained Renal Pathology [9.268389327736735]
We model fine-grained glomerular subtyping as a clinically realistic few-shot problem.
We evaluate both pathology-specialized and general-purpose vision-language models under this setting.
arXiv Detail & Related papers (2025-11-15T01:44:11Z)
- Fusion-Based Brain Tumor Classification Using Deep Learning and Explainable AI, and Rule-Based Reasoning [0.0]
This study presents an ensemble-based deep learning framework that combines MobileNetV2 and DenseNet121 convolutional neural networks (CNNs).
The models were trained and evaluated on the Figshare dataset using a stratified 5-fold cross-validation protocol.
The ensemble achieved superior performance compared to individual CNNs, with an accuracy of 91.7%, precision of 91.9%, recall of 91.7%, and F1-score of 91.6%.
arXiv Detail & Related papers (2025-08-09T08:46:36Z)
- Comparison of ConvNeXt and Vision-Language Models for Breast Density Assessment in Screening Mammography [39.58317527488534]
This study compares multimodal and CNN-based methods for automated classification using the BI-RADS system.
Zero-shot classification achieved modest performance, while the fine-tuned ConvNeXt model outperformed the BioMedCLIP linear probe.
These findings suggest that despite the promise of multimodal learning, CNN-based models with end-to-end fine-tuning provide stronger performance for specialized medical imaging.
arXiv Detail & Related papers (2025-06-16T20:14:37Z)
- Aiding Medical Diagnosis through Image Synthesis and Classification [0.0]
This paper presents a system designed to generate realistic medical images from textual descriptions.
A pretrained Stable Diffusion model was fine-tuned using Low-Rank Adaptation (LoRA) on the PathMNIST dataset.
A ResNet-18 classification model was trained on the same dataset, achieving 99.76% accuracy in detecting the correct label.
arXiv Detail & Related papers (2025-06-01T02:25:43Z)
- A Knowledge-enhanced Pathology Vision-language Foundation Model for Cancer Diagnosis [58.85247337449624]
We propose a knowledge-enhanced vision-language pre-training approach that integrates disease knowledge into the alignment within hierarchical semantic groups.
KEEP achieves state-of-the-art performance in zero-shot cancer diagnostic tasks.
arXiv Detail & Related papers (2024-12-17T17:45:21Z)
- Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models [49.95603725998561]
We propose a new paradigm to build robust and interpretable medical image classifiers with natural language concepts.
Specifically, we first query clinical concepts from GPT-4, then transform latent image features into explicit concepts with a vision-language model.
arXiv Detail & Related papers (2023-10-04T21:57:09Z)
- Medulloblastoma Tumor Classification using Deep Transfer Learning with Multi-Scale EfficientNets [63.62764375279861]
We propose an end-to-end MB tumor classification method and explore transfer learning with various input sizes and matching network dimensions.
Using a data set with 161 cases, we demonstrate that pre-trained EfficientNets with larger input resolutions lead to significant performance improvements.
arXiv Detail & Related papers (2021-09-10T13:07:11Z)
- Acute Lymphoblastic Leukemia Detection from Microscopic Images Using Weighted Ensemble of Convolutional Neural Networks [4.095759108304108]
This article automates the ALL detection task from microscopic cell images using deep Convolutional Neural Networks (CNNs).
Various data augmentations and preprocessing steps are incorporated to achieve better generalization of the network.
Our proposed weighted ensemble model, which uses the kappa values of the ensemble candidates as their weights, achieved a weighted F1-score of 88.6%, a balanced accuracy of 86.2%, and an AUC of 0.941 on the preliminary test set.
arXiv Detail & Related papers (2021-05-09T18:58:48Z)
- Relational Subsets Knowledge Distillation for Long-tailed Retinal Diseases Recognition [65.77962788209103]
We propose class subset learning by dividing the long-tailed data into multiple class subsets according to prior knowledge.
It forces the model to focus on learning subset-specific knowledge.
The proposed framework proved to be effective for the long-tailed retinal diseases recognition task.
arXiv Detail & Related papers (2021-04-22T13:39:33Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem: identifying subgroups of similar patients.
We introduce meta-learning techniques to develop a new model that can extract common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, which is a simple yet effective meta-learning method for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
- Weakly supervised multiple instance learning histopathological tumor segmentation [51.085268272912415]
We propose a weakly supervised framework for whole slide image segmentation.
We exploit a multiple instance learning scheme for training models.
The proposed framework has been evaluated on multi-location, multi-centric public data from The Cancer Genome Atlas and the PatchCamelyon dataset.
arXiv Detail & Related papers (2020-04-10T13:12:47Z)
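The main abstract and several entries above report accuracy, precision, recall, specificity, F1-score, or balanced accuracy. For a binary task, all of these follow from the four cells of a confusion matrix. The minimal sketch below computes them from true and predicted labels. Treating label `1` as the positive class is an assumption: the abstracts do not say which class each paper counts as positive. The toy labels are made up for illustration and are not data from any of the papers.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # positives predicted positive
    fp: int  # negatives predicted positive
    tn: int  # negatives predicted negative
    fn: int  # positives predicted negative


def count(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> ConfusionCounts:
    """Tally the binary confusion matrix, treating `positive` as the positive label (an assumption)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def _ratio(num: float, den: float) -> float:
    # Return NaN instead of raising when a metric is undefined (e.g. no predicted positives).
    return num / den if den else float("nan")


def binary_metrics(c: ConfusionCounts) -> Dict[str, float]:
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)       # also called sensitivity
    specificity = _ratio(c.tn, c.tn + c.fp)  # true-negative rate
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "balanced_accuracy": (recall + specificity) / 2,  # binary case only
    }


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]
counts = count(y_true, y_pred)
print(counts)  # → ConfusionCounts(tp=3, fp=2, tn=2, fn=1)
for name, value in binary_metrics(counts).items():
    print(f"{name}: {value:.3f}")
```

Reporting specificity alongside precision, as the main abstract does, shows how often negative samples are correctly rejected. Accuracy alone can hide this under class imbalance.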