VLCD: Vision-Language Contrastive Distillation for Accurate and Efficient Automatic Placenta Analysis
- URL: http://arxiv.org/abs/2506.02229v1
- Date: Mon, 02 Jun 2025 20:12:27 GMT
- Title: VLCD: Vision-Language Contrastive Distillation for Accurate and Efficient Automatic Placenta Analysis
- Authors: Manas Mehta, Yimu Pan, Kelly Gallagher, Alison D. Gernand, Jeffery A. Goldstein, Delia Mwinyelle, Leena Mithal, James Z. Wang,
- Abstract summary: Pathological examination of the placenta is an effective method for detecting and mitigating health risks associated with childbirth. Recent advancements in AI have enabled the use of photographs of the placenta and pathology reports for detecting and classifying signs of childbirth-related pathologies. We propose two modifications to vision-language contrastive learning frameworks to enhance their accuracy and efficiency.
- Score: 1.366127486479084
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pathological examination of the placenta is an effective method for detecting and mitigating health risks associated with childbirth. Recent advancements in AI have enabled the use of photographs of the placenta and pathology reports for detecting and classifying signs of childbirth-related pathologies. However, existing automated methods are computationally intensive, which limits their deployability. We propose two modifications to vision-language contrastive learning (VLC) frameworks to enhance their accuracy and efficiency: (1) text-anchored vision-language contrastive knowledge distillation (VLCD)-a new knowledge distillation strategy for medical VLC pretraining, and (2) unsupervised predistillation using a large natural-image dataset for improved initialization. Our approach distills efficient neural networks that match or surpass the teacher model in performance while achieving model compression and acceleration. Our results showcase the value of unsupervised predistillation in improving the performance and robustness of our approach, specifically for lower-quality images. VLCD serves as an effective way to improve the efficiency and deployability of medical VLC approaches, making AI-based healthcare solutions more accessible, especially in resource-constrained environments.
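The abstract describes the method only at a high level, so the following numpy sketch is purely illustrative of what a text-anchored contrastive distillation objective could look like: student image embeddings are pulled toward paired text anchors (contrastive term) and toward the teacher's image embeddings (distillation term). The `info_nce` helper, the `alpha` weighting, and all embedding shapes are assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

def info_nce(sim, temperature=0.07):
    """InfoNCE-style loss over a similarity matrix: row i should match column i."""
    logits = sim / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

def text_anchored_distillation_loss(student_img, teacher_img, text_emb, alpha=0.5):
    """Weighted sum of (a) contrastive alignment of the student's image
    embeddings to the shared text anchors and (b) distillation of the student
    toward the teacher's image embeddings, in a joint vision-language space."""
    def normalize(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)
    s, t, w = normalize(student_img), normalize(teacher_img), normalize(text_emb)
    contrastive = info_nce(s @ w.T)  # student image vs. text anchors
    distill = info_nce(s @ t.T)      # student vs. teacher image embeddings
    return alpha * contrastive + (1 - alpha) * distill

rng = np.random.default_rng(0)
batch, dim = 8, 32
loss = text_anchored_distillation_loss(
    rng.normal(size=(batch, dim)),   # hypothetical student embeddings
    rng.normal(size=(batch, dim)),   # hypothetical teacher embeddings
    rng.normal(size=(batch, dim)),   # hypothetical text-report embeddings
)
print(float(loss))
```

In a real pipeline the three embedding matrices would come from the student image encoder, the frozen teacher image encoder, and the text encoder over pathology reports; only the student would receive gradients.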
Related papers
- Brain Stroke Detection and Classification Using CT Imaging with Transformer Models and Explainable AI [0.0]
This study proposed an artificial intelligence framework for multiclass stroke classification using CT scan images. The proposed method adopted MaxViT, a state-of-the-art Vision Transformer, as the primary deep learning model for image-based stroke classification. To enhance model generalization and address class imbalance, we applied data augmentation techniques.
arXiv Detail & Related papers (2025-07-13T13:50:50Z) - Federated Learning with LoRA Optimized DeiT and Multiscale Patch Embedding for Secure Eye Disease Recognition [2.1358421658740214]
This study introduces a data-efficient image transformer (DeIT)-based approach to advance AI-powered medical imaging and disease detection. It achieves state-of-the-art performance, with the highest AUC, F1 score, precision, minimal loss, and Top-5 accuracy. Grad-CAM++ visualizations improve interpretability by highlighting critical pathological regions, enhancing the model's clinical relevance.
arXiv Detail & Related papers (2025-05-11T13:51:56Z) - The Efficacy of Semantics-Preserving Transformations in Self-Supervised Learning for Medical Ultrasound [60.80780313225093]
This study systematically investigated the impact of data augmentation and preprocessing strategies in self-supervised learning for lung ultrasound. Three data augmentation pipelines were assessed: a baseline pipeline commonly used across imaging domains, a novel semantic-preserving pipeline designed for ultrasound, and a distilled set of the most effective transformations from both pipelines.
arXiv Detail & Related papers (2025-04-10T16:26:47Z) - Predicting Stroke through Retinal Graphs and Multimodal Self-supervised Learning [0.46835339362676565]
Early identification of stroke is crucial for intervention, requiring reliable models.
We proposed an efficient retinal image representation together with clinical information to capture a comprehensive overview of cardiovascular health.
arXiv Detail & Related papers (2024-11-08T14:40:56Z) - IBO: Inpainting-Based Occlusion to Enhance Explainable Artificial Intelligence Evaluation in Histopathology [1.9440228513607511]
Inpainting-Based Occlusion (IBO) is a novel strategy that utilizes a Denoising Diffusion Probabilistic Model to inpaint occluded regions.
We evaluate IBO through two phases: first, by assessing perceptual similarity using the Learned Perceptual Image Patch Similarity (LPIPS) metric, and second, by quantifying the impact on model predictions through Area Under the Curve (AUC) analysis.
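The second evaluation phase above quantifies prediction impact with the standard Area Under the ROC Curve. As a small self-contained illustration (not code from the paper), AUC can be computed directly from its rank interpretation: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one.

```python
import numpy as np

def roc_auc(scores, labels):
    """ROC-AUC via pairwise comparisons: the fraction of positive/negative
    pairs where the positive sample receives the higher score (ties count half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

print(roc_auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))  # perfect separation -> 1.0
```

This O(n·m) pairwise form is fine for illustration; production code would typically use a sorting-based implementation or a library routine.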
arXiv Detail & Related papers (2024-08-29T09:57:55Z) - Classification of Breast Cancer Histopathology Images using a Modified Supervised Contrastive Learning Method [4.303291247305105]
We improve the supervised contrastive learning method by leveraging both image-level labels and domain-specific augmentations to enhance model robustness.
We evaluate our method on the BreakHis dataset, which consists of breast cancer histopathology images.
This improvement corresponds to 93.63% absolute accuracy, highlighting the effectiveness of our approach in leveraging properties of the data to learn a more appropriate representation space.
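The summary above does not reproduce the paper's modified objective; for orientation only, here is a generic numpy sketch of the standard supervised contrastive (SupCon) loss that such methods build on, in which every sample sharing an anchor's image-level label counts as a positive. The batch size, labels, and temperature are arbitrary illustrative choices.

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss: for each anchor, average the log-probability
    of its same-label positives under a softmax over all other samples."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    sim_exp = np.exp(sim - sim.max(axis=1, keepdims=True))
    sim_exp[self_mask] = 0.0  # exclude the anchor itself from the denominator
    log_prob = np.log(sim_exp + 1e-12) - np.log(sim_exp.sum(axis=1, keepdims=True))
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    per_anchor = -(log_prob * pos_mask).sum(axis=1) / np.maximum(pos_mask.sum(axis=1), 1)
    return per_anchor.mean()

rng = np.random.default_rng(1)
labels = np.array([0, 0, 1, 1, 0, 1])      # hypothetical image-level labels
loss = supervised_contrastive_loss(rng.normal(size=(6, 16)), labels)
print(float(loss))
```

Tightly clustered same-label embeddings drive this loss down, which is the pressure that shapes the learned representation space.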
arXiv Detail & Related papers (2024-05-06T17:06:11Z) - An Evaluation of Lightweight Deep Learning Techniques in Medical Imaging for High Precision COVID-19 Diagnostics [0.0]
Decision support systems ease the challenges inherent in the physical examination of images.
However, most deep learning approaches are not amenable to implementation on resource-constrained devices.
This paper presents the development and performance evaluation of a lightweight deep learning technique for COVID-19 detection based on the MobileNetV2 model.
arXiv Detail & Related papers (2023-05-30T13:14:03Z) - Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z) - Robust and Efficient Medical Imaging with Self-Supervision [80.62711706785834]
We present REMEDIS, a unified representation learning strategy to improve robustness and data-efficiency of medical imaging AI.
We study a diverse range of medical imaging tasks and simulate three realistic application scenarios using retrospective data.
arXiv Detail & Related papers (2022-05-19T17:34:18Z) - Harmonizing Pathological and Normal Pixels for Pseudo-healthy Synthesis [68.5287824124996]
We present a new type of discriminator, the segmentor, to accurately locate the lesions and improve the visual quality of pseudo-healthy images.
We apply the generated images to medical image enhancement and use the enhanced results to address the low-contrast problem.
Comprehensive experiments on the T2 modality of BraTS demonstrate that the proposed method substantially outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2022-03-29T08:41:17Z) - Performance or Trust? Why Not Both. Deep AUC Maximization with Self-Supervised Learning for COVID-19 Chest X-ray Classifications [72.52228843498193]
In training deep learning models, a compromise often must be made between performance and trust.
In this work, we integrate a new surrogate loss with self-supervised learning for computer-aided screening of COVID-19 patients.
arXiv Detail & Related papers (2021-12-14T21:16:52Z) - Explaining Clinical Decision Support Systems in Medical Imaging using Cycle-Consistent Activation Maximization [112.2628296775395]
Clinical decision support using deep neural networks has become a topic of steadily growing interest.
However, clinicians are often hesitant to adopt the technology because its underlying decision-making process is considered opaque and difficult to comprehend.
We propose a novel decision explanation scheme based on cycle-consistent activation maximization, which generates high-quality visualizations of classifier decisions even on smaller datasets.
arXiv Detail & Related papers (2020-10-09T14:39:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.