Explainable, Multi-modal Wound Infection Classification from Images Augmented with Generated Captions
- URL: http://arxiv.org/abs/2502.20277v1
- Date: Thu, 27 Feb 2025 17:04:00 GMT
- Title: Explainable, Multi-modal Wound Infection Classification from Images Augmented with Generated Captions
- Authors: Palawat Busaranuvong, Emmanuel Agu, Reza Saadati Fard, Deepak Kumar, Shefalika Gautam, Bengisu Tulu, Diane Strong
- Abstract summary: Infections in Diabetic Foot Ulcers (DFUs) can cause severe complications, including tissue death and limb amputation. Previous machine learning methods have focused on identifying infections by analyzing wound images alone. In this study, we aim to improve infection detection by introducing Synthetic Caption Augmented Retrieval for Wound Infection Detection.
- Score: 2.4548085068515286
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Infections in Diabetic Foot Ulcers (DFUs) can cause severe complications, including tissue death and limb amputation, highlighting the need for accurate, timely diagnosis. Previous machine learning methods have focused on identifying infections by analyzing wound images alone, without utilizing additional metadata such as medical notes. In this study, we aim to improve infection detection by introducing Synthetic Caption Augmented Retrieval for Wound Infection Detection (SCARWID), a novel deep learning framework that leverages synthetic textual descriptions to augment DFU images. SCARWID consists of two components: (1) Wound-BLIP, a Vision-Language Model (VLM) fine-tuned on GPT-4o-generated descriptions to synthesize consistent captions from images; and (2) an Image-Text Fusion module that uses cross-attention to extract cross-modal embeddings from an image and its corresponding Wound-BLIP caption. Infection status is determined by retrieving the top-k similar items from a labeled support set. To enhance the diversity of training data, we utilized a latent diffusion model to generate additional wound images. As a result, SCARWID outperformed state-of-the-art models, achieving average sensitivity, specificity, and accuracy of 0.85, 0.78, and 0.81, respectively, for wound infection classification. Displaying the generated captions alongside the wound images and infection detection results enhances interpretability and trust, enabling nurses to align SCARWID outputs with their medical knowledge. This is particularly valuable when wound notes are unavailable or when assisting novice nurses who may find it difficult to identify visual attributes of wound infection.
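The abstract specifies two mechanisms concretely enough to sketch: cross-attention fusion of a wound image with its generated caption, and infection labeling via top-k retrieval from a labeled support set. The minimal PyTorch sketch below illustrates that fuse-then-retrieve pattern; the module names, the 256-dimensional embeddings, the majority-vote rule, and the random stand-in features are all assumptions for illustration, not the released SCARWID implementation.

```python
# Hedged sketch of SCARWID-style image-text fusion and top-k retrieval
# classification. Dimensions, names, and the vote rule are illustrative
# assumptions; a real pipeline would supply Wound-BLIP caption tokens
# and image-encoder patch features instead of random tensors.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageTextFusion(nn.Module):
    """Cross-attention block: caption tokens attend to image patch features."""
    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_tokens, image_patches):
        # text_tokens:   (B, T, dim) embeddings of the generated caption
        # image_patches: (B, P, dim) patch embeddings of the wound image
        fused, _ = self.attn(query=text_tokens, key=image_patches, value=image_patches)
        fused = self.norm(fused + text_tokens)  # residual + layer norm
        return fused.mean(dim=1)                # (B, dim) cross-modal embedding

@torch.no_grad()
def classify_by_retrieval(query_emb, support_embs, support_labels, k=5):
    """Label a query by majority vote over its top-k nearest support items."""
    sims = F.cosine_similarity(query_emb.unsqueeze(0), support_embs, dim=1)
    topk = sims.topk(k).indices
    votes = support_labels[topk]
    return int(votes.float().mean() >= 0.5)     # 1 = infected, 0 = uninfected

if __name__ == "__main__":
    torch.manual_seed(0)
    fusion = ImageTextFusion()
    text_tokens = torch.randn(1, 12, 256)       # stand-in: 12 caption tokens
    image_patches = torch.randn(1, 49, 256)     # stand-in: 7x7 patch grid
    query_emb = fusion(text_tokens, image_patches).squeeze(0)

    support_embs = torch.randn(100, 256)        # labeled support-set embeddings
    support_labels = torch.randint(0, 2, (100,))
    print("predicted label:", classify_by_retrieval(query_emb, support_embs, support_labels))
```

Only the overall pattern here is taken from the abstract; the paper's support-set similarity measure and voting scheme may differ from this sketch.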
Related papers
- Seeing the Trees for the Forest: Rethinking Weakly-Supervised Medical Visual Grounding [50.483761005446]
Current models struggle to associate textual descriptions with disease regions due to inefficient attention mechanisms and a lack of fine-grained token representations. We introduce Disease-Aware Prompting (DAP), which uses the explainability map of a VLM to identify the appropriate image features. DAP improves visual grounding accuracy by 20.74% compared to state-of-the-art methods across three major chest X-ray datasets.
arXiv Detail & Related papers (2025-05-21T05:16:45Z)
- Multi-modal wound classification using wound image and location by Xception and Gaussian Mixture Recurrent Neural Network (GMRNN) [0.0]
We propose a multi-modal AI model based on transfer learning (TL), which combines two state-of-the-art architectures, Xception and GMRNN, for wound classification. The proposed method is comprehensively compared with deep neural networks (DNN) for medical image analysis.
arXiv Detail & Related papers (2025-05-12T21:44:03Z)
- Causal Disentanglement for Robust Long-tail Medical Image Generation [80.15257897500578]
We propose a novel medical image generation framework, which generates independent pathological and structural features.
We leverage a diffusion model guided by pathological findings to model pathological features, enabling the generation of diverse counterfactual images.
arXiv Detail & Related papers (2025-04-20T01:54:18Z)
- Prompt as Knowledge Bank: Boost Vision-language model via Structural Representation for zero-shot medical detection [32.99689130650503]
We propose StructuralGLIP, which encodes prompts into a latent knowledge bank layer-by-layer. In each layer, we select highly similar features from both the image representation and the knowledge bank, forming structural representations that capture nuanced relationships between image patches and target descriptions. Experiments demonstrate that StructuralGLIP achieves a +4.1% AP improvement over prior state-of-the-art methods across seven zero-shot medical detection benchmarks.
arXiv Detail & Related papers (2025-02-22T13:22:25Z)
- MedFILIP: Medical Fine-grained Language-Image Pre-training [11.894318326422054]
Existing methods struggle to accurately characterize associations between images and diseases. MedFILIP introduces medical image-specific knowledge through contrastive learning. For single-label, multi-label, and fine-grained classification, our model achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-01-18T14:08:33Z)
- FairSkin: Fair Diffusion for Skin Disease Image Generation [54.29840149709033]
Diffusion Model (DM) has become a leading method in generating synthetic medical images, but it suffers from a critical twofold bias.
We propose FairSkin, a novel DM framework that mitigates these biases through a three-level resampling mechanism.
Our approach significantly improves the diversity and quality of generated images, contributing to more equitable skin disease detection in clinical settings.
arXiv Detail & Related papers (2024-10-29T21:37:03Z)
- StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model [62.25424831998405]
StealthDiffusion is a framework that modifies AI-generated images into high-quality, imperceptible adversarial examples.
It is effective in both white-box and black-box settings, transforming AI-generated images into high-quality adversarial forgeries.
arXiv Detail & Related papers (2024-08-11T01:22:29Z)
- Contrastive Learning with Counterfactual Explanations for Radiology Report Generation [83.30609465252441]
We propose a CounterFactual Explanations-based framework (CoFE) for radiology report generation.
Counterfactual explanations serve as a potent tool for understanding how decisions made by algorithms can be changed by asking "what if" scenarios.
Experiments on two benchmarks demonstrate that leveraging the counterfactual explanations enables CoFE to generate semantically coherent and factually complete reports.
arXiv Detail & Related papers (2024-07-19T17:24:25Z)
- Guided Conditional Diffusion Classifier (ConDiff) for Enhanced Prediction of Infection in Diabetic Foot Ulcers [2.4548085068515286]
ConDiff is a novel deep-learning infection detection model that combines guided image synthesis with a Conditional denoising diffusion model and distance-based classification.
ConDiff demonstrated superior performance with an accuracy of 83% and an F1-score of 0.858, outperforming state-of-the-art models by at least 3%.
arXiv Detail & Related papers (2024-05-01T20:47:06Z)
- Synthesizing Diabetic Foot Ulcer Images with Diffusion Model [1.8699569122464073]
Diabetic Foot Ulcer (DFU) is a serious skin wound requiring specialized care.
In recent years, generative adversarial networks and diffusion models have emerged as powerful tools for generating synthetic images.
This paper explores the potential of diffusion models for synthesizing DFU images and evaluates their authenticity through expert clinician assessments.
arXiv Detail & Related papers (2023-10-31T03:15:30Z)
- Semi-supervised GAN for Bladder Tissue Classification in Multi-Domain Endoscopic Images [10.48945682277992]
We propose a semi-supervised Generative Adversarial Network (GAN)-based method composed of three main components.
The overall average classification accuracy, precision, and recall obtained with the proposed method for tissue classification are 0.90, 0.88, and 0.89, respectively.
arXiv Detail & Related papers (2022-12-21T21:32:36Z)
- Harmonizing Pathological and Normal Pixels for Pseudo-healthy Synthesis [68.5287824124996]
We present a new type of discriminator, the segmentor, to accurately locate the lesions and improve the visual quality of pseudo-healthy images.
We apply the generated images to medical image enhancement and use the enhanced results to cope with the low-contrast problem.
Comprehensive experiments on the T2 modality of BraTS demonstrate that the proposed method substantially outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2022-03-29T08:41:17Z)
- An Interpretable Multiple-Instance Approach for the Detection of Referable Diabetic Retinopathy from Fundus Images [72.94446225783697]
We propose a machine learning system for the detection of referable Diabetic Retinopathy in fundus images.
By extracting local information from image patches and combining it efficiently through an attention mechanism, our system is able to achieve high classification accuracy (a minimal sketch of this attention-pooling pattern appears after this list).
We evaluate our approach on publicly available retinal image datasets, in which it exhibits near state-of-the-art performance.
arXiv Detail & Related papers (2021-03-02T13:14:15Z)
- ElixirNet: Relation-aware Network Architecture Adaptation for Medical Lesion Detection [90.13718478362337]
We introduce a novel ElixirNet that includes three components: 1) TruncatedRPN balances positive and negative data for false positive reduction; 2) Auto-lesion Block is automatically customized for medical images to incorporate relation-aware operations among region proposals; and 3) Relation transfer module incorporates the semantic relationship.
Experiments on DeepLesion and KiTS19 prove the effectiveness of ElixirNet, achieving improvements in both sensitivity and precision over FPN with fewer parameters.
arXiv Detail & Related papers (2020-03-03T05:29:49Z)
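The multiple-instance retinopathy entry above describes a general, well-known pattern: score patch features with a small network, then combine them by an attention-weighted sum, so the weights double as a patch-level explanation. A minimal sketch under assumed dimensions follows; the class name and sizes are illustrative, not that paper's implementation.

```python
# Hedged sketch of attention-based multiple-instance pooling over patch
# features. Names and dimensions are assumptions for illustration only.
import torch
import torch.nn as nn

class AttentionMILPooling(nn.Module):
    """Scores each patch embedding, then takes an attention-weighted sum."""
    def __init__(self, dim=512, hidden=128):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, patches):
        # patches: (num_patches, dim) features extracted from image patches
        weights = torch.softmax(self.score(patches), dim=0)  # (num_patches, 1)
        bag = (weights * patches).sum(dim=0)                 # (dim,) image-level embedding
        return bag, weights.squeeze(-1)  # weights serve as a patch-level explanation

if __name__ == "__main__":
    pool = AttentionMILPooling()
    patches = torch.randn(64, 512)           # stand-in patch features
    bag, attn = pool(patches)
    print(bag.shape, attn.argmax().item())   # image embedding; most-attended patch
```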
This list is automatically generated from the titles and abstracts of the papers in this site.