Analysis of Blood Report Images Using General Purpose Vision-Language Models
- URL: http://arxiv.org/abs/2509.06033v1
- Date: Sun, 07 Sep 2025 12:31:16 GMT
- Title: Analysis of Blood Report Images Using General Purpose Vision-Language Models
- Authors: Nadia Bakhsheshi, Hamid Beigy,
- Abstract summary: General-purpose Vision-Language Models (VLMs) can provide clear interpretations directly from images.<n>This work establishes a foundation for the future development of reliable and accessible AI-assisted healthcare applications.
- Score: 11.68803062751453
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The reliable analysis of blood reports is important for health knowledge, but individuals often struggle with interpretation, leading to anxiety and overlooked issues. We explore the potential of general-purpose Vision-Language Models (VLMs) to address this challenge by automatically analyzing blood report images. We conduct a comparative evaluation of three VLMs: Qwen-VL-Max, Gemini 2.5 Pro, and Llama 4 Maverick, determining their performance on a dataset of 100 diverse blood report images. Each model was prompted with clinically relevant questions adapted to each blood report. The answers were then processed using Sentence-BERT to compare and evaluate how closely the models responded. The findings suggest that general-purpose VLMs are a practical and promising technology for developing patient-facing tools for preliminary blood report analysis. Their ability to provide clear interpretations directly from images can improve health literacy and reduce the limitations to understanding complex medical information. This work establishes a foundation for the future development of reliable and accessible AI-assisted healthcare applications. While results are encouraging, they should be interpreted cautiously given the limited dataset size.
Related papers
- Visual Alignment of Medical Vision-Language Models for Grounded Radiology Report Generation [25.148217482604746]
We propose VALOR:Visual Alignment of Medical Vision-Language Models for Radiology Report Generation.<n>Our method introduces a reinforcement learning-based post-alignment framework utilizing Group-Relative Proximal Optimization (GRPO)<n>Experiments on multiple benchmarks demonstrate that VALOR substantially improves factual accuracy and visual grounding, achieving significant performance gains over state-of-the-art report generation methods.
arXiv Detail & Related papers (2025-12-18T05:48:21Z) - Synthetic Vasculature and Pathology Enhance Vision-Language Model Reasoning [39.96133625333846]
We introduce Synthetic Vasculature Reasoning (SVR), a framework that controllably synthesizes images and corresponding text.<n>Based on this we curate OCTA-100K-SVR, an OCTA image-reasoning dataset with 100,000 pairs.<n>Our experiments show that a general-purpose VLM trained on the dataset achieves a zero-shot balanced classification accuracy of 89.67% on real OCTA images.
arXiv Detail & Related papers (2025-12-11T19:19:39Z) - TemMed-Bench: Evaluating Temporal Medical Image Reasoning in Vision-Language Models [54.48710348910535]
Existing medical reasoning benchmarks primarily focus on analyzing a patient's condition based on an image from a single visit.<n>We introduce TemMed-Bench, the first benchmark designed for analyzing changes in patients' conditions between different clinical visits.
arXiv Detail & Related papers (2025-09-29T17:51:26Z) - Conversation AI Dialog for Medicare powered by Finetuning and Retrieval Augmented Generation [0.0]
Large language models (LLMs) have shown impressive capabilities in natural language processing tasks, including dialogue generation.<n>This research aims to conduct a novel comparative analysis of two prominent techniques, fine-tuning with LoRA and the Retrieval-Augmented Generation framework.
arXiv Detail & Related papers (2025-02-04T11:50:40Z) - A Survey of Medical Vision-and-Language Applications and Their Techniques [48.268198631277315]
Medical vision-and-language models (MVLMs) have attracted substantial interest due to their capability to offer a natural language interface for interpreting complex medical data.
Here, we provide a comprehensive overview of MVLMs and the various medical tasks to which they have been applied.
We also examine the datasets used for these tasks and compare the performance of different models based on standardized evaluation metrics.
arXiv Detail & Related papers (2024-11-19T03:27:05Z) - CXR-Agent: Vision-language models for chest X-ray interpretation with uncertainty aware radiology reporting [0.0]
We evaluate the publicly available, state of the art, foundational vision-language models for chest X-ray interpretation.
We find that vision-language models often hallucinate with confident language, which slows down clinical interpretation.
We develop an agent-based vision-language approach for report generation using CheXagent's linear probes and BioViL-T's phrase grounding tools.
arXiv Detail & Related papers (2024-07-11T18:39:19Z) - Uncertainty-aware Medical Diagnostic Phrase Identification and Grounding [72.18719355481052]
We introduce a novel task called Medical Report Grounding (MRG)<n>MRG aims to directly identify diagnostic phrases and their corresponding grounding boxes from medical reports in an end-to-end manner.<n>We propose uMedGround, a robust and reliable framework that leverages a multimodal large language model to predict diagnostic phrases.
arXiv Detail & Related papers (2024-04-10T07:41:35Z) - Application Of Vision-Language Models For Assessing Osteoarthritis
Disease Severity [0.43431539537721414]
Osteoarthritis (OA) poses a global health challenge, demanding precise diagnostic methods.
Existing deep learning models for OA assessment are unimodal single task systems.
This study investigates employing Vision Language Processing models to predict OA severity using Xray images and corresponding reports.
arXiv Detail & Related papers (2024-01-12T02:43:58Z) - Robust and Interpretable Medical Image Classifiers via Concept
Bottleneck Models [49.95603725998561]
We propose a new paradigm to build robust and interpretable medical image classifiers with natural language concepts.
Specifically, we first query clinical concepts from GPT-4, then transform latent image features into explicit concepts with a vision-language model.
arXiv Detail & Related papers (2023-10-04T21:57:09Z) - Medical Misinformation in AI-Assisted Self-Diagnosis: Development of a Method (EvalPrompt) for Analyzing Large Language Models [4.8775268199830935]
This study aims to assess the effectiveness of large language models (LLMs) as a self-diagnostic tool and their role in spreading healthcare misinformation.<n>We use open-ended questions to mimic real-world self-diagnosis use cases, and perform sentence dropout to mimic realistic self-diagnosis with missing information.<n>The results highlight the modest capabilities of LLMs, as their responses are often unclear and inaccurate.
arXiv Detail & Related papers (2023-07-10T21:28:26Z) - Informing clinical assessment by contextualizing post-hoc explanations
of risk prediction models in type-2 diabetes [50.8044927215346]
We consider a comorbidity risk prediction scenario and focus on contexts regarding the patients clinical state.
We employ several state-of-the-art LLMs to present contexts around risk prediction model inferences and evaluate their acceptability.
Our paper is one of the first end-to-end analyses identifying the feasibility and benefits of contextual explanations in a real-world clinical use case.
arXiv Detail & Related papers (2023-02-11T18:07:11Z) - MIMO: Mutual Integration of Patient Journey and Medical Ontology for
Healthcare Representation Learning [49.57261599776167]
We propose an end-to-end robust Transformer-based solution, Mutual Integration of patient journey and Medical Ontology (MIMO) for healthcare representation learning and predictive analytics.
arXiv Detail & Related papers (2021-07-20T07:04:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.