Related papers: Fine-tuned Vision Language Model for Localization of Parasitic Eggs in Microscopic Images

URL: http://arxiv.org/abs/2602.13712v1
Date: Sat, 14 Feb 2026 10:25:13 GMT
Title: Fine-tuned Vision Language Model for Localization of Parasitic Eggs in Microscopic Images
Authors: Chan Hao Sien, Hezerul Abdul Karim, Nouar AlDahoul
Abstract summary: This paper aims to utilize a vision language model (VLM) such as Microsoft Florence that was fine-tuned to localize all parasitic eggs within microscopic images. The preliminary results show that our localization VLM performs comparatively better than the other object detection methods, such as EfficientDet, with an mIOU of 0.94.
Score: 0.8921166277011344
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Soil-transmitted helminth (STH) infections continuously affect a large proportion of the global population, particularly in tropical and sub-tropical regions, where access to specialized diagnostic expertise is limited. Although manual microscopic diagnosis of parasitic eggs remains the diagnostic gold standard, the approach can be labour-intensive, time-consuming, and prone to human error. This paper aims to utilize a vision language model (VLM) such as Microsoft Florence that was fine-tuned to localize all parasitic eggs within microscopic images. The preliminary results show that our localization VLM performs comparatively better than the other object detection methods, such as EfficientDet, with an mIOU of 0.94. This finding demonstrates the potential of the proposed VLM to serve as a core component of an automated framework, offering a scalable engineering solution for intelligent parasitological diagnosis.
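The abstract reports localization quality as a mean intersection over union (mIOU) of 0.94 but does not spell out the evaluation protocol. The sketch below shows one common way to compute such a score for axis-aligned bounding boxes. It assumes boxes are given as (x1, y1, x2, y2) tuples and that each predicted box has already been paired with one ground-truth box; both the function names and the pairing scheme are illustrative assumptions, not the authors' method.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (if any).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes yield no intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_iou(preds: Sequence[Box], targets: Sequence[Box]) -> float:
    """Average IoU over already-paired predicted and ground-truth boxes."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must be paired one-to-one")
    if not preds:
        return 0.0
    return sum(box_iou(p, t) for p, t in zip(preds, targets)) / len(preds)


if __name__ == "__main__":
    # Hypothetical boxes, for illustration only.
    predicted = [(0, 0, 2, 2), (0, 0, 2, 2)]
    ground_truth = [(0, 0, 2, 2), (1, 1, 3, 3)]
    print(mean_iou(predicted, ground_truth))
```

Under this definition a score of 1.0 means every predicted box coincides exactly with its ground-truth box, so a value near 0.94 indicates tight overlap on average. Protocols that match boxes by highest overlap or average per image can give different numbers.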

Related papers

Hepato-LLaVA: An Expert MLLM with Sparse Topo-Pack Attention for Hepatocellular Pathology Analysis on Whole Slide Images [32.940175542155835]
Current computational approaches are constrained by fixed-resolution processing mechanisms and inefficient feature aggregation. Hepato-LLaVA is a specialized Multi-modal Large Language Model designed for fine-grained tissue pathology analysis. We present HepatoPathoVQA, a clinically grounded dataset comprising 33K hierarchically structured question-answer pairs validated by expert pathologists.
arXiv Detail & Related papers (2026-02-23T01:43:32Z)
Unveiling and Bridging the Functional Perception Gap in MLLMs: Atomic Visual Alignment and Hierarchical Evaluation via PET-Bench [48.60251555171943]
Multimodal Large Language Models (MLLMs) have demonstrated remarkable proficiency in tasks such as abnormality detection and report generation for anatomical modalities. In this work, we quantify a fundamental functional perception gap: the inability of current vision encoders to decode functional tracer biodistribution independent of morphological priors. We introduce PET-Bench, the first large-scale functional imaging benchmark comprising 52,308 hierarchical QA pairs from 9,732 multi-site, multi-tracer PET studies. Our results demonstrate that AVA effectively bridges the perception gap, transforming CoT from a source of hallucination into a robust inference tool and improving diagnostic
arXiv Detail & Related papers (2026-01-06T05:58:50Z)
LMOD+: A Comprehensive Multimodal Dataset and Benchmark for Developing and Evaluating Multimodal Large Language Models in Ophthalmology [43.092364533480456]
Vision-threatening eye diseases pose a major global health burden, with timely diagnosis limited by workforce shortages and restricted access to specialized care. We present a large-scale multimodal ophthalmology benchmark comprising 32,633 instances with multi-granular annotations across 12 common ophthalmic conditions and 5 imaging modalities. The dataset integrates imaging, anatomical structures, demographics, and free-text annotations, supporting anatomical structure recognition, disease screening, disease staging, and demographic prediction for bias evaluation.
arXiv Detail & Related papers (2025-09-30T00:29:18Z)
Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning [0.5360375691077625]
FundusExpert is an ophthalmology-specific MLLM with integrated positioning-diagnosis reasoning capabilities. FundusGen is a dataset constructed through the intelligent Fundus-Engine system.
arXiv Detail & Related papers (2025-07-23T14:19:30Z)
CLIP Meets Diffusion: A Synergistic Approach to Anomaly Detection [49.11819337853632]
Anomaly detection is a complex problem due to the ambiguity in defining anomalies, the diversity of anomaly types, and the scarcity of training data. We propose CLIPfusion, a method that leverages both discriminative and generative foundation models. We believe that our method underscores the effectiveness of multi-modal and multi-model fusion in tackling the multifaceted challenges of anomaly detection.
arXiv Detail & Related papers (2025-06-13T13:30:15Z)
Improving Medical Large Vision-Language Models with Abnormal-Aware Feedback [57.98393950821579]
We propose a novel UMed-LVLM designed to unveil medical abnormalities. We propose a prompt method utilizing GPT-4V to generate diagnoses based on identified abnormal areas in medical images. Our UMed-LVLM significantly outperforms existing Med-LVLMs in identifying and understanding medical abnormalities.
arXiv Detail & Related papers (2025-01-02T17:37:20Z)
Super-resolution of biomedical volumes with 2D supervision [84.5255884646906]
Masked slice diffusion for super-resolution exploits the inherent equivalence in the data-generating distribution across all spatial dimensions of biological specimens. We focus on the application of SliceR to stimulated Raman histology (SRH), characterized by its rapid acquisition of high-resolution 2D images but slow and costly optical z-sectioning.
arXiv Detail & Related papers (2024-04-15T02:41:55Z)
Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection. Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels. Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
Localization and Classification of Parasitic Eggs in Microscopic Images Using an EfficientDet Detector [0.0]
We propose a multi-modal learning detector to localize parasitic eggs and categorize them into 11 categories. Our results show robust performance with an accuracy of 92%, and an F1 score of 93%.
arXiv Detail & Related papers (2022-08-03T10:28:18Z)
A Robust Ensemble Model for Parasitic Egg Detection and Classification [9.449507409551842]
Intestinal parasitic infections, a leading cause of morbidity worldwide, still lack a time-saving, high-sensitivity and user-friendly examination method. In this paper, we apply several object detectors such as YOLOv5 and variant cascadeRCNNs to automatically discriminate parasitic eggs in microscope images.
arXiv Detail & Related papers (2022-07-04T13:53:46Z)
Parasitic Egg Detection and Classification in Low-cost Microscopic Images using Transfer Learning [1.6050172226234583]
We propose a CNN-based technique using a transfer learning strategy to enhance the efficiency of automatic parasite classification in poor-quality microscopic images. Our proposed framework outperforms the state-of-the-art object recognition methods. Our system, combined with a final decision from an expert, may improve real faecal examination with low-cost microscopes.
arXiv Detail & Related papers (2021-07-02T11:05:45Z)
Inf-Net: Automatic COVID-19 Lung Infection Segmentation from CT Images [152.34988415258988]
Automated detection of lung infections from computed tomography (CT) images offers great potential to augment the traditional healthcare strategy for tackling COVID-19. Segmenting infected regions from CT slices faces several challenges, including high variation in infection characteristics and low intensity contrast between infections and normal tissues. To address these challenges, a novel COVID-19 Deep Lung Infection Network (Inf-Net) is proposed to automatically identify infected regions from chest CT slices.
arXiv Detail & Related papers (2020-04-22T07:30:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.