A COCO-Formatted Instance-Level Dataset for Plasmodium Falciparum Detection in Giemsa-Stained Blood Smears
- URL: http://arxiv.org/abs/2507.18483v1
- Date: Thu, 24 Jul 2025 14:56:18 GMT
- Title: A COCO-Formatted Instance-Level Dataset for Plasmodium Falciparum Detection in Giemsa-Stained Blood Smears
- Authors: Frauke Wilm, Luis Carlos Rivera Monroy, Mathias Öttl, Lukas Mürdter, Leonid Mill, Andreas Maier,
- Abstract summary: We present an enhanced version of the publicly available NIH malaria dataset, with detailed bounding box annotations in COCO format to support object detection training.<n>We validated the revised annotations by training a Faster R-CNN model to detect infected and non-infected red blood cells, as well as white blood cells.
- Score: 2.8115690524924357
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate detection of Plasmodium falciparum in Giemsa-stained blood smears is an essential component of reliable malaria diagnosis, especially in developing countries. Deep learning-based object detection methods have demonstrated strong potential for automated Malaria diagnosis, but their adoption is limited by the scarcity of datasets with detailed instance-level annotations. In this work, we present an enhanced version of the publicly available NIH malaria dataset, with detailed bounding box annotations in COCO format to support object detection training. We validated the revised annotations by training a Faster R-CNN model to detect infected and non-infected red blood cells, as well as white blood cells. Cross-validation on the original dataset yielded F1 scores of up to 0.88 for infected cell detection. These results underscore the importance of annotation volume and consistency, and demonstrate that automated annotation refinement combined with targeted manual correction can produce training data of sufficient quality for robust detection performance. The updated annotations set is publicly available via GitHub: https://github.com/MIRA-Vision-Microscopy/malaria-thin-smear-coco.
Related papers
- Enhanced Tuberculosis Bacilli Detection using Attention-Residual U-Net and Ensemble Classification [0.0]
Tuberculosis, caused by Mycobacterium tuberculosis, remains a critical global health issue, necessitating timely diagnosis and treatment.<n>Current methods for detecting tuberculosis bacilli from bright field microscopic sputum smear images suffer from low automation, inadequate segmentation performance, and limited classification accuracy.<n>This paper proposes an efficient hybrid approach that combines deep learning for segmentation and an ensemble model for classification.
arXiv Detail & Related papers (2025-01-07T05:21:13Z) - Unlearnable Examples Detection via Iterative Filtering [84.59070204221366]
Deep neural networks are proven to be vulnerable to data poisoning attacks.
It is quite beneficial and challenging to detect poisoned samples from a mixed dataset.
We propose an Iterative Filtering approach for UEs identification.
arXiv Detail & Related papers (2024-08-15T13:26:13Z) - ArSDM: Colonoscopy Images Synthesis with Adaptive Refinement Semantic
Diffusion Models [69.9178140563928]
Colonoscopy analysis is essential for assisting clinical diagnosis and treatment.
The scarcity of annotated data limits the effectiveness and generalization of existing methods.
We propose an Adaptive Refinement Semantic Diffusion Model (ArSDM) to generate colonoscopy images that benefit the downstream tasks.
arXiv Detail & Related papers (2023-09-03T07:55:46Z) - Class Attention to Regions of Lesion for Imbalanced Medical Image
Recognition [59.28732531600606]
We propose a framework named textbfClass textbfAttention to textbfREgions of the lesion (CARE) to handle data imbalance issues.
The CARE framework needs bounding boxes to represent the lesion regions of rare diseases.
Results show that the CARE variants with automated bounding box generation are comparable to the original CARE framework.
arXiv Detail & Related papers (2023-07-19T15:19:02Z) - Augment and Criticize: Exploring Informative Samples for Semi-Supervised
Monocular 3D Object Detection [64.65563422852568]
We improve the challenging monocular 3D object detection problem with a general semi-supervised framework.
We introduce a novel, simple, yet effective Augment and Criticize' framework that explores abundant informative samples from unlabeled data.
The two new detectors, dubbed 3DSeMo_DLE and 3DSeMo_FLEX, achieve state-of-the-art results with remarkable improvements for over 3.5% AP_3D/BEV (Easy) on KITTI.
arXiv Detail & Related papers (2023-03-20T16:28:15Z) - Anatomy-Guided Weakly-Supervised Abnormality Localization in Chest
X-rays [17.15666977702355]
We propose an Anatomy-Guided chest X-ray Network (AGXNet) to address weak annotation issues.
Our framework consists of a cascade of two networks, one responsible for identifying anatomical abnormalities and the second responsible for pathological observations.
Our results on the MIMIC-CXR dataset demonstrate the effectiveness of AGXNet in disease and anatomical abnormality localization.
arXiv Detail & Related papers (2022-06-25T18:33:27Z) - Label Cleaning Multiple Instance Learning: Refining Coarse Annotations
on Single Whole-Slide Images [83.7047542725469]
Annotating cancerous regions in whole-slide images (WSIs) of pathology samples plays a critical role in clinical diagnosis, biomedical research, and machine learning algorithms development.
We present a method, named Label Cleaning Multiple Instance Learning (LC-MIL), to refine coarse annotations on a single WSI without the need of external training data.
Our experiments on a heterogeneous WSI set with breast cancer lymph node metastasis, liver cancer, and colorectal cancer samples show that LC-MIL significantly refines the coarse annotations, outperforming the state-of-the-art alternatives, even while learning from a single slide.
arXiv Detail & Related papers (2021-09-22T15:06:06Z) - Out of Distribution Detection and Adversarial Attacks on Deep Neural
Networks for Robust Medical Image Analysis [8.985261743452988]
We experimentally evaluate the robustness of a Mahalanobis distance-based confidence score, a simple yet effective method for detecting abnormal input samples.
Results indicated that the Mahalanobis confidence score detector exhibits improved performance and robustness of deep learning models.
arXiv Detail & Related papers (2021-07-10T18:00:40Z) - Attack-agnostic Adversarial Detection on Medical Data Using Explainable
Machine Learning [0.0]
We propose a model agnostic explainability-based method for the accurate detection of adversarial samples on two datasets.
On the MIMIC-III and Henan-Renmin EHR datasets, we report a detection accuracy of 77% against the Longitudinal Adrial Attack.
On the MIMIC-CXR dataset, we achieve an accuracy of 88%; significantly improving on the state of the art of adversarial detection in both datasets by over 10% in all settings.
arXiv Detail & Related papers (2021-05-05T10:01:53Z) - Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for
Thoracic Disease Identification [83.6017225363714]
deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.