Outlier Detection using Self-Organizing Maps for Automated Blood Cell
Analysis
- URL: http://arxiv.org/abs/2208.08834v1
- Date: Thu, 18 Aug 2022 13:53:27 GMT
- Title: Outlier Detection using Self-Organizing Maps for Automated Blood Cell
Analysis
- Authors: Stefan Röhrl, Alice Hein, Lucie Huang, Dominik Heim, Christian
Klenk, Manuel Lengl, Martin Knopp, Nawal Hafez, Oliver Hayden, Klaus Diepold
- Abstract summary: In this work, we assess the suitability of Self-Organizing Maps for outlier detection on a medical dataset.
We detect and evaluate outliers based on quantization errors and distance maps.
Self-Organizing Maps perform on par with a manually specified filter based on expert domain knowledge.
- Score: 1.2189422792863451
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The quality of datasets plays a crucial role in the successful training and
deployment of deep learning models. Especially in the medical field, where
system performance may impact the health of patients, clean datasets are a
safety requirement for reliable predictions. Therefore, outlier detection is an
essential process when building autonomous clinical decision systems. In this
work, we assess the suitability of Self-Organizing Maps for outlier detection
specifically on a medical dataset containing quantitative phase images of white
blood cells. We detect and evaluate outliers based on quantization errors and
distance maps. Our findings confirm the suitability of Self-Organizing Maps for
unsupervised Out-Of-Distribution detection on the dataset at hand.
Self-Organizing Maps perform on par with a manually specified filter based on
expert domain knowledge. Additionally, they show promise as a tool in the
exploration and cleaning of medical datasets. As a direction for future
research, we suggest a combination of Self-Organizing Maps and feature
extraction based on deep learning.
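The idea of scoring outliers by quantization error and inspecting a distance map can be sketched as follows. This is a minimal illustration in plain NumPy and not the authors' implementation: the grid size, learning rate, neighbourhood width, the 99th-percentile threshold, and the synthetic 4-dimensional data (standing in for features of quantitative phase images) are all assumptions made for the example.

```python
import numpy as np

def train_som(data, rows=5, cols=5, epochs=10, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM; returns weights of shape (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    # Initialise every unit with a randomly chosen training sample.
    idx = rng.integers(0, len(data), rows * cols)
    weights = data[idx].reshape(rows, cols, -1).astype(float)
    # Grid coordinates of each unit, used by the neighbourhood function.
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                 # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-2    # shrinking neighbourhood width
            # Best-matching unit (BMU): the unit whose weights are closest to x.
            dists = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(dists), dists.shape)
            # Gaussian neighbourhood around the BMU on the grid.
            grid_d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights

def quantization_errors(weights, data):
    """Distance from each sample to its best-matching unit."""
    flat = weights.reshape(-1, weights.shape[-1])
    d = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    return d.min(axis=1)

def distance_map(weights):
    """Mean distance of each unit to its 4-connected grid neighbours, scaled to [0, 1]."""
    rows, cols, _ = weights.shape
    umat = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            neigh = [(i + di, j + dj)
                     for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
                     if 0 <= i + di < rows and 0 <= j + dj < cols]
            umat[i, j] = np.mean([np.linalg.norm(weights[i, j] - weights[a, b])
                                  for a, b in neigh])
    return umat / umat.max()

# Synthetic stand-in data (assumption): inliers near the origin, outliers far away.
rng = np.random.default_rng(42)
train = rng.normal(0.0, 1.0, size=(300, 4))
new_inliers = rng.normal(0.0, 1.0, size=(50, 4))
outliers = rng.normal(10.0, 0.5, size=(5, 4))
samples = np.vstack([new_inliers, outliers])

weights = train_som(train)
# Threshold chosen as the 99th percentile of training errors (an assumption).
threshold = np.percentile(quantization_errors(weights, train), 99)
qe = quantization_errors(weights, samples)
flags = qe > threshold      # True marks a suspected outlier
umat = distance_map(weights)
```

Samples whose quantization error exceeds the threshold are flagged for review; units with high values in `umat` lie far from their grid neighbours and can point to sparsely populated regions of the map worth inspecting.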
Related papers
- Detecting Dataset Bias in Medical AI: A Generalized and Modality-Agnostic Auditing Framework [8.520644988801243]
Latent bias in machine learning datasets can be amplified during training and/or hidden during testing.
We present a data modality-agnostic auditing framework for generating targeted hypotheses about sources of bias.
We demonstrate the broad applicability and value of our method by analyzing large-scale medical datasets.
arXiv Detail & Related papers (2025-03-13T02:16:48Z)
- Cross-platform Prediction of Depression Treatment Outcome Using Location Sensory Data on Smartphones [55.992010576087424]
We explore using location sensory data collected passively on smartphones to predict treatment outcome.
Our results show that using location features and baseline self-reported questionnaire scores can lead to an F1 score of up to 0.67, comparable to that obtained using periodic self-reported questionnaires.
arXiv Detail & Related papers (2025-03-10T22:00:07Z)
- A Comprehensive Dataset and Automated Pipeline for Nailfold Capillary Analysis [24.8934927577986]
We present a pioneering effort in constructing a comprehensive nailfold capillary dataset: 321 images and 219 videos from 68 subjects, with clinic reports and expert annotations.
We finetuned three deep learning models with expert annotations as supervised labels and integrated them into a novel end-to-end nailfold capillary analysis pipeline.
Experiment results showed that our automated pipeline achieves an average of sub-pixel level precision in measurements and 89.9% accuracy in identifying morphological abnormalities.
arXiv Detail & Related papers (2023-12-10T16:33:41Z)
- Towards Unifying Anatomy Segmentation: Automated Generation of a Full-body CT Dataset via Knowledge Aggregation and Anatomical Guidelines [113.08940153125616]
We generate a dataset of whole-body CT scans with 142 voxel-level labels for 533 volumes providing comprehensive anatomical coverage.
Our proposed procedure does not rely on manual annotation during the label aggregation stage.
We release our trained unified anatomical segmentation model capable of predicting 142 anatomical structures on CT data.
arXiv Detail & Related papers (2023-07-25T09:48:13Z)
- Clinically Acceptable Segmentation of Organs at Risk in Cervical Cancer Radiation Treatment from Clinically Available Annotations [0.0]
We present an approach to learn a deep learning model for the automatic segmentation of Organs at Risk (OARs) in cervical cancer radiation treatment.
We employ simple heuristics for automatic data cleaning to minimize data inhomogeneity, label noise, and missing annotations.
We develop a semi-supervised learning approach utilizing a teacher-student setup, annotation imputation, and uncertainty-guided training to learn in the presence of missing annotations.
arXiv Detail & Related papers (2023-02-21T13:24:40Z)
- Unsupervised Cross-Domain Feature Extraction for Single Blood Cell Image Classification [37.90158669637884]
The autoencoder is based on an R-CNN architecture, allowing it to focus on the relevant white blood cell and eliminate artifacts in the image.
We show that thanks to the rich features extracted by the autoencoder trained on only one of the datasets, the random forest classifier performs satisfactorily on the unseen datasets.
Our results suggest the possibility of employing this unsupervised approach in more complicated diagnosis and prognosis tasks without the need to add expensive expert labels to unseen data.
arXiv Detail & Related papers (2022-07-01T15:44:42Z)
- Self-Supervised Learning as a Means To Reduce the Need for Labeled Data in Medical Image Analysis [64.4093648042484]
We use a dataset of chest X-ray images with bounding box labels for 13 different classes of anomalies.
We show that it is possible to achieve similar performance to a fully supervised model in terms of mean average precision and accuracy with only 60% of the labeled data.
arXiv Detail & Related papers (2022-06-01T09:20:30Z)
- Chest x-ray automated triage: a semiologic approach designed for clinical implementation, exploiting different types of labels through a combination of four Deep Learning architectures [83.48996461770017]
This work presents a Deep Learning method based on the late fusion of different convolutional architectures.
We built four training datasets combining images from public chest x-ray datasets and our institutional archive.
We trained four different Deep Learning architectures and combined their outputs with a late fusion strategy, obtaining a unified tool.
arXiv Detail & Related papers (2020-12-23T14:38:35Z)
- Semi-Automatic Data Annotation guided by Feature Space Projection [117.9296191012968]
We present a semi-automatic data annotation approach based on suitable feature space projection and semi-supervised label estimation.
We validate our method on the popular MNIST dataset and on images of human intestinal parasites with and without fecal impurities.
Our results demonstrate the added value of visual analytics tools that combine complementary abilities of humans and machines for more effective machine learning.
arXiv Detail & Related papers (2020-07-27T17:03:50Z)
- Trajectories, bifurcations and pseudotime in large clinical datasets: applications to myocardial infarction and diabetes data [94.37521840642141]
We suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values.
The methodology is based on the application of elastic principal graphs, which can simultaneously address the tasks of dimensionality reduction, data visualization, clustering, feature selection, and quantifying the geodesic distances (pseudotime) in partially ordered sequences of observations.
arXiv Detail & Related papers (2020-07-07T21:04:55Z)
- An Extensive Study on Cross-Dataset Bias and Evaluation Metrics Interpretation for Machine Learning applied to Gastrointestinal Tract Abnormality Classification [2.985964157078619]
Automatic analysis of diseases in the GI tract is a hot topic in computer science and medical-related journals.
A clear understanding of evaluation metrics and machine learning models with cross datasets is crucial to bring research in the field to a new quality level.
We present comprehensive evaluations of five distinct machine learning models that can classify 16 different GI tract conditions.
arXiv Detail & Related papers (2020-05-08T08:59:31Z)
- Self-Training with Improved Regularization for Sample-Efficient Chest X-Ray Classification [80.00316465793702]
We present a deep learning framework that enables robust modeling in challenging scenarios.
Our results show that using 85% less labeled data, we can build predictive models that match the performance of classifiers trained in a large-scale data setting.
arXiv Detail & Related papers (2020-05-03T02:36:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.