An Extensive Study on Cross-Dataset Bias and Evaluation Metrics Interpretation for Machine Learning applied to Gastrointestinal Tract Abnormality Classification
- URL: http://arxiv.org/abs/2005.03912v1
- Date: Fri, 8 May 2020 08:59:31 GMT
- Title: An Extensive Study on Cross-Dataset Bias and Evaluation Metrics Interpretation for Machine Learning applied to Gastrointestinal Tract Abnormality Classification
- Authors: Vajira Thambawita, Debesh Jha, Hugo Lewi Hammer, Håvard D. Johansen, Dag Johansen, Pål Halvorsen, Michael A. Riegler
- Abstract summary: Automatic analysis of diseases in the GI tract is a hot topic in computer science and medical-related journals.
A clear understanding of evaluation metrics and machine learning models with cross datasets is crucial to bring research in the field to a new quality level.
We present comprehensive evaluations of five distinct machine learning models that can classify 16 different GI tract conditions.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Precise and efficient automated identification of Gastrointestinal (GI) tract
diseases can help doctors treat more patients and improve the rate of disease
detection and identification. Currently, automatic analysis of diseases in the
GI tract is a hot topic in both computer science and medical-related journals.
Nevertheless, the evaluation of such an automatic analysis is often incomplete
or simply wrong. Algorithms are often only tested on small and biased datasets,
and cross-dataset evaluations are rarely performed. A clear understanding of
evaluation metrics, and of how machine learning models behave across datasets,
is crucial to raising research in the field to a new level of quality. Towards
this goal, we
present comprehensive evaluations of five distinct machine learning models
using Global Features and Deep Neural Networks that can classify 16 different
key types of GI tract conditions, including pathological findings, anatomical
landmarks, polyp removal conditions, and normal findings from images captured
by common GI tract examination instruments. In our evaluation, we introduce
performance hexagons built from six performance metrics (recall, precision,
specificity, accuracy, F1-score, and Matthews correlation coefficient) to
demonstrate how to determine the real capabilities of models rather than
evaluating them superficially. Furthermore, we perform cross-dataset evaluations
using different datasets for training and testing. With these cross-dataset
evaluations, we demonstrate the challenge of actually building a generalizable
model that could be used across different hospitals. Our experiments clearly
show that building reliable models requires more sophisticated performance
metrics and evaluation methods than testing on splits of the same dataset, and
that the performance metrics should always be interpreted together rather than
through a single metric.
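All six metrics in a performance hexagon can be derived from one confusion matrix. The Python sketch below is not the authors' code: it assumes a one-vs-rest reduction for the multi-class (16-class) setting, macro-averaging across classes, and a convention that a metric with an empty denominator is reported as 0.0. The function names `binary_counts`, `hexagon_metrics`, and `macro_average` are hypothetical. The usage lines also show why the metrics should be read together: a degenerate model that never predicts the minority class still scores high accuracy.

```python
import math
from typing import Dict, List, Sequence, Tuple


def binary_counts(y_true: Sequence[int], y_pred: Sequence[int],
                  positive: int) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), treating `positive` as the positive class.

    Assumption: multi-class labels are reduced one-vs-rest.
    """
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def safe_div(num: float, den: float) -> float:
    # Assumed convention: an undefined metric (zero denominator) is reported as 0.0.
    return num / den if den else 0.0


def hexagon_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute the six metrics named in the abstract from one confusion matrix."""
    recall = safe_div(tp, tp + fn)
    precision = safe_div(tp, tp + fp)
    specificity = safe_div(tn, tn + fp)
    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    # Matthews correlation coefficient for the binary (one-vs-rest) case.
    mcc = safe_div(tp * tn - fp * fn,
                   math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {"recall": recall, "precision": precision, "specificity": specificity,
            "accuracy": accuracy, "f1": f1, "mcc": mcc}


def macro_average(y_true: Sequence[int], y_pred: Sequence[int],
                  classes: List[int]) -> Dict[str, float]:
    """Unweighted mean of each metric over one-vs-rest problems (assumed averaging)."""
    per_class = [hexagon_metrics(*binary_counts(y_true, y_pred, c)) for c in classes]
    return {k: sum(m[k] for m in per_class) / len(per_class) for k in per_class[0]}


# Usage: 5 positives, 95 negatives, and a model that always predicts class 0.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100
m = hexagon_metrics(*binary_counts(y_true, y_pred, positive=1))
print(m["accuracy"], m["mcc"])  # 0.95 0.0
```

Here the always-negative model reaches 0.95 accuracy and perfect specificity, while recall, F1-score, and MCC are all 0; a single-metric evaluation based on accuracy alone would hide this failure.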