Publicly available datasets of breast histopathology H&E whole-slide
images: A scoping review
- URL: http://arxiv.org/abs/2306.01546v2
- Date: Wed, 6 Dec 2023 09:43:41 GMT
- Title: Publicly available datasets of breast histopathology H&E whole-slide
images: A scoping review
- Authors: Masoud Tafavvoghi (1), Lars Ailo Bongo (2), Nikita Shvetsov (2),
Lill-Tove Rasmussen Busund (3), Kajsa Møllersen (1) ((1) Department of
Community Medicine, UiT The Arctic University of Norway, Tromsø, Norway,
(2) Department of Computer Science, UiT The Arctic University of Norway,
Tromsø, Norway, (3) Department of Medical Biology, UiT The Arctic
University of Norway, Tromsø, Norway)
- Abstract summary: We identified the publicly available datasets of breast H&E stained whole-slide images (WSI) that can be used to develop deep learning algorithms.
The TCGA-BRCA dataset, used in 52% of the selected studies, has a considerable selection bias that can impact the robustness and generalizability of the trained algorithms.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Advancements in digital pathology and computing resources have made a
significant impact in the field of computational pathology for breast cancer
diagnosis and treatment. However, access to high-quality labeled
histopathological images of breast cancer is a big challenge that limits the
development of accurate and robust deep learning models. In this scoping
review, we identified the publicly available datasets of breast H&E stained
whole-slide images (WSI) that can be used to develop deep learning algorithms.
We systematically searched nine scientific literature databases and nine
research data repositories and found 17 publicly available datasets containing
10385 H&E WSIs of breast cancer. Moreover, we reported image metadata and
characteristics for each dataset to assist researchers in selecting proper
datasets for specific tasks in breast cancer computational pathology. In
addition, we compiled two lists of breast H&E patches and private datasets as
supplementary resources for researchers. Notably, only 28% of the included
articles utilized multiple datasets, and only 14% used an external validation
set, suggesting that the performance of other developed models may be
susceptible to overestimation. The TCGA-BRCA was used in 52% of the selected
studies. This dataset has a considerable selection bias that can impact the
robustness and generalizability of the trained algorithms. There is also a lack
of consistent metadata reporting of breast WSI datasets that can be an issue in
developing accurate deep learning models, indicating the necessity of
establishing explicit guidelines for documenting breast WSI dataset
characteristics and metadata.
Related papers
- Breast Histopathology Image Retrieval by Attention-based Adversarially Regularized Variational Graph Autoencoder with Contrastive Learning-Based Feature Extraction [1.48419209885019]
This work introduces a novel attention-based adversarially regularized variational graph autoencoder model for breast histological image retrieval.
We evaluated the performance of the proposed model on two publicly available datasets of breast cancer histological images.
arXiv Detail & Related papers (2024-05-07T11:24:37Z)
- ACROBAT -- a multi-stain breast cancer histological whole-slide-image
data set from routine diagnostics for computational pathology [1.6619031082709266]
The analysis of FFPE tissue sections stained with haematoxylin and eosin (H&E) or immunohistochemistry (IHC) is an essential part of the pathologic assessment of surgically resected breast cancer specimens.
This data set has the potential to enable many different avenues of computational pathology research.
arXiv Detail & Related papers (2022-11-24T14:16:36Z)
- BRACS: A Dataset for BReAst Carcinoma Subtyping in H&E Histology Images [4.974822167947921]
We introduce the BReAst Carcinoma Subtyping dataset, a large cohort of annotated Hematoxylin & Eosin (H&E)-stained images to facilitate the characterization of breast lesions.
BRACS contains 547 Whole-Slide Images (WSIs), and 4539 Regions of Interest (ROIs) extracted from the WSIs.
arXiv Detail & Related papers (2021-11-08T15:04:16Z)
- The pitfalls of using open data to develop deep learning solutions for
COVID-19 detection in chest X-rays [64.02097860085202]
Deep learning models have been developed to identify COVID-19 from chest X-rays.
Results have been exceptional when training and testing on open-source data.
Data analysis and model evaluations show that the popular open-source dataset COVIDx is not representative of the real clinical problem.
arXiv Detail & Related papers (2021-09-14T10:59:11Z)
- BCNet: A Deep Convolutional Neural Network for Breast Cancer Grading [0.0]
Deep learning has been recently adopted widely in different areas of science, especially medicine.
In breast cancer detection problems, some diverse deep learning techniques have been developed on different datasets and resulted in good accuracy.
arXiv Detail & Related papers (2021-07-11T12:55:33Z)
- G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for
Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z)
- Topological Data Analysis of copy number alterations in cancer [70.85487611525896]
We explore the potential to capture information contained in cancer genomic information using a novel topology-based approach.
We find that this technique has the potential to extract meaningful low-dimensional representations in cancer somatic genetic data.
arXiv Detail & Related papers (2020-11-22T17:31:23Z)
- Detection of masses and architectural distortions in digital breast
tomosynthesis: a publicly available dataset of 5,060 patients and a deep
learning model [4.3359550072619255]
We have curated and made publicly available a large-scale dataset of digital breast tomosynthesis images.
It contains 22,032 reconstructed volumes belonging to 5,610 studies from 5,060 patients.
We developed a single-phase deep learning detection model and tested it using our dataset to serve as a baseline for future research.
arXiv Detail & Related papers (2020-11-13T18:33:31Z)
- Creation and Validation of a Chest X-Ray Dataset with Eye-tracking and
Report Dictation for AI Development [47.1152650685625]
We developed a rich dataset of Chest X-Ray (CXR) images to assist investigators in artificial intelligence.
The data were collected using an eye tracking system while a radiologist reviewed and reported on 1,083 CXR images.
arXiv Detail & Related papers (2020-09-15T23:12:49Z)
- Deep Mining External Imperfect Data for Chest X-ray Disease Screening [57.40329813850719]
We argue that incorporating an external CXR dataset leads to imperfect training data, which poses challenges.
We formulate the multi-label disease classification problem as weighted independent binary tasks according to the categories.
Our framework simultaneously models and tackles the domain and label discrepancies, enabling superior knowledge mining ability.
arXiv Detail & Related papers (2020-06-06T06:48:40Z)
- Opportunities and Challenges of Deep Learning Methods for
Electrocardiogram Data: A Systematic Review [62.490310870300746]
The electrocardiogram (ECG) is one of the most commonly used diagnostic tools in medicine and healthcare.
Deep learning methods have achieved promising results on predictive healthcare tasks using ECG signals.
This paper presents a systematic review of deep learning methods for ECG data from both modeling and application perspectives.
arXiv Detail & Related papers (2019-12-28T02:44:29Z)