Efficient Large Scale Medical Image Dataset Preparation for Machine
Learning Applications
- URL: http://arxiv.org/abs/2309.17285v1
- Date: Fri, 29 Sep 2023 14:41:02 GMT
- Title: Efficient Large Scale Medical Image Dataset Preparation for Machine
Learning Applications
- Authors: Stefan Denner, Jonas Scherer, Klaus Kades, Dimitrios Bounias, Philipp
Schader, Lisa Kausch, Markus Bujotzek, Andreas Michael Bucher, Tobias
Penzkofer, Klaus Maier-Hein
- Abstract summary: This paper introduces an innovative data curation tool, developed as part of the Kaapana open-source toolkit.
The tool is specifically tailored to meet the needs of radiologists and machine learning researchers.
It incorporates advanced search, auto-annotation and efficient tagging functionalities for improved data curation.
- Score: 0.08484806297945031
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the rapidly evolving field of medical imaging, machine learning algorithms
have become indispensable for enhancing diagnostic accuracy. However, the
effectiveness of these algorithms is contingent upon the availability and
organization of high-quality medical imaging datasets. Traditional Digital
Imaging and Communications in Medicine (DICOM) data management systems are
inadequate for handling the scale and complexity of data required to be
facilitated in machine learning algorithms. This paper introduces an innovative
data curation tool, developed as part of the Kaapana open-source toolkit, aimed
at streamlining the organization, management, and processing of large-scale
medical imaging datasets. The tool is specifically tailored to meet the needs
of radiologists and machine learning researchers. It incorporates advanced
search, auto-annotation and efficient tagging functionalities for improved data
curation. Additionally, the tool facilitates quality control and review,
enabling researchers to validate image and segmentation quality in large
datasets. It also plays a critical role in uncovering potential biases in
datasets by aggregating and visualizing metadata, which is essential for
developing robust machine learning models. Furthermore, Kaapana is integrated
within the Radiological Cooperative Network (RACOON), a pioneering initiative
aimed at creating a comprehensive national infrastructure for the aggregation,
transmission, and consolidation of radiological data across all university
clinics throughout Germany. A supplementary video showcasing the tool's
functionalities can be accessed at https://bit.ly/MICCAI-DEMI2023.
Related papers
- Coupling AI and Citizen Science in Creation of Enhanced Training Dataset for Medical Image Segmentation [3.7274206780843477]
We introduce a robust and versatile framework that combines AI and crowdsourcing to improve the quality and quantity of medical image datasets.
Our approach utilises a user-friendly online platform that enables a diverse group of crowd annotators to label medical images efficiently.
We employ pix2pixGAN, a generative AI model, to expand the training dataset with synthetic images that capture realistic morphological features.
arXiv Detail & Related papers (2024-09-04T21:22:54Z)
- Full-Scale Indexing and Semantic Annotation of CT Imaging: Boosting FAIRness [0.41942958779358674]
The proposed approach focuses on the integration and enhancement of clinical computed tomography (CT) image series for better findability, accessibility, interoperability, and reusability.
The metadata is standardized with HL7 FHIR resources to enable efficient data recognition and data exchange between research projects.
The study successfully integrates a robust process within the UKSH MeDIC, leading to the semantic enrichment of over 230,000 CT image series and over 8 million SNOMED CT annotations.
arXiv Detail & Related papers (2024-06-21T17:55:22Z)
- Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
arXiv Detail & Related papers (2023-11-18T14:52:26Z)
- Building Flexible, Scalable, and Machine Learning-ready Multimodal Oncology Datasets [17.774341783844026]
This work proposes the Multimodal Integration of Oncology Data System (MINDS).
MINDS is a flexible, scalable, and cost-effective metadata framework for efficiently fusing disparate data from public sources.
By harmonizing multimodal data, MINDS aims to potentially empower researchers with greater analytical ability.
arXiv Detail & Related papers (2023-09-30T15:44:39Z)
- Building RadiologyNET: Unsupervised annotation of a large-scale multimodal medical database [0.4915744683251151]
The usage of machine learning in medical diagnosis and treatment has witnessed significant growth in recent years.
However, the availability of large annotated image datasets remains a major obstacle since the process of annotation is time-consuming and costly.
This paper explores how to automatically annotate a database of medical radiology images with regard to their semantic similarity.
arXiv Detail & Related papers (2023-07-27T13:00:33Z)
- DeepMediX: A Deep Learning-Driven Resource-Efficient Medical Diagnosis Across the Spectrum [15.382184404673389]
This work presents DeepMediX, a groundbreaking, resource-efficient model that significantly addresses this challenge.
Built on top of the MobileNetV2 architecture, DeepMediX excels in classifying brain MRI scans and skin cancer images.
DeepMediX's design also includes the concept of Federated Learning, enabling a collaborative learning approach without compromising data privacy.
arXiv Detail & Related papers (2023-07-01T12:30:58Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- Surgical tool classification and localization: results and methods from the MICCAI 2022 SurgToolLoc challenge [69.91670788430162]
We present the results of the SurgToolLoc 2022 challenge.
The goal was to leverage tool presence data as weak labels for machine learning models trained to detect tools.
We conclude by discussing these results in the broader context of machine learning and surgical data science.
arXiv Detail & Related papers (2023-05-11T21:44:39Z)
- When Accuracy Meets Privacy: Two-Stage Federated Transfer Learning Framework in Classification of Medical Images on Limited Data: A COVID-19 Case Study [77.34726150561087]
The COVID-19 pandemic has spread rapidly and caused a global shortage of medical resources.
Convolutional neural networks (CNNs) have been widely used and validated for analyzing medical images.
arXiv Detail & Related papers (2022-03-24T02:09:41Z)
- Therapeutics Data Commons: Machine Learning Datasets and Tasks for Therapeutics [84.94299203422658]
Therapeutics Data Commons is a framework to systematically access and evaluate machine learning across the entire range of therapeutics.
At its core, TDC is a collection of curated datasets and learning tasks that can translate algorithmic innovation into biomedical and clinical implementation.
TDC also provides an ecosystem of tools, libraries, leaderboards, and community resources, including data functions, strategies for systematic model evaluation, meaningful data splits, data processors, and molecule generation oracles.
arXiv Detail & Related papers (2021-02-18T18:50:31Z)
- Self-Training with Improved Regularization for Sample-Efficient Chest X-Ray Classification [80.00316465793702]
We present a deep learning framework that enables robust modeling in challenging scenarios.
Our results show that, using 85% less labeled data, we can build predictive models that match the performance of classifiers trained in a large-scale data setting.
arXiv Detail & Related papers (2020-05-03T02:36:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.