Dataset Distillation in Medical Imaging: A Feasibility Study
- URL: http://arxiv.org/abs/2407.14429v1
- Date: Fri, 19 Jul 2024 15:59:04 GMT
- Title: Dataset Distillation in Medical Imaging: A Feasibility Study
- Authors: Muyang Li, Can Cui, Quan Liu, Ruining Deng, Tianyuan Yao, Marilyn Lionts, Yuankai Huo
- Abstract summary: Data sharing in the medical image analysis field has potential yet remains underappreciated.
One possible solution is to avoid transferring the entire dataset while still achieving similar model performance.
Recent progress in dataset distillation within computer science offers promising prospects for sharing medical data efficiently.
- Score: 16.44272552893816
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data sharing in the medical image analysis field has potential yet remains underappreciated. The aim is often to share datasets efficiently with other sites so that models can be trained effectively. One possible solution is to avoid transferring the entire dataset while still achieving similar model performance. Recent progress in dataset distillation within computer science offers promising prospects for sharing medical data efficiently without significantly compromising model effectiveness. However, it remains uncertain whether these methods are applicable to medical imaging, since medical and natural images are distinct domains. Moreover, it is worth asking what level of performance these methods can achieve. To answer these questions, we investigate a variety of leading dataset distillation methods in different medical imaging contexts. We evaluate their feasibility with extensive experiments along two axes: 1) assessing the impact of dataset distillation across multiple datasets with minor or substantial variations, and 2) exploring indicators that predict distillation performance. Our extensive experiments across multiple medical datasets reveal that dataset distillation can significantly reduce dataset size while maintaining model performance comparable to that achieved with the full dataset, and they suggest that a small, representative sample of images can serve as a reliable indicator of distillation success. This study demonstrates that dataset distillation is a viable method for efficient and secure medical data sharing, with the potential to facilitate enhanced collaborative research and clinical applications.
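The feasibility check the abstract describes reduces to a simple comparison: train the same architecture once on the full dataset and once on its distilled counterpart, then compare held-out accuracy. A minimal sketch follows; `make_model`, the dataset objects, and all hyperparameters are illustrative assumptions, not the paper's actual protocol.

```python
# Minimal sketch: compare a model trained on the full dataset against the
# same architecture trained on the (much smaller) distilled dataset.
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader

def train_and_eval(model, train_ds, test_x, test_y, epochs=20, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loader = DataLoader(train_ds, batch_size=64, shuffle=True)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
    model.eval()
    with torch.no_grad():
        return (model(test_x).argmax(dim=1) == test_y).float().mean().item()

# Hypothetical helpers: make_model() builds the same architecture twice;
# full_ds and distilled_ds wrap a medical dataset and its distilled version.
# acc_full = train_and_eval(make_model(), full_ds, test_x, test_y)
# acc_small = train_and_eval(make_model(), distilled_ds, test_x, test_y)
# Feasibility criterion: acc_small stays close to acc_full despite the
# distilled set being orders of magnitude smaller.
```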
Related papers
- Image Distillation for Safe Data Sharing in Histopathology [10.398266052019675]
Histopathology can help clinicians make accurate diagnoses, determine disease prognosis, and plan appropriate treatment strategies.
As deep learning techniques prove successful in the medical domain, the primary challenges become limited data availability and concerns about data sharing and privacy.
We create a small synthetic dataset that encapsulates essential information, which can be shared without constraints.
We train a latent diffusion model and construct a new distilled synthetic dataset with a small number of human-readable synthetic images; a sketch of the sampling step follows below.
arXiv Detail & Related papers (2024-06-19T13:19:08Z)
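A hedged sketch of how such a shareable synthetic set might be sampled once a latent diffusion model has been trained or fine-tuned; the checkpoint path, prompts, and class labels below are placeholders, not the paper's actual model or histopathology classes.

```python
# Sketch: sample a small, human-readable synthetic dataset from a
# (hypothetically fine-tuned) latent diffusion checkpoint.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "path/to/histopathology-finetuned-ldm",  # placeholder checkpoint id
    torch_dtype=torch.float16,
).to("cuda")

classes = ["tumor tissue patch", "normal tissue patch"]  # illustrative classes
synthetic = []
for label in classes:
    out = pipe(prompt=f"H&E stained histopathology, {label}",
               num_images_per_prompt=4, num_inference_steps=50)
    synthetic.extend((img, label) for img in out.images)
# `synthetic` now holds a handful of screenable images that can stand in
# for the raw patient data when sharing across sites.
```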
- Progressive trajectory matching for medical dataset distillation [15.116863763717623]
It is essential but challenging to share medical image datasets due to privacy issues.
We propose a novel dataset distillation method that condenses the original medical image datasets into a synthetic one; a sketch of the underlying trajectory-matching idea follows below.
arXiv Detail & Related papers (2024-03-20T10:18:20Z)
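Trajectory matching trains a student network on the synthetic data starting from a saved expert checkpoint and penalizes how far the student ends up from a later expert checkpoint. The sketch below follows the general matching-training-trajectories recipe rather than this paper's specific progressive scheme; the checkpoint dicts, step counts, and learning rates are assumptions.

```python
# Sketch of a trajectory-matching loss: after `inner_steps` of training on
# the synthetic data from expert checkpoint t, the student should land near
# expert checkpoint t+M. Backpropagating this loss updates the synthetic set.
import torch
import torch.nn.functional as F
from torch.func import functional_call

def trajectory_matching_loss(net, expert_start, expert_end, syn_x, syn_y,
                             inner_steps=10, inner_lr=0.01):
    # Student parameters start at the expert checkpoint taken at step t.
    params = {k: v.clone().requires_grad_(True)
              for k, v in expert_start.items()}
    for _ in range(inner_steps):
        loss = F.cross_entropy(functional_call(net, params, (syn_x,)), syn_y)
        grads = torch.autograd.grad(loss, list(params.values()),
                                    create_graph=True)
        params = {k: p - inner_lr * g
                  for (k, p), g in zip(params.items(), grads)}
    # Distance to checkpoint t+M, normalized by how far the expert moved.
    num = sum(((params[k] - expert_end[k]) ** 2).sum() for k in params)
    den = sum(((expert_start[k] - expert_end[k]) ** 2).sum() for k in params)
    return num / den
```

An outer optimizer over `syn_x` (e.g., Adam) would call this repeatedly with checkpoint pairs sampled from saved expert trajectories.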
- Importance-Aware Adaptive Dataset Distillation [53.79746115426363]
The development of deep learning models is enabled by the availability of large-scale datasets.
Dataset distillation aims to synthesize a compact dataset that retains the essential information of the large original dataset.
We propose an importance-aware adaptive dataset distillation (IADD) method that improves distillation performance.
arXiv Detail & Related papers (2024-01-29T03:29:39Z)
- Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation [96.92250565207017]
We study the data efficiency and selection for the dataset distillation task.
By re-formulating the dynamics of distillation, we provide insight into the inherent redundancy in the real dataset.
We identify the most influential real samples based on their causal effects on the distillation.
arXiv Detail & Related papers (2023-05-28T06:53:41Z)
- A Comprehensive Study on Dataset Distillation: Performance, Privacy, Robustness and Fairness [8.432686179800543]
We conduct extensive experiments to evaluate current state-of-the-art dataset distillation methods.
We successfully use membership inference attacks to show that privacy risks remain; a minimal sketch of such an attack follows below.
This work offers a large-scale benchmarking framework for dataset distillation evaluation.
arXiv Detail & Related papers (2023-05-05T08:19:27Z)
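For intuition, a minimal loss-thresholding membership inference sketch; real evaluations use stronger attacks, and the function below is an illustrative baseline, not necessarily this paper's method.

```python
# Sketch: per-sample membership scores via the loss-threshold heuristic.
# Samples the model fits unusually well (low loss, high score) are more
# likely to have been in the training set.
import torch
import torch.nn.functional as F

@torch.no_grad()
def membership_scores(model, x, y):
    return -F.cross_entropy(model(x), y, reduction="none")

# Usage: compute scores for known members and non-members, then pick a
# threshold (e.g., from an ROC curve) to estimate how much a model trained
# on distilled data leaks about the original training set.
```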
- A Comprehensive Survey of Dataset Distillation [73.15482472726555]
It has become challenging to handle the unlimited growth of data with limited computing power.
Deep learning technology has developed unprecedentedly in the last decade.
This paper provides a holistic understanding of dataset distillation from multiple aspects.
arXiv Detail & Related papers (2023-01-13T15:11:38Z)
- Dataset Distillation for Medical Dataset Sharing [38.65823547986758]
Dataset distillation can synthesize a small dataset such that models trained on it achieve performance comparable to models trained on the original large dataset.
Experimental results on a COVID-19 chest X-ray image dataset show that our method can achieve high detection performance even using scarce anonymized chest X-ray images.
arXiv Detail & Related papers (2022-09-29T07:49:20Z)
- Understanding the Tricks of Deep Learning in Medical Image Segmentation: Challenges and Future Directions [66.40971096248946]
In this paper, we collect a series of MedISeg tricks for different model implementation phases.
We experimentally explore the effectiveness of these tricks on consistent baselines.
We also open-source a strong MedISeg repository in which each component is plug-and-play.
arXiv Detail & Related papers (2022-09-21T12:30:05Z)
- Learning to Generate Synthetic Training Data using Gradient Matching and Implicit Differentiation [77.34726150561087]
This article explores various data distillation techniques that can reduce the amount of data required to successfully train deep networks.
Inspired by recent ideas, we suggest new data distillation techniques based on generative teaching networks, gradient matching, and the Implicit Function Theorem; a gradient-matching sketch follows below.
arXiv Detail & Related papers (2022-03-16T11:45:32Z)
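Gradient matching optimizes the synthetic images so that gradients computed on them mimic gradients computed on real mini-batches. The sketch below covers only that component (the implicit-differentiation variant instead differentiates through the inner optimization via the Implicit Function Theorem); the architecture, shapes, and rates are illustrative assumptions.

```python
# Sketch of dataset distillation by gradient matching: learn `ipc` synthetic
# images per class whose training gradients mimic those of real data.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def distill_by_gradient_matching(real_loader, num_classes, ipc=10,
                                 steps=1000, image_shape=(1, 64, 64)):
    # The learnable distilled dataset.
    syn_x = torch.randn(num_classes * ipc, *image_shape, requires_grad=True)
    syn_y = torch.arange(num_classes).repeat_interleave(ipc)
    opt = torch.optim.SGD([syn_x], lr=0.1)

    for _ in range(steps):
        # A freshly initialized network each step, so the synthetic data
        # generalizes across initializations rather than fitting one model.
        net = nn.Sequential(nn.Flatten(),
                            nn.Linear(math.prod(image_shape), 128),
                            nn.ReLU(),
                            nn.Linear(128, num_classes))
        real_x, real_y = next(iter(real_loader))  # one real mini-batch

        g_real = torch.autograd.grad(F.cross_entropy(net(real_x), real_y),
                                     net.parameters())
        g_syn = torch.autograd.grad(F.cross_entropy(net(syn_x), syn_y),
                                    net.parameters(), create_graph=True)
        # Per-tensor cosine distance between real and synthetic gradients.
        loss = sum(1 - F.cosine_similarity(a.flatten(), b.flatten(), dim=0)
                   for a, b in zip(g_real, g_syn))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return syn_x.detach(), syn_y
```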
- Soft-Label Anonymous Gastric X-ray Image Distillation [49.24576562557866]
This paper presents a soft-label anonymous gastric X-ray image distillation method based on a gradient descent approach.
Experimental results show that the proposed method can not only effectively compress the medical dataset but also anonymize medical images to protect patients' private information; a sketch of the soft-label idea follows below.
arXiv Detail & Related papers (2021-04-07T02:04:12Z)
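In soft-label distillation, both the synthetic images and their soft labels are learnable, optimized by backpropagating through an inner training step. A minimal single-inner-step sketch, with all names and rates as assumptions rather than the paper's exact algorithm:

```python
# Sketch: an outer loss that is differentiable w.r.t. both the synthetic
# images (syn_x) and their learnable soft labels (syn_logits).
import torch
import torch.nn.functional as F
from torch.func import functional_call

def soft_label_outer_loss(net, syn_x, syn_logits, real_x, real_y,
                          inner_lr=0.01):
    # Copy the network's parameters so the inner update stays on the graph.
    params = {k: v.detach().clone().requires_grad_(True)
              for k, v in net.named_parameters()}
    # Inner step: fit the copied parameters to the synthetic images using
    # the learnable soft labels as targets.
    log_probs = functional_call(net, params, (syn_x,)).log_softmax(dim=1)
    inner = F.kl_div(log_probs, syn_logits.softmax(dim=1),
                     reduction="batchmean")
    grads = torch.autograd.grad(inner, list(params.values()),
                                create_graph=True)
    params = {k: p - inner_lr * g
              for (k, p), g in zip(params.items(), grads)}
    # Outer loss: performance of the updated parameters on real data;
    # backpropagating it updates both syn_x and syn_logits.
    return F.cross_entropy(functional_call(net, params, (real_x,)), real_y)
```

An outer optimizer such as `torch.optim.Adam([syn_x, syn_logits])` would minimize this loss over many real mini-batches; only the small synthetic set and its soft labels are ever shared.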