Low-Level Dataset Distillation for Medical Image Enhancement
- URL: http://arxiv.org/abs/2511.13106v1
- Date: Mon, 17 Nov 2025 08:05:07 GMT
- Title: Low-Level Dataset Distillation for Medical Image Enhancement
- Authors: Fengzhi Xu, Ziyuan Yang, Mengyu Sun, Joey Tianyi Zhou, Yi Zhang
- Abstract summary: We propose the first low-level dataset distillation (DD) method for medical image enhancement. We first leverage anatomical similarities across patients to construct the shared anatomical prior. This prior is then personalized for each patient using a Structure-Preserving Personalized Generation (SPG) module. For different low-level tasks, the distilled data is used to construct task-specific high- and low-quality training pairs.
- Score: 44.15651365046869
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Medical image enhancement is clinically valuable, but existing methods require large-scale datasets to learn complex pixel-level mappings. However, the substantial training and storage costs associated with these datasets hinder their practical deployment. While dataset distillation (DD) can alleviate these burdens, existing methods mainly target high-level tasks, where multiple samples share the same label. This many-to-one mapping allows distilled data to capture shared semantics and achieve information compression. In contrast, low-level tasks involve a many-to-many mapping that requires pixel-level fidelity, making low-level DD an underdetermined problem, as a small distilled dataset cannot fully constrain the dense pixel-level mappings. To address this, we propose the first low-level DD method for medical image enhancement. We first leverage anatomical similarities across patients to construct the shared anatomical prior based on a representative patient, which serves as the initialization for the distilled data of different patients. This prior is then personalized for each patient using a Structure-Preserving Personalized Generation (SPG) module, which integrates patient-specific anatomical information into the distilled dataset while preserving pixel-level fidelity. For different low-level tasks, the distilled data is used to construct task-specific high- and low-quality training pairs. Patient-specific knowledge is injected into the distilled data by aligning the gradients computed from networks trained on the distilled pairs with those from the corresponding patient's raw data. Notably, downstream users cannot access raw patient data. Instead, only a distilled dataset containing abstract training information is shared, which excludes patient-specific details and thus preserves privacy.
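The gradient-alignment step described in the abstract can be illustrated with a minimal toy sketch (not the paper's implementation): a 1-D linear "enhancement" model stands in for the network, a large set of raw pairs stands in for a patient's data, and a handful of distilled pairs is optimized so that the training gradient it induces matches the gradient computed from the raw data. All variable names, the model, and the analytic gradients below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Raw patient data: many low-/high-quality pairs. A 1-D linear mapping
# (y = 3x) stands in for the dense pixel-level enhancement mapping.
x_real = rng.normal(size=200)
y_real = 3.0 * x_real + 0.05 * rng.normal(size=200)

# Distilled dataset: only 4 synthetic pairs. In the paper these would be
# initialized from the shared anatomical prior; here, from random values
# with the same scale as the raw data.
x_syn = rng.normal(size=4)
y_syn = 3.0 * x_syn

def grad_w(w, x, y):
    """Gradient of the MSE loss of the model f(x) = w*x with respect to w."""
    return np.mean(2.0 * x * (w * x - y))

w, lr, m = 0.5, 0.01, len(x_syn)  # network weight at which gradients are matched
for _ in range(2000):
    # Mismatch between the gradient signal from distilled and raw data.
    delta = grad_w(w, x_syn, y_syn) - grad_w(w, x_real, y_real)
    # Analytic gradients of the matching loss delta**2 with respect to the
    # distilled inputs and targets (derived for this toy linear model).
    gx = 2.0 * delta * (2.0 / m) * (2.0 * w * x_syn - y_syn)
    gy = 2.0 * delta * (-2.0 / m) * x_syn
    x_syn -= lr * gx
    y_syn -= lr * gy

# After matching, training on the distilled pairs yields nearly the same
# gradient signal as training on the raw data, which is never shared.
final_gap = abs(grad_w(w, x_syn, y_syn) - grad_w(w, x_real, y_real))
```

In the paper this alignment is done over full networks and task-specific degradation pairs; the toy version only shows why updating the distilled data can reproduce the training signal of the raw data without exposing it.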
Related papers
- High-Order Progressive Trajectory Matching for Medical Image Dataset Distillation [38.58097105351775]
Trajectory matching has emerged as a promising methodology for dataset distillation. We propose a shape-wise potential that captures the geometric structure of parameter trajectories, and an easy-to-complex matching strategy. Experiments on medical image classification tasks demonstrate that our method improves distillation performance while preserving privacy.
arXiv Detail & Related papers (2025-09-29T01:45:11Z)
- Task-Specific Knowledge Distillation from the Vision Foundation Model for Enhanced Medical Image Segmentation [13.018234326432964]
We propose a novel and generalizable task-specific knowledge distillation framework for medical image segmentation. Our method fine-tunes the VFM on the target segmentation task to capture task-specific features before distilling the knowledge to smaller models. Experimental results across five medical image datasets demonstrate that our method consistently outperforms task-agnostic knowledge distillation.
arXiv Detail & Related papers (2025-03-10T06:39:53Z)
- Not All Samples Should Be Utilized Equally: Towards Understanding and Improving Dataset Distillation [57.6797306341115]
We take an initial step towards understanding various matching-based DD methods from the perspective of sample difficulty. We then extend the neural scaling laws of data pruning to DD to theoretically explain these matching-based methods. We introduce the Sample Difficulty Correction (SDC) approach, designed to predominantly generate easier samples to achieve higher dataset quality.
arXiv Detail & Related papers (2024-08-22T15:20:32Z)
- Dataset Quantization [72.61936019738076]
We present dataset quantization (DQ), a new framework to compress large-scale datasets into small subsets.
DQ is the first method that can successfully distill large-scale datasets such as ImageNet-1k with a state-of-the-art compression ratio.
arXiv Detail & Related papers (2023-08-21T07:24:29Z)
- Self-Supervised Pre-Training with Contrastive and Masked Autoencoder Methods for Dealing with Small Datasets in Deep Learning for Medical Imaging [8.34398674359296]
Deep learning in medical imaging has the potential to minimize the risk of diagnostic errors, reduce radiologist workload, and accelerate diagnosis.
Training such deep learning models requires large and accurate datasets, with annotations for all training samples.
To address this challenge, deep learning models can be pre-trained on large image datasets without annotations using methods from the field of self-supervised learning.
arXiv Detail & Related papers (2023-08-12T11:31:01Z)
- PCA: Semi-supervised Segmentation with Patch Confidence Adversarial Training [52.895952593202054]
We propose a new semi-supervised adversarial method called Patch Confidence Adversarial Training (PCA) for medical image segmentation.
PCA learns the pixel structure and context information in each patch to obtain sufficient gradient feedback, which aids the discriminator in converging to an optimal state.
Our method outperforms the state-of-the-art semi-supervised methods, which demonstrates its effectiveness for medical image segmentation.
arXiv Detail & Related papers (2022-07-24T07:45:47Z)
- Self-Supervised Learning as a Means To Reduce the Need for Labeled Data in Medical Image Analysis [64.4093648042484]
We use a dataset of chest X-ray images with bounding box labels for 13 different classes of anomalies.
We show that it is possible to achieve similar performance to a fully supervised model in terms of mean average precision and accuracy with only 60% of the labeled data.
arXiv Detail & Related papers (2022-06-01T09:20:30Z)
- SimCVD: Simple Contrastive Voxel-Wise Representation Distillation for Semi-Supervised Medical Image Segmentation [7.779842667527933]
We present SimCVD, a simple contrastive distillation framework that significantly advances state-of-the-art voxel-wise representation learning.
SimCVD achieves average Dice scores of 90.85% and 89.03% on two benchmark datasets, improvements of 0.91% and 2.22% over the previous best results.
arXiv Detail & Related papers (2021-08-13T13:17:58Z)
- Positional Contrastive Learning for Volumetric Medical Image Segmentation [13.086140606803408]
We propose a novel positional contrastive learning framework to generate contrastive data pairs.
The proposed PCL method can substantially improve segmentation performance compared to existing methods in both the semi-supervised and transfer learning settings.
arXiv Detail & Related papers (2021-06-16T22:15:28Z)
- Towards Robust Partially Supervised Multi-Structure Medical Image Segmentation on Small-Scale Data [123.03252888189546]
We propose Vicinal Labels Under Uncertainty (VLUU) to bridge the methodological gaps in partially supervised learning (PSL) under data scarcity.
Motivated by multi-task learning and vicinal risk minimization, VLUU transforms the partially supervised problem into a fully supervised problem by generating vicinal labels.
Our research suggests a new research direction in label-efficient deep learning with partial supervision.
arXiv Detail & Related papers (2020-11-28T16:31:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.