High-Order Progressive Trajectory Matching for Medical Image Dataset Distillation
- URL: http://arxiv.org/abs/2509.24177v1
- Date: Mon, 29 Sep 2025 01:45:11 GMT
- Title: High-Order Progressive Trajectory Matching for Medical Image Dataset Distillation
- Authors: Le Dong, Jinghao Bian, Jingyang Hou, Jingliang Hu, Yilei Shi, Weisheng Dong, Xiao Xiang Zhu, Lichao Mou
- Abstract summary: Trajectory matching has emerged as a promising methodology for dataset distillation. We propose a shape-wise potential that captures the geometric structure of parameter trajectories, and an easy-to-complex matching strategy. Experiments on medical image classification tasks demonstrate that our method improves distillation performance while preserving privacy.
- Score: 38.58097105351775
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Medical image analysis faces significant challenges in data sharing due to privacy regulations and complex institutional protocols. Dataset distillation offers a solution to address these challenges by synthesizing compact datasets that capture essential information from real, large medical datasets. Trajectory matching has emerged as a promising methodology for dataset distillation; however, existing methods primarily focus on terminal states, overlooking crucial information in intermediate optimization states. We address this limitation by proposing a shape-wise potential that captures the geometric structure of parameter trajectories, and an easy-to-complex matching strategy that progressively addresses parameters based on their complexity. Experiments on medical image classification tasks demonstrate that our method improves distillation performance while preserving privacy and maintaining model accuracy comparable to training on the original datasets. Our code is available at https://github.com/Bian-jh/HoP-TM.
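For context, the sketch below shows the standard MTT-style trajectory-matching objective that this line of work builds on: the distance between the student's endpoint and the expert's endpoint, normalized by how far the expert moved over the matched segment. The toy network and helper names (e.g. `flat_params`) are illustrative assumptions, not the authors' implementation of the shape-wise potential.

```python
# Minimal sketch of a standard trajectory-matching loss (MTT-style).
import torch
import torch.nn as nn
import torch.nn.functional as F

def flat_params(net: nn.Module) -> torch.Tensor:
    """Concatenate all parameters of a network into one flat vector."""
    return torch.cat([p.reshape(-1) for p in net.parameters()])

def trajectory_matching_loss(theta_student, theta_start, theta_target):
    """Match the student endpoint to the expert endpoint, normalized by
    how far the expert moved over the matched trajectory segment."""
    num = F.mse_loss(theta_student, theta_target, reduction="sum")
    den = F.mse_loss(theta_start, theta_target, reduction="sum")
    return num / (den + 1e-12)

# Toy usage: three snapshots of a small MLP stand in for expert/student states.
net = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
theta_start = flat_params(net).detach()
theta_target = theta_start + 0.1 * torch.randn_like(theta_start)   # expert end
theta_student = theta_start + 0.1 * torch.randn_like(theta_start)  # student end
print(trajectory_matching_loss(theta_student, theta_start, theta_target).item())
```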
Related papers
- Low-Level Dataset Distillation for Medical Image Enhancement [44.15651365046869]
We propose the first low-level dataset distillation (DD) method for medical image enhancement. We first leverage anatomical similarities across patients to construct a shared anatomical prior. This prior is then personalized for each patient using a Structure-Preserving Personalized Generation (SPG) module. For different low-level tasks, the distilled data is used to construct task-specific high- and low-quality training pairs.
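The general pattern described above can be sketched as follows; this is an illustrative toy, not the paper's SPG module. It builds a shared prior by averaging (assumed registered) patient scans, personalizes it with a per-patient residual, and forms low-/high-quality pairs via a placeholder degradation.

```python
# Illustrative sketch only: shared prior + per-patient residual + synthetic pairs.
import torch

patients = torch.rand(10, 1, 64, 64)   # toy stand-in for registered patient scans
shared_prior = patients.mean(dim=0)    # anatomical similarity -> average prior

# Personalize: one learnable residual per patient (optimized elsewhere in practice).
residuals = torch.zeros_like(patients, requires_grad=True)
personalized = shared_prior.unsqueeze(0) + residuals  # distilled per-patient images

def degrade(x: torch.Tensor) -> torch.Tensor:
    """Toy degradation (noise) for low-quality counterparts; a real pipeline
    would use task-specific corruptions (blur, downsampling, ...)."""
    return (x + 0.1 * torch.randn_like(x)).clamp(0, 1)

low_quality = degrade(personalized.detach())
training_pairs = list(zip(low_quality, personalized.detach()))  # (input, target)
print(len(training_pairs), training_pairs[0][0].shape)
```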
arXiv Detail & Related papers (2025-11-17T08:05:07Z) - Dataset Distillation as Pushforward Optimal Quantization [2.5892916589735457]
We propose a synthetic training set that achieves performance similar to training on real data, with orders of magnitude lower computational requirements. In particular, we link existing disentangled dataset distillation methods to the classical optimal quantization and Wasserstein barycenter problems. We achieve better performance and inter-model generalization on the ImageNet-1K dataset with trivial additional computation, and SOTA performance in higher image-per-class settings.
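A minimal sketch of the quantization view: per-class k-means centroids act as the distilled points, since k-means is the classical optimal quantizer under squared Euclidean cost. The features and labels below are synthetic stand-ins; the paper's pushforward/barycenter formulation is considerably richer.

```python
# Per-class k-means centroids as "distilled" prototypes (quantization view).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 32))       # toy feature vectors
y = rng.integers(0, 2, size=500)     # two toy classes

ipc = 5                              # prototypes (images) per class
distilled = []
for c in np.unique(y):
    km = KMeans(n_clusters=ipc, n_init=10, random_state=0).fit(X[y == c])
    distilled.append(km.cluster_centers_)  # optimal quantizers of the class
distilled = np.concatenate(distilled)
print(distilled.shape)               # (n_classes * ipc, 32)
```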
arXiv Detail & Related papers (2025-01-13T20:41:52Z) - Diffusion-Augmented Coreset Expansion for Scalable Dataset Distillation [18.474302012851087]
We propose a two-stage solution for dataset distillation. First, we compress the dataset by selecting only the most informative patches to form a coreset. Next, we leverage a generative foundation model to dynamically expand this compressed set in real time. We demonstrate a significant improvement of over 10% compared to the state of the art on several large-scale dataset distillation benchmarks.
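The first stage can be sketched as plain top-k selection under an informativeness score; the entropy-under-a-probe-model score used here is an assumption for illustration, and the diffusion-based expansion stage is omitted.

```python
# Sketch of stage one: keep the top-k patches by an informativeness proxy.
import torch
import torch.nn as nn
import torch.nn.functional as F

probe = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, 10))  # toy probe model
patches = torch.rand(1000, 3, 16, 16)                            # candidate patches

with torch.no_grad():
    probs = F.softmax(probe(patches), dim=1)
    # Prediction entropy as a stand-in informativeness score.
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)

k = 100
coreset = patches[entropy.topk(k).indices]  # most informative patches
print(coreset.shape)
```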
arXiv Detail & Related papers (2024-12-05T23:40:27Z) - Not All Samples Should Be Utilized Equally: Towards Understanding and Improving Dataset Distillation [57.6797306341115]
We take an initial step towards understanding various matching-based DD methods from the perspective of sample difficulty. We then extend the neural scaling laws of data pruning to DD to theoretically explain these matching-based methods. We introduce the Sample Difficulty Correction (SDC) approach, designed to predominantly generate easier samples to achieve higher dataset quality.
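A hedged sketch of the difficulty-aware idea: estimate per-sample difficulty from the loss under a reference model and down-weight hard samples, so easier ones dominate the matching objective. The softmax weighting below is an illustrative choice, not the paper's exact SDC rule.

```python
# Down-weight hard samples so easy ones shape the distilled data.
import torch
import torch.nn as nn
import torch.nn.functional as F

ref = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # toy reference model
x, y = torch.rand(256, 1, 28, 28), torch.randint(0, 10, (256,))

with torch.no_grad():
    difficulty = F.cross_entropy(ref(x), y, reduction="none")  # high = hard

weights = torch.softmax(-difficulty, dim=0)  # easy samples get larger weight
weighted_loss = (weights * F.cross_entropy(ref(x), y, reduction="none")).sum()
print(weighted_loss.item())
```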
arXiv Detail & Related papers (2024-08-22T15:20:32Z) - Image Distillation for Safe Data Sharing in Histopathology [10.398266052019675]
Histopathology can help clinicians make accurate diagnoses, determine disease prognosis, and plan appropriate treatment strategies.
As deep learning techniques prove successful in the medical domain, the primary challenges become limited data availability and concerns about data sharing and privacy.
We create a small synthetic dataset that encapsulates essential information, which can be shared without constraints.
We train a latent diffusion model and construct a new distilled synthetic dataset with a small number of human readable synthetic images.
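Assembling a small, human-inspectable set from a generator's outputs might look like the sketch below: draw a pool of samples, embed them, and keep a diverse subset via farthest-point sampling. The `generator` function is a random stand-in for the trained latent diffusion model, and the flattened-pixel embedding stands in for a real encoder.

```python
# Pick a small, diverse subset of generated images (farthest-point sampling).
import torch

def generator(n: int) -> torch.Tensor:
    """Stand-in for sampling from a trained latent diffusion model."""
    return torch.rand(n, 3, 64, 64)

pool = generator(200)
feats = pool.flatten(1)          # toy embedding; use a real encoder in practice
k = 10                           # distilled images to keep

selected = [0]
d = torch.cdist(feats, feats[0].unsqueeze(0)).squeeze(1)
for _ in range(k - 1):
    nxt = int(d.argmax())        # farthest point from everything chosen so far
    selected.append(nxt)
    d = torch.minimum(d, torch.cdist(feats, feats[nxt].unsqueeze(0)).squeeze(1))

distilled = pool[selected]
print(distilled.shape)
```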
arXiv Detail & Related papers (2024-06-19T13:19:08Z) - EchoNet-Synthetic: Privacy-preserving Video Generation for Safe Medical Data Sharing [5.900946696794718]
We present a model designed to produce high-fidelity, long, accessible, and complete data samples with near-real-time efficiency.
We develop our generation method based on diffusion models and introduce a protocol for medical video dataset anonymization.
We present EchoNet-Synthetic, a fully synthetic, privacy-compliant echocardiogram dataset with paired ejection fraction labels.
arXiv Detail & Related papers (2024-06-02T17:18:06Z) - Progressive trajectory matching for medical dataset distillation [15.116863763717623]
It is essential but challenging to share medical image datasets due to privacy issues.
We propose a novel dataset distillation method to condense the original medical image datasets into a synthetic one.
arXiv Detail & Related papers (2024-03-20T10:18:20Z) - Importance-Aware Adaptive Dataset Distillation [53.79746115426363]
Development of deep learning models is enabled by the availability of large-scale datasets.
Dataset distillation aims to synthesize a compact dataset that retains the essential information from the large original dataset.
We propose an importance-aware adaptive dataset distillation (IADD) method that can improve distillation performance.
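One way to picture the importance-aware idea: scale each parameter tensor's matching distance by a learnable importance weight, optimized jointly with the synthetic data in practice. The softmax parameterization below is an illustrative assumption, not IADD's exact scheme.

```python
# Importance-weighted parameter-matching loss (illustrative).
import torch
import torch.nn as nn

def layer_distances(net_a: nn.Module, net_b: nn.Module) -> torch.Tensor:
    """Squared distance per parameter tensor between two same-shaped nets."""
    return torch.stack([
        ((pa - pb) ** 2).sum()
        for pa, pb in zip(net_a.parameters(), net_b.parameters())
    ])

net_real = nn.Sequential(nn.Linear(8, 16), nn.Linear(16, 2))  # trained on real data
net_syn = nn.Sequential(nn.Linear(8, 16), nn.Linear(16, 2))   # trained on synthetic

importance_logits = torch.zeros(len(list(net_real.parameters())), requires_grad=True)
weights = torch.softmax(importance_logits, dim=0)  # adaptive per-tensor importance
loss = (weights * layer_distances(net_real, net_syn)).sum()
loss.backward()                                    # grads flow to the logits too
print(loss.item())
```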
arXiv Detail & Related papers (2024-01-29T03:29:39Z) - Towards Efficient Deep Hashing Retrieval: Condensing Your Data via Feature-Embedding Matching [5.2193774924981176]
Training advanced deep hashing models has become more expensive due to complex optimizations and large datasets. We propose IEM (Information-intensive feature Embedding Matching), which is centered on distribution matching and incorporates model and data augmentation techniques to further enhance the features of the hashing space.
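The distribution-matching core that IEM builds on can be sketched as pushing the mean feature embedding of a learnable synthetic batch toward that of a real batch; the randomly initialized encoder below stands in for the hashing network, and the augmentation components are omitted.

```python
# Distribution matching: align mean embeddings of real and synthetic batches.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
real = torch.rand(128, 3, 32, 32)                    # real batch (fixed)
syn = torch.rand(16, 3, 32, 32, requires_grad=True)  # learnable synthetic data

opt = torch.optim.SGD([syn], lr=1.0)
for _ in range(10):
    loss = (encoder(real).mean(0) - encoder(syn).mean(0)).pow(2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
print(loss.item())
```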
arXiv Detail & Related papers (2023-05-29T13:23:55Z) - Generalizing Dataset Distillation via Deep Generative Prior [75.9031209877651]
We propose to distill an entire dataset's knowledge into a few synthetic images.
The idea is to synthesize a small number of synthetic data points that, when given to a learning algorithm as training data, result in a model approximating one trained on the original data.
We present a new optimization algorithm that distills a large number of images into a few intermediate feature vectors in the generative model's latent space.
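A minimal sketch of distilling into a generative prior: the synthetic set lives as latent codes z decoded by a frozen generator G, and only z receives gradients. The toy decoder and placeholder pixel-matching loss are assumptions; the paper uses a pretrained deep generative model and a proper distillation objective.

```python
# Optimize latent codes of a frozen generator instead of raw pixels.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 3 * 32 * 32))
for p in G.parameters():
    p.requires_grad_(False)                     # the generator stays fixed

z = torch.randn(10, 64, requires_grad=True)    # distilled latent codes
opt = torch.optim.Adam([z], lr=0.01)

target = torch.rand(10, 3 * 32 * 32)           # stand-in for a matching objective
for _ in range(50):
    images = G(z)                               # synthetic images = G(z)
    loss = (images - target).pow(2).mean()      # placeholder distillation loss
    opt.zero_grad()
    loss.backward()                             # gradients reach only z
    opt.step()
print(loss.item())
```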
arXiv Detail & Related papers (2023-05-02T17:59:31Z) - A Comprehensive Survey of Dataset Distillation [73.15482472726555]
Deep learning technology has developed at an unprecedented pace over the last decade.
It has become challenging to handle the unlimited growth of data with limited computing power.
This paper provides a holistic understanding of dataset distillation from multiple aspects.
arXiv Detail & Related papers (2023-01-13T15:11:38Z) - Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation [151.70234052015948]
We propose a novel approach that encourages the optimization algorithm to seek a flat trajectory.
We show that weights trained on synthetic data are robust against accumulated-error perturbations when regularized towards the flat trajectory.
Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7%.
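The flat-trajectory idea can be illustrated with a generic sharpness-aware (SAM-style) step when training the expert: perturb the weights toward the local worst case, compute the gradient there, and update from the original point. This is a standard SAM sketch, not FTD's exact regularizer.

```python
# Generic sharpness-aware step, illustrating training toward flat regions.
import torch
import torch.nn as nn
import torch.nn.functional as F

net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
opt = torch.optim.SGD(net.parameters(), lr=0.1)
x, y = torch.rand(64, 1, 28, 28), torch.randint(0, 10, (64,))
rho = 0.05                                       # perturbation radius

for _ in range(5):
    # First pass: gradient at the current weights.
    loss = F.cross_entropy(net(x), y)
    grads = torch.autograd.grad(loss, list(net.parameters()))
    norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
    # Ascend to the worst-case nearby weights and take the gradient there...
    with torch.no_grad():
        for p, g in zip(net.parameters(), grads):
            p.add_(rho * g / (norm + 1e-12))
    F.cross_entropy(net(x), y).backward()
    # ...then undo the perturbation and step with the sharpness-aware gradient.
    with torch.no_grad():
        for p, g in zip(net.parameters(), grads):
            p.sub_(rho * g / (norm + 1e-12))
    opt.step()
    opt.zero_grad()
print("done")
```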
arXiv Detail & Related papers (2022-11-20T15:49:11Z)