Enhancing Classification of Streaming Data with Image Distillation
- URL: http://arxiv.org/abs/2509.07049v1
- Date: Mon, 08 Sep 2025 15:24:35 GMT
- Title: Enhancing Classification of Streaming Data with Image Distillation
- Authors: Rwad Khatib, Yehudit Aperstein,
- Abstract summary: This study tackles the challenge of efficiently classifying streaming data in envi-ronments with limited memory and computational resources.<n>It delves into the application of data distillation as an innovative approach to improve the precision of streaming image data classification.
- Score: 1.2891210250935148
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study tackles the challenge of efficiently classifying streaming data in envi-ronments with limited memory and computational resources. It delves into the application of data distillation as an innovative approach to improve the precision of streaming image data classification. By focusing on distilling essential features from data streams, our method aims to minimize computational demands while preserving crucial information for accurate classification. Our investigation com-pares this approach against traditional algorithms like Hoeffding Trees and Adap-tive Random Forest, adapted through embeddings for image data. The Distillation Based Classification (DBC) demonstrated superior performance, achieving a 73.1% accuracy rate, surpassing both traditional methods and Reservoir Sam-pling Based Classification (RBC) technique. This marks a significant advance-ment in streaming data classification, showcasing the effectiveness of our method in processing complex data streams and setting a new standard for accuracy and efficiency.
Related papers
- Difficulty-guided Sampling: Bridging the Target Gap between Dataset Distillation and Downstream Tasks [55.27114962330541]
We propose difficulty-guided sampling (DGS) to bridge the target gap between the distillation objective and the downstream task.<n>Deep neural networks achieve remarkable performance but have time and storage-consuming training processes.
arXiv Detail & Related papers (2026-01-15T05:29:50Z) - DD-Ranking: Rethinking the Evaluation of Dataset Distillation [223.28392857127733]
We propose DD-Ranking, a unified evaluation framework, along with new general evaluation metrics to uncover the true performance improvements achieved by different methods.<n>By refocusing on the actual information enhancement of distilled datasets, DD-Ranking provides a more comprehensive and fair evaluation standard for future research advancements.
arXiv Detail & Related papers (2025-05-19T16:19:50Z) - Dataset Distillation as Pushforward Optimal Quantization [2.5892916589735457]
We propose a synthetic training set that achieves similar performance to training on real data, with orders of magnitude less computational requirements.<n>In particular, we link existing disentangled dataset distillation methods to the classical optimal quantization and Wasserstein barycenter problems.<n>We achieve better performance and inter-model generalization on the ImageNet-1K dataset with trivial additional computation, and SOTA performance in higher image-per-class settings.
arXiv Detail & Related papers (2025-01-13T20:41:52Z) - Generative Dataset Distillation Based on Self-knowledge Distillation [49.20086587208214]
We present a novel generative dataset distillation method that can improve the accuracy of aligning prediction logits.<n>Our approach integrates self-knowledge distillation to achieve more precise distribution matching between the synthetic and original data.<n>Our method outperforms existing state-of-the-art methods, resulting in superior distillation performance.
arXiv Detail & Related papers (2025-01-08T00:43:31Z) - Efficient Dataset Distillation via Diffusion-Driven Patch Selection for Improved Generalization [34.53986517177061]
We propose a novel framework to existing diffusion-based distillation methods, leveraging diffusion models for selection rather than generation.<n>Our method starts by predicting noise generated by the diffusion model based on input images and text prompts, then calculates the corresponding loss for each pair.<n>This streamlined framework enables a single-step distillation process, and extensive experiments demonstrate that our approach outperforms state-of-the-art methods across various metrics.
arXiv Detail & Related papers (2024-12-13T08:34:46Z) - Leveraging Semi-Supervised Learning to Enhance Data Mining for Image Classification under Limited Labeled Data [35.431340001608476]
Traditional data mining methods are inadequate when faced with large-scale, high-dimensional and complex data.<n>This study introduces semi-supervised learning methods, aiming to improve the algorithm's ability to utilize unlabeled data.<n> Specifically, we adopt a self-training method and combine it with a convolutional neural network (CNN) for image feature extraction and classification.
arXiv Detail & Related papers (2024-11-27T18:59:50Z) - Guided Score identity Distillation for Data-Free One-Step Text-to-Image Generation [62.30570286073223]
Diffusion-based text-to-image generation models have demonstrated the ability to produce images aligned with textual descriptions.<n>We introduce a data-free guided distillation method that enables the efficient distillation of pretrained Diffusion models without access to the real training data.<n>By exclusively training with synthetic images generated by its one-step generator, our data-free distillation method rapidly improves FID and CLIP scores, achieving state-of-the-art FID performance while maintaining a competitive CLIP score.
arXiv Detail & Related papers (2024-06-03T17:44:11Z) - Explicit and Implicit Knowledge Distillation via Unlabeled Data [5.702176304876537]
We propose an efficient unlabeled sample selection method to replace high computational generators.
We also propose a class-dropping mechanism to suppress the label noise caused by the data domain shifts.
Experimental results show that our method can quickly converge and obtain higher accuracy than other state-of-the-art methods.
arXiv Detail & Related papers (2023-02-17T09:10:41Z) - Minimizing the Accumulated Trajectory Error to Improve Dataset
Distillation [151.70234052015948]
We propose a novel approach that encourages the optimization algorithm to seek a flat trajectory.
We show that the weights trained on synthetic data are robust against the accumulated errors perturbations with the regularization towards the flat trajectory.
Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7%.
arXiv Detail & Related papers (2022-11-20T15:49:11Z) - Efficient training of lightweight neural networks using Online
Self-Acquired Knowledge Distillation [51.66271681532262]
Online Self-Acquired Knowledge Distillation (OSAKD) is proposed, aiming to improve the performance of any deep neural model in an online manner.
We utilize k-nn non-parametric density estimation technique for estimating the unknown probability distributions of the data samples in the output feature space.
arXiv Detail & Related papers (2021-08-26T14:01:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.