Lightweight Edge Learning via Dataset Pruning
- URL: http://arxiv.org/abs/2602.00047v1
- Date: Mon, 19 Jan 2026 12:23:57 GMT
- Title: Lightweight Edge Learning via Dataset Pruning
- Authors: Laha Ale, Hu Luo, Mingsheng Cao, Shichao Li, Huanlai Xing, Haifeng Sun,
- Abstract summary: We propose a data-centric optimization framework that leverages dataset pruning to achieve resource-efficient edge learning.<n>Our framework achieves a near-linear reduction in training latency and energy consumption proportional to the pruning ratio, with negligible degradation in model accuracy.
- Score: 11.037312322970626
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Edge learning facilitates ubiquitous intelligence by enabling model training and adaptation directly on data-generating devices, thereby mitigating privacy risks and communication latency. However, the high computational and energy overhead of on-device training hinders its deployment on battery-powered mobile systems with strict thermal and memory budgets. While prior research has extensively optimized model architectures for efficient inference, the training phase remains bottlenecked by the processing of massive, often redundant, local datasets. In this work, we propose a data-centric optimization framework that leverages dataset pruning to achieve resource-efficient edge learning. Unlike standard methods that process all available data, our approach constructs compact, highly informative training subsets via a lightweight, on-device importance evaluation. Specifically, we utilize average loss statistics derived from a truncated warm-up phase to rank sample importance, deterministically retaining only the most critical data points under a dynamic pruning ratio. This mechanism is model-agnostic and operates locally without inter-device communication. Extensive experiments on standard image classification benchmarks demonstrate that our framework achieves a near-linear reduction in training latency and energy consumption proportional to the pruning ratio, with negligible degradation in model accuracy. These results validate dataset pruning as a vital, complementary paradigm for enhancing the sustainability and scalability of learning on resource-constrained mobile edge devices.
Related papers
- Learning More with Less: A Generalizable, Self-Supervised Framework for Privacy-Preserving Capacity Estimation with EV Charging Data [84.37348569981307]
We propose a first-of-its-kind capacity estimation model based on self-supervised pre-training.<n>Our model consistently outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2025-10-05T08:58:35Z) - On-device edge learning for IoT data streams: a survey [1.7186863539230333]
This literature review explores continual learning methods for on-device training in the context of neural networks (NNs) and decision trees (DTs)<n>We highlight key constraints, such as data architecture (batch vs. stream) and network capacity (cloud vs. edge)<n>The survey details the challenges of deploying deep learners on resource-constrained edge devices.
arXiv Detail & Related papers (2025-02-25T02:41:23Z) - Edge Unlearning is Not "on Edge"! An Adaptive Exact Unlearning System on Resource-Constrained Devices [26.939025828011196]
The right to be forgotten mandates that machine learning models enable the erasure of a data owner's data and information from a trained model.
We propose a Constraint-aware Adaptive Exact Unlearning System at the network Edge (CAUSE) to enable exact unlearning on resource-constrained devices.
arXiv Detail & Related papers (2024-10-14T03:28:09Z) - Center-Sensitive Kernel Optimization for Efficient On-Device Incremental Learning [88.78080749909665]
Current on-device training methods just focus on efficient training without considering the catastrophic forgetting.<n>This paper proposes a simple but effective edge-friendly incremental learning framework.<n>Our method achieves average accuracy boost of 38.08% with even less memory and approximate computation.
arXiv Detail & Related papers (2024-06-13T05:49:29Z) - Filling the Missing: Exploring Generative AI for Enhanced Federated
Learning over Heterogeneous Mobile Edge Devices [72.61177465035031]
We propose a generative AI-empowered federated learning to address these challenges by leveraging the idea of FIlling the MIssing (FIMI) portion of local data.
Experiment results demonstrate that FIMI can save up to 50% of the device-side energy to achieve the target global test accuracy.
arXiv Detail & Related papers (2023-10-21T12:07:04Z) - Adaptive Model Pruning and Personalization for Federated Learning over
Wireless Networks [72.59891661768177]
Federated learning (FL) enables distributed learning across edge devices while protecting data privacy.
We consider a FL framework with partial model pruning and personalization to overcome these challenges.
This framework splits the learning model into a global part with model pruning shared with all devices to learn data representations and a personalized part to be fine-tuned for a specific device.
arXiv Detail & Related papers (2023-09-04T21:10:45Z) - Analysis and Optimization of Wireless Federated Learning with Data
Heterogeneity [72.85248553787538]
This paper focuses on performance analysis and optimization for wireless FL, considering data heterogeneity, combined with wireless resource allocation.
We formulate the loss function minimization problem, under constraints on long-term energy consumption and latency, and jointly optimize client scheduling, resource allocation, and the number of local training epochs (CRE)
Experiments on real-world datasets demonstrate that the proposed algorithm outperforms other benchmarks in terms of the learning accuracy and energy consumption.
arXiv Detail & Related papers (2023-08-04T04:18:01Z) - Dynamic Scheduling for Federated Edge Learning with Streaming Data [56.91063444859008]
We consider a Federated Edge Learning (FEEL) system where training data are randomly generated over time at a set of distributed edge devices with long-term energy constraints.
Due to limited communication resources and latency requirements, only a subset of devices is scheduled for participating in the local training process in every iteration.
arXiv Detail & Related papers (2023-05-02T07:41:16Z) - A Deep-Learning Intelligent System Incorporating Data Augmentation for
Short-Term Voltage Stability Assessment of Power Systems [9.299576471941753]
This paper proposes a novel deep-learning intelligent system incorporating data augmentation for STVSA of power systems.
Due to the unavailability of reliable quantitative criteria to judge the stability status for a specific power system, semi-supervised cluster learning is leveraged to obtain labeled samples.
conditional least squares generative adversarial networks (LSGAN)-based data augmentation is introduced to expand the original dataset.
arXiv Detail & Related papers (2021-12-05T11:40:54Z) - Continual Learning at the Edge: Real-Time Training on Smartphone Devices [11.250227901473952]
This paper describes the implementation and deployment of a hybrid learning strategy (AR1*) on a native Android application for real-time on-device personalization without forgetting.
Our benchmark, based on an extension of the CORe50 dataset, shows the efficiency and effectiveness of our solution.
arXiv Detail & Related papers (2021-05-24T12:00:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.