Legilimens: Performant Video Analytics on the System-on-Chip Edge
- URL: http://arxiv.org/abs/2504.21136v1
- Date: Tue, 29 Apr 2025 19:45:33 GMT
- Title: Legilimens: Performant Video Analytics on the System-on-Chip Edge
- Authors: Murali Ramanujam, Yinwei Dai, Kyle Jamieson, Ravi Netravali
- Abstract summary: Legilimens is a continuous learning system for the mobile edge's System-on-Chip GPUs. We present new, compute-efficient techniques to select high-utility data samples for retraining specialized models. Across diverse workloads, Legilimens lowers retraining costs by 2.8-10x compared to existing systems, resulting in 18-45% higher accuracies.
- Score: 11.779236412531166
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continually retraining models has emerged as a primary technique to enable high-accuracy video analytics on edge devices. Yet, existing systems employ such adaptation by relying on the spare compute resources that traditional (memory-constrained) edge servers afford. In contrast, mobile edge devices such as drones and dashcams offer a fundamentally different resource profile: weak(er) compute with abundant unified memory pools. We present Legilimens, a continuous learning system for the mobile edge's System-on-Chip GPUs. Our driving insight is that visually distinct scenes that require retraining exhibit substantial overlap in model embeddings; if captured into a base model on device memory, specializing to each new scene can become lightweight, requiring very few samples. To practically realize this approach, Legilimens presents new, compute-efficient techniques to (1) select high-utility data samples for retraining specialized models, (2) update the base model without complete retraining, and (3) time-share compute resources between retraining and live inference for maximal accuracy. Across diverse workloads, Legilimens lowers retraining costs by 2.8-10x compared to existing systems, resulting in 18-45% higher accuracies.
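To make technique (1) concrete, the sketch below shows one plausible embedding-based reading of "high-utility" sample selection: frames whose base-model embeddings fall far from every scene already captured by the base model are prioritized for the small retraining set. This is an illustrative sketch under assumed inputs (a centroid summary of previously seen scenes and a fixed sample budget), not Legilimens' published algorithm.

```python
# Illustrative sketch (not Legilimens' actual selection rule): rank candidate
# frames by how poorly the base model's existing scene summary covers them,
# and keep only the most novel ones for lightweight specialization.
import torch

def select_high_utility_samples(frame_embeddings: torch.Tensor,
                                scene_centroids: torch.Tensor,
                                budget: int) -> torch.Tensor:
    """Return indices of the `budget` most novel candidate frames.

    frame_embeddings: (N, D) embeddings of candidate frames from the base model.
    scene_centroids:  (K, D) assumed summary of scenes the base model has captured.
    """
    dists = torch.cdist(frame_embeddings, scene_centroids)   # (N, K) pairwise distances
    novelty = dists.min(dim=1).values                         # distance to closest known scene
    k = min(budget, novelty.numel())
    return torch.topk(novelty, k=k).indices                   # most novel frames win
```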
Related papers
- AutoHete: An Automatic and Efficient Heterogeneous Training System for LLMs [68.99086112477565]
Transformer-based large language models (LLMs) have demonstrated exceptional capabilities in sequence modeling and text generation. Existing heterogeneous training methods significantly expand the scale of trainable models but introduce substantial communication overheads and CPU workloads. We propose AutoHete, an automatic and efficient heterogeneous training system compatible with both single-GPU and multi-GPU environments.
arXiv Detail & Related papers (2025-02-27T14:46:22Z)
- Stepping Forward on the Last Mile [8.756033984943178]
We propose a series of algorithm enhancements that further reduce the memory footprint and the accuracy gap compared to backpropagation.
Our results demonstrate that on the last mile of model customization on edge devices, training with fixed-point forward gradients is a feasible and practical approach.
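For intuition about training with forward gradients (sketched here in floating point; the work above uses fixed-point arithmetic), a single forward-mode Jacobian-vector product along a random direction yields an unbiased gradient estimate without any backpropagation. The quadratic `loss_fn` and the plain SGD-style update are illustrative assumptions.

```python
# Hedged sketch of forward-gradient descent: probe the loss along a random
# direction with forward-mode AD and use (directional derivative) * direction
# as the gradient estimate, so no backward pass is ever run.
import torch
from torch.func import jvp

def forward_gradient_step(params: torch.Tensor, loss_fn, lr: float = 0.05) -> torch.Tensor:
    v = torch.randn_like(params)                  # random probe direction
    _, dir_deriv = jvp(loss_fn, (params,), (v,))  # directional derivative of the loss
    return params - lr * dir_deriv * v            # unbiased gradient estimate, SGD-style update

# Toy usage: minimize a simple quadratic with forward gradients only.
theta = torch.zeros(8)
for _ in range(200):
    theta = forward_gradient_step(theta, lambda p: ((p - 1.0) ** 2).sum())
```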
arXiv Detail & Related papers (2024-11-06T16:33:21Z)
- Edge Unlearning is Not "on Edge"! An Adaptive Exact Unlearning System on Resource-Constrained Devices [26.939025828011196]
The right to be forgotten mandates that machine learning models enable the erasure of a data owner's data and information from a trained model.
We propose a Constraint-aware Adaptive Exact Unlearning System at the network Edge (CAUSE) to enable exact unlearning on resource-constrained devices.
arXiv Detail & Related papers (2024-10-14T03:28:09Z)
- Taming 3DGS: High-Quality Radiance Fields with Limited Resources [50.92437599516609]
3D Gaussian Splatting (3DGS) has transformed novel-view synthesis with its fast, interpretable, and high-fidelity rendering.
We tackle the challenges of training and rendering 3DGS models on a budget.
We derive faster, numerically equivalent solutions for gradient computation and attribute updates.
arXiv Detail & Related papers (2024-06-21T20:44:23Z)
- EdgeSync: Faster Edge-model Updating via Adaptive Continuous Learning for Video Data Drift [7.165359653719119]
Real-time video analytics systems typically place models with fewer weights on edge devices to reduce latency.
The distribution of video content features may change over time, leading to accuracy degradation of existing models.
Recent work proposes a framework that uses a remote server to continually train and adapt the lightweight model at the edge with the help of a complex model.
arXiv Detail & Related papers (2024-06-05T07:06:26Z)
- Time-, Memory- and Parameter-Efficient Visual Adaptation [75.28557015773217]
We propose an adaptation method which does not backpropagate gradients through the backbone.
We achieve this by designing a lightweight network in parallel that operates on features from the frozen, pretrained backbone.
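A minimal sketch of that recipe, with a toy stand-in for the pretrained backbone: features are computed under no_grad, so no backbone activations or gradients are ever stored, and the optimizer only sees the small parallel head. The layer sizes and the use of output-level (rather than intermediate) features are simplifying assumptions.

```python
# Sketch of backprop-free-backbone adaptation: only a lightweight head that
# consumes frozen-backbone features is trained; gradients never reach the backbone.
import torch
import torch.nn as nn

backbone = nn.Sequential(                      # toy stand-in for a pretrained backbone
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten())
for p in backbone.parameters():
    p.requires_grad_(False)                    # backbone stays frozen

head = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

def adaptation_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    with torch.no_grad():                      # no graph kept for the backbone
        feats = backbone(images)
    loss = nn.functional.cross_entropy(head(feats), labels)
    optimizer.zero_grad()
    loss.backward()                            # gradients touch only the head
    optimizer.step()
    return loss.item()
```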
arXiv Detail & Related papers (2024-02-05T10:55:47Z)
- Fast Machine Unlearning Without Retraining Through Selective Synaptic Dampening [51.34904967046097]
We present Selective Synaptic Dampening (SSD), a novel two-step, post hoc, retrain-free approach to machine unlearning that is fast, performant, and does not require long-term storage of the training data.
arXiv Detail & Related papers (2023-08-15T11:30:45Z)
- Aggregating Capacity in FL through Successive Layer Training for Computationally-Constrained Devices [3.4530027457862]
Federated learning (FL) is usually performed on resource-constrained edge devices.
The FL training process should therefore be adjusted to such constraints.
We propose a new method that enables successive freezing and training of the parameters of the FL model at devices.
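One hedged reading of successive freezing and training, with illustrative block boundaries and stage scheduling: at any stage only one group of layers carries gradients and optimizer state, so a constrained device never holds training state for the full model.

```python
# Illustrative sketch of successive layer training on a constrained FL client:
# each stage trains one block of layers while all others stay frozen.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, 10))
stages = [[0, 1], [2, 3], [4]]                  # assumed grouping of layer indices

def configure_stage(stage: int) -> torch.optim.Optimizer:
    """Freeze everything except the layers assigned to this stage."""
    for p in model.parameters():
        p.requires_grad_(False)
    trainable = []
    for idx in stages[stage]:
        for p in model[idx].parameters():
            p.requires_grad_(True)
            trainable.append(p)
    return torch.optim.SGD(trainable, lr=0.01)  # optimizer state covers one block only

for stage in range(len(stages)):
    optimizer = configure_stage(stage)
    # ... run local FL rounds on the device's data for this stage, then advance ...
```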
arXiv Detail & Related papers (2023-05-26T15:04:06Z)
- On-Device Training Under 256KB Memory [62.95579393237751]
We propose an algorithm-system co-design framework to make on-device training possible with only 256KB of memory.
Our framework is the first solution to enable tiny on-device training of convolutional neural networks under 256KB and 1MB Flash.
arXiv Detail & Related papers (2022-06-30T17:59:08Z)
- Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight reparameterisation for neural networks that leads to inherently sparse models.
Models trained in this manner exhibit similar performance, but have a distribution with markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
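For intuition, a hedged sketch of a Powerpropagation-style linear layer: the effective weight is theta * |theta|**(alpha - 1), so the gradient reaching theta is scaled by the weight's own magnitude and small weights are drawn toward exact zero. The choice alpha = 2 and the initialisation scale are illustrative.

```python
# Sketch of a Powerpropagation-style reparameterised layer: autograd applies
# the |theta|^(alpha-1) magnitude scaling, biasing training toward sparse weights.
import torch
import torch.nn as nn

class PowerpropLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, alpha: float = 2.0):
        super().__init__()
        self.alpha = alpha
        self.theta = nn.Parameter(0.05 * torch.randn(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.theta * self.theta.abs().pow(self.alpha - 1)  # effective weight
        return nn.functional.linear(x, w, self.bias)

# After training, near-zero thetas map to even smaller effective weights,
# so magnitude pruning removes them with little accuracy loss.
```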
arXiv Detail & Related papers (2021-10-01T10:03:57Z)
- Ekya: Continuous Learning of Video Analytics Models on Edge Compute Servers [20.533131101027067]
Continuous learning handles data drift by periodically retraining the models on new data.
Our work addresses the challenge of jointly supporting inference and retraining tasks on edge servers.
Ekya balances this tradeoff across multiple models and uses a micro-profiler to identify the models that will benefit the most by retraining.
arXiv Detail & Related papers (2020-12-19T00:29:22Z)