Cost-effective On-device Continual Learning over Memory Hierarchy with Miro
- URL: http://arxiv.org/abs/2308.06053v4
- Date: Tue, 5 Dec 2023 08:51:52 GMT
- Title: Cost-effective On-device Continual Learning over Memory Hierarchy with Miro
- Authors: Xinyue Ma, Suyeon Jeong, Minjia Zhang, Di Wang, Jonghyun Choi, Myeongjae Jeon
- Abstract summary: Miro is a novel system runtime that dynamically configures the CL system based on resource states for the best cost-effectiveness.
Miro significantly outperforms baseline systems we build for comparison, consistently achieving higher cost-effectiveness.
- Score: 32.93163587457259
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual learning (CL) trains NN models incrementally from a continuous
stream of tasks. To remember previously learned knowledge, prior studies store
old samples over a memory hierarchy and replay them when new tasks arrive. Edge
devices that adopt CL to preserve data privacy are typically energy-sensitive
and thus require high model accuracy while not compromising energy efficiency,
i.e., cost-effectiveness. Our work is the first to explore the design space of
hierarchical memory replay-based CL to gain insights into achieving
cost-effectiveness on edge devices. We present Miro, a novel system runtime
that carefully integrates our insights into the CL framework by enabling it to
dynamically configure the CL system based on resource states for the best
cost-effectiveness. To reach this goal, Miro also performs online profiling on
parameters with clear accuracy-energy trade-offs and adapts to optimal values
with low overhead. Extensive evaluations show that Miro significantly
outperforms baseline systems we build for comparison, consistently achieving
higher cost-effectiveness.
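As an illustration of the online profiling the abstract describes, the sketch below probes a few candidate configurations and keeps the most cost-effective one (accuracy per joule). All names (`pick_config`, `em_size`, `swap_rate`) and the toy profiler are assumptions for illustration, not Miro's actual interface.

```python
def pick_config(candidates, profile):
    """Hedged sketch of Miro-style adaptation: briefly probe each candidate
    configuration, then keep the one with the best accuracy per joule."""
    best, best_score = None, float("-inf")
    for cfg in candidates:
        acc, energy_j = profile(cfg)   # short on-device measurement run
        score = acc / energy_j         # cost-effectiveness metric (assumed)
        if score > best_score:
            best, best_score = cfg, score
    return best

# Toy profiler standing in for real accuracy/energy measurements.
def profile(cfg):
    acc = min(1.0, 0.5 + 0.0001 * cfg["em_size"])                      # more replay helps
    energy_j = 1.0 + 0.001 * cfg["em_size"] + 5.0 * cfg["swap_rate"]   # but costs energy
    return acc, energy_j

# Hypothetical knobs with clear accuracy-energy trade-offs.
candidates = [{"em_size": 500, "swap_rate": 0.0},
              {"em_size": 2000, "swap_rate": 0.2}]
print(pick_config(candidates, profile))
```

A real runtime would rerun this probe as resource states change, so the chosen configuration tracks the current energy budget.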
Related papers
- Efficient Continual Learning with Low Memory Footprint For Edge Device [6.818488262543482]
This paper proposes a compact algorithm called LightCL to overcome the forgetting problem of Continual Learning.
We first propose two new metrics, learning plasticity and memory stability, to capture generalizability during CL.
In the experimental comparison, LightCL outperforms other SOTA methods in delaying forgetting and reduces memory footprint by up to $6.16\times$.
arXiv Detail & Related papers (2024-07-15T08:52:20Z)
- FedMef: Towards Memory-efficient Federated Dynamic Pruning [42.07105095641134]
Federated learning (FL) promotes decentralized training while prioritizing data confidentiality.
Its application to resource-constrained devices is challenging due to the high demand for computation and memory resources to train deep learning models.
We propose FedMef, a novel and memory-efficient federated dynamic pruning framework.
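The abstract does not detail FedMef's mechanism, so the following is only a generic magnitude-based dynamic-pruning sketch of the kind such frameworks build on; `dynamic_prune` and `regrow` are hypothetical names, not FedMef's API.

```python
import numpy as np

def dynamic_prune(weights, sparsity):
    """Illustrative magnitude pruning: drop the smallest-magnitude weights.
    FedMef's actual budget-aware scheme is more involved."""
    k = int(sparsity * weights.size)
    if k == 0:
        return np.ones(weights.shape, dtype=bool)
    thresh = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.abs(weights) > thresh  # True = keep this weight

def regrow(mask, grad, n):
    """Make the sparsity pattern 'dynamic': reactivate the n pruned
    positions whose gradient magnitudes are largest."""
    scores = np.where(~mask, np.abs(grad), -np.inf).ravel()
    new_mask = mask.ravel().copy()
    new_mask[np.argsort(scores)[-n:]] = True
    return new_mask.reshape(mask.shape)
```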
arXiv Detail & Related papers (2024-03-21T13:54:36Z)
- Design Space Exploration of Low-Bit Quantized Neural Networks for Visual Place Recognition [26.213493552442102]
Visual Place Recognition (VPR) is a critical task for performing global re-localization in visual perception systems.
Recent works have focused on the recall@1 metric as a performance measure, with limited attention to resource utilization.
This has resulted in methods that use deep learning models too large to deploy on low-powered edge devices.
We study the impact of compact convolutional network architecture design in combination with full-precision and mixed-precision post-training quantization on VPR performance.
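As a concrete instance of post-training quantization, here is a minimal symmetric per-tensor int8 scheme; the paper additionally studies mixed precision, where the bit width would vary per layer. This is a generic sketch, not the paper's exact pipeline.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor post-training quantization to int8."""
    max_abs = float(np.abs(w).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, s = quantize_int8(w)
err = float(np.abs(w - dequantize(q, s)).mean())
print(f"int8: {q.nbytes} B vs fp32: {w.nbytes} B, mean abs error {err:.5f}")
```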
arXiv Detail & Related papers (2023-12-14T15:24:42Z)
- Analysis of the Memorization and Generalization Capabilities of AI Agents: Are Continual Learners Robust? [91.682459306359]
In continual learning (CL), an AI agent learns from non-stationary data streams under dynamic environments.
In this paper, a novel CL framework is proposed to achieve robust generalization to dynamic environments while retaining past knowledge.
The generalization and memorization performance of the proposed framework are theoretically analyzed.
arXiv Detail & Related papers (2023-09-18T21:00:01Z)
- Retrieval-Enhanced Contrastive Vision-Text Models [61.783728119255365]
We propose to equip vision-text models with the ability to refine their embedding with cross-modal retrieved information from a memory at inference time.
Remarkably, we show that this can be done with a light-weight, single-layer, fusion transformer on top of a frozen CLIP.
Our experiments validate that our retrieval-enhanced contrastive (RECO) training improves CLIP performance substantially on several challenging fine-grained tasks.
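A minimal sketch of the retrieval-then-fuse idea, assuming a frozen embedding model and an in-memory bank of reference embeddings; the layer sizes and the residual fusion are illustrative, not RECO's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RetrievalFusion(nn.Module):
    """Sketch in the spirit of RECO: retrieve nearest memory embeddings for a
    frozen query embedding and fuse them with one cross-attention layer."""

    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, query, memory, k=8):
        # query: (B, D) frozen CLIP-style embeddings; memory: (N, D) bank.
        sims = F.normalize(query, dim=-1) @ F.normalize(memory, dim=-1).T
        idx = sims.topk(k, dim=-1).indices   # (B, k) nearest neighbors
        neighbors = memory[idx]              # (B, k, D)
        fused, _ = self.attn(query.unsqueeze(1), neighbors, neighbors)
        return query + fused.squeeze(1)      # residual refinement

q = torch.randn(4, 512)
bank = torch.randn(1000, 512)
print(RetrievalFusion()(q, bank).shape)  # torch.Size([4, 512])
```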
arXiv Detail & Related papers (2023-06-12T15:52:02Z)
- Online Continual Learning Without the Storage Constraint [67.66235695269839]
We contribute a simple algorithm that continually updates a kNN classifier on top of a fixed, pretrained feature extractor.
It can adapt to rapidly changing streams, has zero stability gap, operates within tiny computational budgets, and keeps storage requirements low by storing only features.
It can outperform existing methods by over 20% in accuracy on two large-scale online continual learning datasets.
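The recipe above is simple enough to sketch end to end: store only features from a frozen extractor and classify by majority vote among the nearest stored features. `extract` is a stand-in for any pretrained embedding model.

```python
import numpy as np

class ContinualKNN:
    """Minimal continual kNN over features from a frozen extractor.
    Stores compact features, never raw samples, so storage stays small."""

    def __init__(self, extract, k=5):
        self.extract, self.k = extract, k
        self.feats, self.labels = [], []

    def update(self, x, y):
        # No gradient steps: "training" is just appending a feature.
        self.feats.append(self.extract(x))
        self.labels.append(y)

    def predict(self, x):
        q = self.extract(x)
        dists = np.linalg.norm(np.stack(self.feats) - q, axis=1)
        votes = [self.labels[i] for i in np.argsort(dists)[: self.k]]
        return max(set(votes), key=votes.count)

# Identity features as a stand-in for a pretrained backbone.
clf = ContinualKNN(extract=lambda x: np.asarray(x, dtype=np.float32), k=3)
for x, y in [([0.0, 0.0], "a"), ([0.1, 0.0], "a"), ([1.0, 1.0], "b")]:
    clf.update(x, y)
print(clf.predict([0.05, 0.05]))  # -> "a"
```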
arXiv Detail & Related papers (2023-05-16T08:03:07Z)
- SparCL: Sparse Continual Learning on the Edge [43.51885725281063]
We propose a novel framework called Sparse Continual Learning (SparCL) to enable cost-effective continual learning on edge devices.
SparCL achieves both training acceleration and accuracy preservation through the synergy of three aspects: weight sparsity, data efficiency, and gradient sparsity.
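To make the "weight sparsity plus gradient sparsity" synergy concrete, here is a hedged single-step sketch; the masks and thresholds are illustrative, not SparCL's actual schedules.

```python
import numpy as np

def sparse_step(w, grad, weight_mask, grad_keep=0.1, lr=0.01):
    """One illustrative update: apply only the largest `grad_keep` fraction
    of gradients (gradient sparsity), then re-impose the weight mask so
    pruned weights stay zero (weight sparsity)."""
    flat = np.abs(grad).ravel()
    k = max(1, int(grad_keep * flat.size))
    thresh = np.partition(flat, flat.size - k)[flat.size - k]
    grad_mask = np.abs(grad) >= thresh
    return (w - lr * grad * grad_mask) * weight_mask
```

Both masks shrink the work per step, which is where the training acceleration on edge hardware would come from.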
arXiv Detail & Related papers (2022-09-20T05:24:48Z)
- Learning towards Synchronous Network Memorizability and Generalizability for Continual Segmentation across Multiple Sites [52.84959869494459]
In clinical practice, a segmentation network is often required to continually learn on a sequential data stream from multiple sites.
Existing methods are usually restricted in either network memorizability on previous sites or generalizability on unseen sites.
This paper aims to tackle the problem of Synchronous Memorizability and Generalizability with a novel SMG-learning framework.
arXiv Detail & Related papers (2022-06-14T13:04:36Z)
- The CLEAR Benchmark: Continual LEArning on Real-World Imagery [77.98377088698984]
Continual learning (CL) is widely regarded as a crucial challenge for lifelong AI.
We introduce CLEAR, the first continual image classification benchmark dataset with a natural temporal evolution of visual concepts.
We find that a simple unsupervised pre-training step can already boost state-of-the-art CL algorithms.
arXiv Detail & Related papers (2022-01-17T09:09:09Z)
- Carousel Memory: Rethinking the Design of Episodic Memory for Continual Learning [19.260402028696916]
Continual Learning (CL) aims to learn from a continuous stream of tasks without forgetting knowledge learned from the previous tasks.
Previous studies exploit episodic memory (EM), which stores a subset of the past observed samples while learning from new non-i.i.d. data.
We propose to exploit the abundant storage to preserve past experiences and alleviate forgetting by allowing CL to efficiently migrate samples between memory and storage.
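The migration idea reads roughly like the sketch below: episodic memory holds the working set, evictions land in abundant storage rather than being discarded, and stored samples rotate back in so old tasks keep resurfacing. The rotation policy and names here are assumptions, not Carousel Memory's exact design.

```python
from collections import deque

class TwoTierEM:
    """Schematic episodic memory with a storage tier (hypothetical names)."""

    def __init__(self, mem_size):
        self.mem_size = mem_size
        self.mem = deque()      # fast tier: what replay actually samples from
        self.storage = deque()  # slow tier: evicted samples are kept, not dropped

    def insert(self, sample):
        self.mem.append(sample)
        while len(self.mem) > self.mem_size:
            self.storage.append(self.mem.popleft())  # migrate oldest to storage

    def rotate(self, n=1):
        # Bring old samples back from storage so earlier tasks resurface.
        for _ in range(min(n, len(self.storage))):
            self.insert(self.storage.popleft())

    def replay_set(self):
        return list(self.mem)
```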
arXiv Detail & Related papers (2021-10-14T11:27:45Z)