ETuner: A Redundancy-Aware Framework for Efficient Continual Learning Application on Edge Devices
- URL: http://arxiv.org/abs/2401.16694v5
- Date: Thu, 22 Aug 2024 19:46:37 GMT
- Title: ETuner: A Redundancy-Aware Framework for Efficient Continual Learning Application on Edge Devices
- Authors: Sheng Li, Geng Yuan, Yawen Wu, Yue Dai, Tianyu Wang, Chao Wu, Alex K. Jones, Jingtong Hu, Yanzhi Wang, Xulong Tang
- Abstract summary: We propose ETuner, an efficient edge continual learning framework that optimizes inference accuracy, fine-tuning execution time, and energy efficiency.
Experimental results show that, on average, ETuner reduces overall fine-tuning execution time by 64%, energy consumption by 56%, and improves average inference accuracy by 1.75% over the immediate model fine-tuning approach.
- Score: 47.365775210055396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many emerging applications, such as robot-assisted eldercare and object recognition, generally employ deep neural networks (DNNs) and require the deployment of DNN models on edge devices. These applications naturally require i) handling streaming inference requests and ii) fine-tuning the deployed models to adapt to possible deployment scenario changes. Continual learning (CL) is widely adopted to satisfy these needs. CL is a popular deep learning paradigm that handles both continuous model fine-tuning and inference requests over time. However, an inappropriate model fine-tuning scheme could involve significant redundancy and consume considerable time and energy, making it challenging to apply CL on edge devices. In this paper, we propose ETuner, an efficient edge continual learning framework that optimizes inference accuracy, fine-tuning execution time, and energy efficiency through both inter-tuning and intra-tuning optimizations. Experimental results show that, on average, ETuner reduces overall fine-tuning execution time by 64%, energy consumption by 56%, and improves average inference accuracy by 1.75% over the immediate model fine-tuning approach.
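As a rough illustration of the workflow the abstract describes (serving streaming inference requests while periodically fine-tuning the deployed model, rather than tuning immediately on every new sample), here is a minimal PyTorch sketch. It is not ETuner's implementation; the buffer size, tuning period, and model/stream interfaces are assumptions.

```python
# Minimal sketch of an edge continual-learning loop: serve streaming inference
# requests while periodically fine-tuning the deployed model on buffered samples
# instead of after every new sample ("immediate tuning"). Illustrative only.
from collections import deque

import torch
import torch.nn.functional as F


def edge_cl_loop(model, request_stream, optimizer, tune_every=256, buffer_size=1024):
    replay = deque(maxlen=buffer_size)   # recently collected (sample, label) pairs
    predictions = []
    for i, (x, label) in enumerate(request_stream):
        model.eval()
        with torch.no_grad():            # serve the incoming inference request
            predictions.append(model(x.unsqueeze(0)).argmax(dim=1).item())
        if label is not None:            # labels may arrive only for some samples
            replay.append((x, label))
        # Batch the fine-tuning work to cut redundant updates and save time/energy.
        if len(replay) >= tune_every and (i + 1) % tune_every == 0:
            model.train()
            xs = torch.stack([s[0] for s in replay])
            ys = torch.tensor([s[1] for s in replay])
            optimizer.zero_grad()
            F.cross_entropy(model(xs), ys).backward()
            optimizer.step()
    return predictions
```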
Related papers
- EdgeRL: Reinforcement Learning-driven Deep Learning Model Inference Optimization at Edge [2.8946323553477704]
We propose the EdgeRL framework, which seeks to strike a balance among end-device energy consumption, inference accuracy, and latency using an Advantage Actor-Critic (A2C) Reinforcement Learning (RL) approach.
We evaluate the benefits of EdgeRL framework in terms of end device energy savings, inference accuracy improvement, and end-to-end inference latency reduction.
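The blurb does not spell out the reward, so purely as a hedged sketch of how one might scalarize the accuracy/energy/latency trade-off for an A2C agent, something like the following could serve; the weights, budgets, and linear form are assumptions, not EdgeRL's actual reward.

```python
# Hedged sketch of a scalar reward an A2C agent could maximize when balancing
# inference accuracy against device energy and end-to-end latency. The weights,
# budgets, and linear form are assumptions, not EdgeRL's actual formulation.
def edge_reward(accuracy, energy_j, latency_ms,
                w_acc=1.0, w_energy=0.5, w_latency=0.5,
                energy_budget_j=1.0, latency_budget_ms=100.0):
    return (w_acc * accuracy
            - w_energy * (energy_j / energy_budget_j)
            - w_latency * (latency_ms / latency_budget_ms))
```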
arXiv Detail & Related papers (2024-10-16T04:31:39Z)
- Neural Horizon Model Predictive Control -- Increasing Computational Efficiency with Neural Networks [0.0]
We propose a machine-learning-supported approach to model predictive control that approximates part of the problem horizon with a neural network while maintaining safety guarantees.
The proposed MPC scheme can be applied to a wide range of applications, including those requiring a rapid control response.
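A hedged reading of the split-horizon idea, written as a generic optimization problem: the first N_1 stages are optimized exactly while the tail of the horizon is replaced by a learned approximation. The notation (stage cost, dynamics f, learned term V_theta) is illustrative, not necessarily the paper's formulation.

```latex
% Hedged sketch of a split-horizon MPC problem: exact stages up to N_1, then a
% learned approximation V_theta of the remaining horizon. Notation is illustrative.
\[
\begin{aligned}
\min_{u_0,\dots,u_{N_1-1}} \quad & \sum_{k=0}^{N_1-1} \ell(x_k, u_k) \;+\; V_\theta\!\left(x_{N_1}\right) \\
\text{s.t.} \quad & x_{k+1} = f(x_k, u_k), \qquad x_k \in \mathcal{X}, \quad u_k \in \mathcal{U}.
\end{aligned}
\]
```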
arXiv Detail & Related papers (2024-08-19T08:13:37Z)
- Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch [72.26822499434446]
Auto-Train-Once (ATO) is an innovative network pruning algorithm designed to automatically reduce the computational and storage costs of DNNs.
We provide a comprehensive convergence analysis as well as extensive experiments, and the results show that our approach achieves state-of-the-art performance across various model architectures.
arXiv Detail & Related papers (2024-03-21T02:33:37Z)
- Efficient Post-Training Augmentation for Adaptive Inference in Heterogeneous and Distributed IoT Environments [4.343246899774834]
Early Exit Neural Networks (EENNs) present a solution to enhance the efficiency of neural network deployments.
We propose an automated augmentation flow that focuses on converting an existing model into an EENN.
Our framework constructs the EENN architecture, maps its subgraphs to the hardware targets, and configures its decision mechanism.
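For concreteness, a minimal sketch of the kind of early-exit decision mechanism an EENN relies on at inference time: run the backbone block by block and return from the first exit head whose confidence clears a threshold. The confidence rule and batch-size-1 assumption are illustrative, not the framework's configured policy.

```python
# Minimal sketch of early-exit inference: evaluate backbone blocks in order and
# stop at the first exit head whose softmax confidence clears a threshold.
# Assumes batch size 1; the threshold rule is illustrative only.
import torch
import torch.nn.functional as F


def early_exit_infer(stages, exit_heads, x, threshold=0.9):
    """stages: list of backbone nn.Module blocks; exit_heads: one classifier per stage."""
    h = x
    pred, conf = None, None
    with torch.no_grad():
        for stage, head in zip(stages, exit_heads):
            h = stage(h)
            probs = F.softmax(head(h), dim=1)
            conf, pred = probs.max(dim=1)
            if conf.item() >= threshold:   # confident enough: stop computing
                break
    return pred, conf
```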
arXiv Detail & Related papers (2024-03-12T08:27:53Z)
- Fractional Deep Reinforcement Learning for Age-Minimal Mobile Edge Computing [11.403989519949173]
This work focuses on the timeliness of computation-intensive updates, measured by Age-of-Information (AoI).
We study how to jointly optimize the task updating and offloading policies to minimize AoI, whose objective takes a fractional form.
Experimental results show that our proposed algorithms reduce the average AoI by up to 57.6% compared with several non-fractional benchmarks.
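As a hedged aside on what "fractional form" typically means here: the long-run average AoI can be written as a ratio of expectations (e.g. in renewal-reward form), which is what makes standard non-fractional objectives a poor fit. The notation below is illustrative, not the paper's exact objective.

```latex
% Hedged sketch: average AoI as a ratio of expectations (renewal-reward form),
% which is what makes the objective "fractional". Notation is illustrative.
\[
\bar{A}(\pi) \;=\; \frac{\mathbb{E}_\pi\!\left[\int_{0}^{L} A(t)\,\mathrm{d}t\right]}{\mathbb{E}_\pi\![L]},
\qquad L = \text{length of one update/offloading cycle under policy } \pi .
\]
```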
arXiv Detail & Related papers (2023-12-16T11:13:40Z)
- Federated Learning of Large Language Models with Parameter-Efficient Prompt Tuning and Adaptive Optimization [71.87335804334616]
Federated learning (FL) is a promising paradigm to enable collaborative model training with decentralized data.
The training process of Large Language Models (LLMs) generally requires updating a significant number of parameters.
This paper proposes an efficient partial prompt tuning approach to improve performance and efficiency simultaneously.
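A minimal sketch of the general federated prompt-tuning pattern (frozen base model, trainable soft prompt, FedAvg over prompt parameters only); the `soft_prompt`/`labels` model interface and the hyperparameters are assumptions, and this is not the paper's partial prompt tuning algorithm.

```python
# Hedged sketch: each client trains only a small soft-prompt tensor against a
# frozen LLM, and the server averages the prompt parameters (FedAvg on prompts
# only). The model interface is an assumption, not the paper's method.
import torch


def client_update(global_prompt, frozen_model, batches, lr=1e-3, epochs=1):
    prompt = global_prompt.clone().requires_grad_(True)   # only trainable tensor
    opt = torch.optim.Adam([prompt], lr=lr)
    for _ in range(epochs):
        for x, y in batches:
            loss = frozen_model(x, soft_prompt=prompt, labels=y)  # assumed to return a loss
            opt.zero_grad()
            loss.backward()
            opt.step()
    return prompt.detach()


def server_aggregate(client_prompts):
    return torch.stack(client_prompts).mean(dim=0)   # average prompts, not full models
```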
arXiv Detail & Related papers (2023-10-23T16:37:59Z)
- NASOA: Towards Faster Task-oriented Online Fine-tuning with a Zoo of Models [90.6485663020735]
Fine-tuning from pre-trained ImageNet models has been a simple, effective, and popular approach for various computer vision tasks.
We propose a joint Neural Architecture Search and Online Adaption framework named NASOA for faster task-oriented fine-tuning.
arXiv Detail & Related papers (2021-08-07T12:03:14Z)
- Active Learning for Deep Neural Networks on Edge Devices [0.0]
This paper formalizes a practical active learning problem for neural networks on edge devices.
We propose a general task-agnostic framework to tackle this problem, which reduces it to a stream submodular maximization problem.
We evaluate our approach on both classification and object detection tasks in a practical setting to simulate a real-life scenario.
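A hedged sketch of one standard way to select samples from a stream under a submodular utility: keep an item only if its marginal gain clears a threshold, within a labeling budget. The utility function and threshold rule are placeholders, not the paper's algorithm.

```python
# Hedged sketch of single-pass streaming selection under a (monotone submodular)
# utility: add a sample only if its marginal gain exceeds a threshold, subject to
# a fixed labeling budget. Illustrative only.
def stream_select(stream, utility, budget, threshold):
    """utility(S) -> float should be monotone submodular; stream yields samples."""
    selected = []
    current = utility(selected)
    for x in stream:
        if len(selected) >= budget:
            break
        gain = utility(selected + [x]) - current   # marginal gain of adding x
        if gain >= threshold:
            selected.append(x)
            current += gain
    return selected
```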
arXiv Detail & Related papers (2021-06-21T03:55:33Z)
- Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
arXiv Detail & Related papers (2021-06-07T11:37:03Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
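A hedged sketch of what an extrapolation (extragradient-style) update can look like in PyTorch: take a lookahead step, evaluate the gradient at the extrapolated point, then apply that gradient to the original iterate. The step sizes and single-extrapolation form are assumptions, not the paper's exact scheme.

```python
# Hedged sketch of an extragradient-style update: lookahead step, gradient at the
# extrapolated point, update applied from the original iterate. Illustrative only.
import torch


def extragradient_step(params, loss_fn, lr=0.1, extrap_lr=0.1):
    """params: list of leaf tensors with requires_grad=True; loss_fn: closure over params."""
    # Gradient at the current point.
    grads = torch.autograd.grad(loss_fn(), params)
    backups = [p.detach().clone() for p in params]
    # Move to the extrapolated (lookahead) point.
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= extrap_lr * g
    # Gradient at the extrapolated point, applied from the original iterate.
    grads = torch.autograd.grad(loss_fn(), params)
    with torch.no_grad():
        for p, b, g in zip(params, backups, grads):
            p.copy_(b - lr * g)
```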
arXiv Detail & Related papers (2020-06-10T08:22:41Z)