Enhancing User Experience in On-Device Machine Learning with Gated Compression Layers
- URL: http://arxiv.org/abs/2405.01739v1
- Date: Thu, 2 May 2024 21:18:06 GMT
- Title: Enhancing User Experience in On-Device Machine Learning with Gated Compression Layers
- Authors: Haiguang Li, Usama Pervaiz, Joseph Antognini, MichaĆ Matuszak, Lawrence Au, Gilles Roux, Trausti Thormundsso,
- Abstract summary: On-device machine learning (ODML) enables powerful edge applications, but power consumption remains a key challenge for resource-constrained devices.
This work focuses on the use of Gated Compression (GC) layer to enhance ODML model performance while conserving power.
GC layers dynamically regulate data flow by selectively gating activations of neurons within the neural network and effectively filtering out non-essential inputs.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: On-device machine learning (ODML) enables powerful edge applications, but power consumption remains a key challenge for resource-constrained devices. To address this, developers often face a trade-off between model accuracy and power consumption, employing either computationally intensive models on high-power cores or pared-down models on low-power cores. Both approaches typically lead to a compromise in user experience (UX). This work focuses on the use of Gated Compression (GC) layer to enhance ODML model performance while conserving power and maximizing cost-efficiency, especially for always-on use cases. GC layers dynamically regulate data flow by selectively gating activations of neurons within the neural network and effectively filtering out non-essential inputs, which reduces power needs without compromising accuracy, and enables more efficient execution on heterogeneous compute cores. These improvements enhance UX through prolonged battery life, improved device responsiveness, and greater user comfort. In this work, we have integrated GC layers into vision and speech domain models including the transformer-based ViT model. Our experiments demonstrate theoretical power efficiency gains ranging from 158x to 30,000x for always-on scenarios. This substantial improvement empowers ODML applications with enhanced UX benefits.
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs)
We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z) - Dynamic Switch Layers For Unsupervised Learning [0.0]
On-device machine learning (ODML) enables intelligent applications on resource-constrained devices.
Power consumption poses a major challenge, forcing a trade-off between model accuracy and power efficiency.
We introduce the Dynamic Switch Layer ( DSL) to extend the benefits of GC layers to unsupervised learning scenarios.
arXiv Detail & Related papers (2024-04-05T21:03:11Z) - Towards Physical Plausibility in Neuroevolution Systems [0.276240219662896]
The increasing usage of Artificial Intelligence (AI) models, especially Deep Neural Networks (DNNs), is increasing the power consumption during training and inference.
This work addresses the growing energy consumption problem in Machine Learning (ML)
Even a slight reduction in power usage can lead to significant energy savings, benefiting users, companies, and the environment.
arXiv Detail & Related papers (2024-01-31T10:54:34Z) - Dynamic Early Exiting Predictive Coding Neural Networks [3.542013483233133]
With the urge for smaller and more accurate devices, Deep Learning models became too heavy to deploy.
We propose a shallow bidirectional network based on predictive coding theory and dynamic early exiting for halting further computations.
We achieve comparable accuracy to VGG-16 in image classification on CIFAR-10 with fewer parameters and less computational complexity.
arXiv Detail & Related papers (2023-09-05T08:00:01Z) - Energy-efficient Task Adaptation for NLP Edge Inference Leveraging
Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z) - Gated Compression Layers for Efficient Always-On Models [1.5612040984769857]
We propose a novel Gated Compression layer that can be applied to transform existing neural network architectures into Gated Neural Networks.
We provide results across five public image and audio datasets that demonstrate the proposed Gated Compression layer effectively stops up to 96% of negative samples, compresses 97% of positive samples, while maintaining or improving model accuracy.
arXiv Detail & Related papers (2023-03-15T22:46:22Z) - Collaborative Intelligent Reflecting Surface Networks with Multi-Agent
Reinforcement Learning [63.83425382922157]
Intelligent reflecting surface (IRS) is envisioned to be widely applied in future wireless networks.
In this paper, we investigate a multi-user communication system assisted by cooperative IRS devices with the capability of energy harvesting.
arXiv Detail & Related papers (2022-03-26T20:37:14Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the emphexit point, emphexit point, and emphcompressing bits by soft policy iterations.
Based on the latency and accuracy aware reward design, such an computation can well adapt to the complex environment like dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - AVAC: A Machine Learning based Adaptive RRAM Variability-Aware
Controller for Edge Devices [3.7346292069282643]
We propose an Adaptive RRAM Variability-Aware Controller, AVAC, which periodically updates Wait Buffer and batch sizes.
AVAC allows Edge devices to adapt to different applications and their stages, to improve performance and reduce energy consumption.
arXiv Detail & Related papers (2020-05-06T19:06:51Z) - Highly Efficient Salient Object Detection with 100K Parameters [137.74898755102387]
We propose a flexible convolutional module, namely generalized OctConv (gOctConv), to efficiently utilize both in-stage and cross-stages multi-scale features.
We build an extremely light-weighted model, namely CSNet, which achieves comparable performance with about 0.2% (100k) of large models on popular object detection benchmarks.
arXiv Detail & Related papers (2020-03-12T07:00:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.