Related papers: Enhancing User Experience in On-Device Machine Learning with Gated Compression Layers

URL: http://arxiv.org/abs/2405.01739v1
Date: Thu, 2 May 2024 21:18:06 GMT
Title: Enhancing User Experience in On-Device Machine Learning with Gated Compression Layers
Authors: Haiguang Li, Usama Pervaiz, Joseph Antognini, Michał Matuszak, Lawrence Au, Gilles Roux, Trausti Thormundsso,
Abstract summary: On-device machine learning (ODML) enables powerful edge applications, but power consumption remains a key challenge for resource-constrained devices. This work focuses on the use of Gated Compression (GC) layers to enhance ODML model performance while conserving power. GC layers dynamically regulate data flow by selectively gating activations of neurons within the neural network and effectively filtering out non-essential inputs.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: On-device machine learning (ODML) enables powerful edge applications, but power consumption remains a key challenge for resource-constrained devices. To address this, developers often face a trade-off between model accuracy and power consumption, employing either computationally intensive models on high-power cores or pared-down models on low-power cores. Both approaches typically lead to a compromise in user experience (UX). This work focuses on the use of Gated Compression (GC) layers to enhance ODML model performance while conserving power and maximizing cost-efficiency, especially for always-on use cases. GC layers dynamically regulate data flow by selectively gating activations of neurons within the neural network and effectively filtering out non-essential inputs, which reduces power needs without compromising accuracy and enables more efficient execution on heterogeneous compute cores. These improvements enhance UX through prolonged battery life, improved device responsiveness, and greater user comfort. In this work, we have integrated GC layers into vision and speech domain models, including the transformer-based ViT model. Our experiments demonstrate theoretical power efficiency gains ranging from 158x to 30,000x for always-on scenarios. This substantial improvement empowers ODML applications with enhanced UX benefits.
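
Illustrative sketch (not from the paper): the gating and filtering behavior described in the abstract might look roughly like the NumPy code below. The function name gated_compression, the sigmoid gate, the 0.5 threshold, and the boolean channel mask are all assumptions made for illustration; the authors' actual layer design and training procedure may differ.

```python
import numpy as np

def gated_compression(x, gate_w, gate_b, keep_mask, threshold=0.5):
    """Hypothetical gate-then-compress step (an assumption, not the paper's code).

    x:         (batch, features) activations from an earlier layer
    gate_w:    (features,) weights of a linear gate
    gate_b:    scalar gate bias
    keep_mask: (features,) boolean mask of channels to forward
    Returns (passed, compressed): which samples passed the gate, and the
    reduced activations of only those samples.
    """
    # Gate: one scalar score per sample, squashed to (0, 1) with a sigmoid.
    gate_prob = 1.0 / (1.0 + np.exp(-(x @ gate_w + gate_b)))
    passed = gate_prob >= threshold

    # Samples that fail the gate are stopped here, so later layers
    # (possibly on a higher-power core) never run for them.
    # Compression: forward only the selected channels of passing samples.
    compressed = x[passed][:, keep_mask]
    return passed, compressed

# Two samples: the first scores high on the gate, the second scores low.
x = np.array([[2.0, 0.0, 1.0],
              [-3.0, 0.5, 0.2]])
gate_w = np.array([1.0, 0.0, 0.0])
keep_mask = np.array([True, False, True])

passed, compressed = gated_compression(x, gate_w, 0.0, keep_mask)
print(passed)      # which samples continue through the network
print(compressed)  # reduced activations for those samples
```

In this toy input, only the first sample clears the gate, and only two of its three channels are forwarded. That is the sense in which such a layer could cut both the number of downstream invocations and the amount of data moved between compute cores.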

Related papers

Energy-Efficient Federated Learning for Edge Real-Time Vision via Joint Data, Computation, and Communication Design [43.89869891417806]
Real-time computer vision (CV) applications on wireless edge devices demand energy-efficient and privacy-preserving learning. We propose FedDPQ, an ultra energy-efficient FL framework for real-time CV over unreliable wireless networks.
arXiv Detail & Related papers (2025-08-03T13:05:11Z)
THOR: A Generic Energy Estimation Approach for On-Device Training [34.57867978862375]
THOR is a generic approach for energy consumption estimation in deep neural network (DNN) training. We conduct extensive experiments with various types of models across different real-world platforms. The results demonstrate that THOR has effectively reduced the Mean Absolute Percentage Error (MAPE) by up to 30%.
arXiv Detail & Related papers (2025-01-27T03:29:02Z)
USEFUSE: Utile Stride for Enhanced Performance in Fused Layer Architecture of Deep Neural Networks [0.6435156676256051]
This study presents the Sum-of-Products (SOP) units for convolution, which utilize low-latency left-to-right bit-serial arithmetic. An effective mechanism detects and skips inefficient convolutions after ReLU layers, minimizing power consumption. Two designs cater to varied demands: one focuses on minimal response time for mission-critical applications, and another focuses on resource-constrained devices with comparable latency.
arXiv Detail & Related papers (2024-12-18T11:04:58Z)
Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge. Existing methods struggle to balance high model performance with low resource consumption. We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency. We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs). We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
Efficient Heterogeneous Large Language Model Decoding with Model-Attention Disaggregation [15.35494431928751]
Transformer-based large language models (LLMs) exhibit impressive performance in generative tasks but also introduce significant challenges in real-world serving. We introduce model-attention disaggregation to enhance the efficiency of LLM decoding. We develop and deploy Lamina, an LLM inference system that incorporates model-attention disaggregation in a distributed heterogeneous cluster.
arXiv Detail & Related papers (2024-05-03T02:15:15Z)
Dynamic Switch Layers For Unsupervised Learning [0.0]
On-device machine learning (ODML) enables intelligent applications on resource-constrained devices. Power consumption poses a major challenge, forcing a trade-off between model accuracy and power efficiency. We introduce the Dynamic Switch Layer (DSL) to extend the benefits of GC layers to unsupervised learning scenarios.
arXiv Detail & Related papers (2024-04-05T21:03:11Z)
Towards Physical Plausibility in Neuroevolution Systems [0.276240219662896]
The increasing usage of Artificial Intelligence (AI) models, especially Deep Neural Networks (DNNs), is increasing the power consumption during training and inference. This work addresses the growing energy consumption problem in Machine Learning (ML). Even a slight reduction in power usage can lead to significant energy savings, benefiting users, companies, and the environment.
arXiv Detail & Related papers (2024-01-31T10:54:34Z)
Dynamic Early Exiting Predictive Coding Neural Networks [3.542013483233133]
With the urge for smaller and more accurate devices, Deep Learning models became too heavy to deploy. We propose a shallow bidirectional network based on predictive coding theory and dynamic early exiting for halting further computations. We achieve comparable accuracy to VGG-16 in image classification on CIFAR-10 with fewer parameters and less computational complexity.
arXiv Detail & Related papers (2023-09-05T08:00:01Z)
Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks. We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
Gated Compression Layers for Efficient Always-On Models [1.5612040984769857]
We propose a novel Gated Compression layer that can be applied to transform existing neural network architectures into Gated Neural Networks. We provide results across five public image and audio datasets that demonstrate the proposed Gated Compression layer effectively stops up to 96% of negative samples, compresses 97% of positive samples, while maintaining or improving model accuracy.
arXiv Detail & Related papers (2023-03-15T22:46:22Z)
Collaborative Intelligent Reflecting Surface Networks with Multi-Agent Reinforcement Learning [63.83425382922157]
Intelligent reflecting surface (IRS) is envisioned to be widely applied in future wireless networks. In this paper, we investigate a multi-user communication system assisted by cooperative IRS devices with the capability of energy harvesting.
arXiv Detail & Related papers (2022-03-26T20:37:14Z)
An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices. We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the exit point, exit point, and compressing bits by soft policy iterations. Based on the latency and accuracy aware reward design, such a computation can well adapt to the complex environment like dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
AVAC: A Machine Learning based Adaptive RRAM Variability-Aware Controller for Edge Devices [3.7346292069282643]
We propose an Adaptive RRAM Variability-Aware Controller, AVAC, which periodically updates Wait Buffer and batch sizes. AVAC allows Edge devices to adapt to different applications and their stages, to improve performance and reduce energy consumption.
arXiv Detail & Related papers (2020-05-06T19:06:51Z)
Highly Efficient Salient Object Detection with 100K Parameters [137.74898755102387]
We propose a flexible convolutional module, namely generalized OctConv (gOctConv), to efficiently utilize both in-stage and cross-stages multi-scale features. We build an extremely light-weighted model, namely CSNet, which achieves comparable performance with about 0.2% (100k) of large models on popular object detection benchmarks.
arXiv Detail & Related papers (2020-03-12T07:00:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.