Mobiprox: Supporting Dynamic Approximate Computing on Mobiles
- URL: http://arxiv.org/abs/2303.11291v2
- Date: Thu, 22 Feb 2024 16:48:50 GMT
- Title: Mobiprox: Supporting Dynamic Approximate Computing on Mobiles
- Authors: Matevž Fabjančič, Octavian Machidon, Hashim Sharif, Yifan
Zhao, Saša Misailović, Veljko Pejović
- Abstract summary: We present Mobiprox, a framework enabling mobile deep learning with flexible precision.
Mobiprox implements tunable approximations of tensor operations and enables runtime-adaptable approximation of individual network layers.
We demonstrate that it can save up to 15% system-wide energy with a minimal impact on the inference accuracy.
- Score: 9.012472705158592
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Runtime-tunable context-dependent network compression would make mobile deep
learning (DL) adaptable to often varying resource availability, input
"difficulty", or user needs. The existing compression techniques significantly
reduce the memory, processing, and energy tax of DL, yet, the resulting models
tend to be permanently impaired, sacrificing the inference power for reduced
resource usage. The existing tunable compression approaches, on the other hand,
require expensive re-training, do not support arbitrary strategies for adapting
the compression and do not provide mobile-ready implementations.
In this paper we present Mobiprox, a framework enabling mobile DL with
flexible precision. Mobiprox implements tunable approximations of tensor
operations and enables runtime-adaptable approximation of individual network
layers. A profiler and a tuner included with Mobiprox identify the most
promising neural network approximation configurations leading to the desired
inference quality with the minimal use of resources. Furthermore, we develop
control strategies that, depending on contextual factors such as the input data
difficulty, dynamically adjust the approximation levels across a mobile DL
model's layers. We implement Mobiprox in Android OS and through experiments in
diverse mobile domains, including human activity recognition and spoken keyword
detection, demonstrate that it can save up to 15% system-wide energy with a
minimal impact on the inference accuracy.
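The control loop described in the abstract can be illustrated with a small sketch. This is not the paper's implementation; the layer names, approximation knobs, and the energy/accuracy numbers below are hypothetical stand-ins for what a profiler like Mobiprox's would produce, and the difficulty-scaled budget is one simple example of a control strategy.

```python
# Hypothetical sketch of Mobiprox-style runtime approximation control.
# Layer names, knobs, and numbers are illustrative, not from the paper.

# Profiler output: per-layer approximation knobs with estimated
# relative energy cost and accuracy drop (hypothetical values).
PROFILE = {
    "conv1": [("exact", 1.00, 0.000), ("perforated", 0.70, 0.004), ("sampled", 0.55, 0.012)],
    "conv2": [("exact", 1.00, 0.000), ("perforated", 0.72, 0.006)],
    "fc":    [("exact", 1.00, 0.000), ("half_prec", 0.60, 0.002)],
}

def choose_config(difficulty: float, max_acc_drop: float = 0.01) -> dict:
    """Pick, per layer, the cheapest knob whose estimated accuracy drop
    stays within a budget scaled by input difficulty: easy inputs
    tolerate more aggressive approximation, hard inputs less."""
    budget = max_acc_drop * (1.0 - difficulty)  # shrink the budget for hard inputs
    config = {}
    for layer, knobs in PROFILE.items():
        feasible = [k for k in knobs if k[2] <= budget]  # "exact" is always feasible
        config[layer] = min(feasible, key=lambda k: k[1])[0]  # lowest energy wins
    return config

print(choose_config(difficulty=0.2))
# easy input: aggressive knobs selected for every layer
print(choose_config(difficulty=1.0))
# hard input: every layer falls back to exact computation
```

At `difficulty=1.0` the accuracy budget collapses to zero, so only the exact variants remain feasible; at low difficulty the controller trades a small accuracy drop for the profiled energy savings per layer.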
Related papers
- AdaScale: Dynamic Context-aware DNN Scaling via Automated Adaptation Loop on Mobile Devices [16.5444553304756]
We introduce AdaScale, an elastic inference framework that automates the adaptation of deep models to dynamic contexts.
AdaScale improves accuracy by 5.09%, reduces training overhead by 66.89%, cuts inference latency by 1.51x to 6.2x, and lowers energy costs by 4.69x.
arXiv Detail & Related papers (2024-12-01T08:33:56Z) - AdaFlow: Opportunistic Inference on Asynchronous Mobile Data with Generalized Affinity Control [16.944584145880793]
AdaFlow pioneers the formulation of structured cross-modality affinity in mobile contexts using a hierarchical analysis-based normalized matrix.
AdaFlow significantly reduces inference latency by up to 79.9% and enhances accuracy by up to 61.9%, outperforming status quo approaches.
arXiv Detail & Related papers (2024-10-31T15:28:22Z) - A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical
Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z) - Slimmable Encoders for Flexible Split DNNs in Bandwidth and Resource
Constrained IoT Systems [12.427821850039448]
We propose a novel split computing approach based on slimmable ensemble encoders.
The key advantage of our design is the ability to adapt the computational load and transmitted data size in real time with minimal overhead.
Our model outperforms existing solutions in terms of compression efficacy and execution time, especially in the context of weak mobile devices.
arXiv Detail & Related papers (2023-06-22T06:33:12Z) - FrankenSplit: Efficient Neural Feature Compression with Shallow Variational Bottleneck Injection for Mobile Edge Computing [5.815300670677979]
We introduce a novel framework for resource-conscious compression models and extensively evaluate our method in an asymmetric environment.
Our method achieves a 60% lower bitrate than a state-of-the-art SC method without decreasing accuracy and is up to 16x faster than offloading with existing standards.
arXiv Detail & Related papers (2023-02-21T14:03:22Z) - Artificial Intelligence Empowered Multiple Access for Ultra Reliable and
Low Latency THz Wireless Networks [76.89730672544216]
Terahertz (THz) wireless networks are expected to catalyze the beyond fifth generation (B5G) era.
To satisfy the ultra-reliability and low-latency demands of several B5G applications, novel mobility management approaches are required.
This article presents a holistic MAC layer approach that enables intelligent user association and resource allocation, as well as flexible and adaptive mobility management.
arXiv Detail & Related papers (2022-08-17T03:00:24Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale computations restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete actions (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
With a latency- and accuracy-aware reward design, such a computation framework can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC services.
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - LCS: Learning Compressible Subspaces for Adaptive Network Compression at
Inference Time [57.52251547365967]
We propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models.
We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity.
Our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.
arXiv Detail & Related papers (2021-10-08T17:03:34Z) - Fed-LAMB: Layerwise and Dimensionwise Locally Adaptive Optimization
Algorithm [24.42828071396353]
In the emerging paradigm of federated learning (FL), a large number of clients, such as mobile devices, train on their respective data.
Due to the low bandwidth, decentralized optimization methods need to shift the computation burden from those clients to the servers.
We present Fed-LAMB, a novel federated learning method based on layerwise and dimensionwise locally adaptive optimization of deep neural networks.
arXiv Detail & Related papers (2021-10-01T16:54:31Z) - Remote Multilinear Compressive Learning with Adaptive Compression [107.87219371697063]
Multilinear Compressive Learning (MCL) is an efficient signal acquisition and learning paradigm for multidimensional signals.
We propose a novel optimization scheme that enables such a feature for MCL models.
arXiv Detail & Related papers (2021-09-02T19:24:03Z) - Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, a truncated max-product belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs).
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z) - MDLdroid: a ChainSGD-reduce Approach to Mobile Deep Learning for
Personal Mobile Sensing [14.574274428615666]
Running deep learning on devices offers several advantages including data privacy preservation and low-latency response for both model robustness and update.
Personal mobile sensing applications are mostly user-specific and highly affected by environment.
We present MDLdroid, a novel decentralized mobile deep learning framework to enable resource-aware on-device collaborative learning.
arXiv Detail & Related papers (2020-02-07T16:55:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.