Optimising Resource Management for Embedded Machine Learning
- URL: http://arxiv.org/abs/2105.03608v1
- Date: Sat, 8 May 2021 06:10:05 GMT
- Title: Optimising Resource Management for Embedded Machine Learning
- Authors: Lei Xun, Long Tran-Thanh, Bashir M Al-Hashimi, Geoff V. Merrett
- Abstract summary: Machine learning inference is increasingly being executed locally on mobile and embedded platforms.
We show approaches for online resource management in heterogeneous multi-core systems.
- Score: 23.00896228073755
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning inference is increasingly being executed locally on mobile
and embedded platforms, due to the clear advantages in latency, privacy and
connectivity. In this paper, we present approaches for online resource
management in heterogeneous multi-core systems and show how they can be applied
to optimise the performance of machine learning workloads. Performance can be
defined using platform-dependent (e.g. speed, energy) and platform-independent
(accuracy, confidence) metrics. In particular, we show how a Deep Neural
Network (DNN) can be made dynamically scalable to trade off these various
performance metrics. Achieving consistent performance when executing across
different platforms is necessary yet challenging, due to the differing
resources each platform provides, their capabilities, and their time-varying
availability when executing alongside other workloads. Managing the interface between
available hardware resources (often numerous and heterogeneous in nature),
software requirements, and user experience is increasingly complex.
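The trade-off described above can be sketched as a runtime configuration choice: given a latency budget that shifts with platform contention, pick the most accurate DNN configuration that still fits. The configuration names, accuracies, and latencies below are illustrative placeholders, not measurements from the paper.

```python
# Hypothetical sketch: selecting a dynamic-DNN configuration at runtime.
# All numbers are invented for illustration.
CONFIGS = [
    # (name, top-1 accuracy, latency in ms on the current platform)
    ("width-0.25x", 0.62, 4.0),
    ("width-0.50x", 0.68, 9.0),
    ("width-0.75x", 0.72, 18.0),
    ("width-1.00x", 0.75, 33.0),
]

def select_config(latency_budget_ms):
    """Return the most accurate configuration that meets the latency budget.

    Falls back to the fastest configuration if none fits, so inference
    always proceeds (at reduced accuracy) under heavy contention.
    """
    feasible = [c for c in CONFIGS if c[2] <= latency_budget_ms]
    if not feasible:
        return min(CONFIGS, key=lambda c: c[2])
    return max(feasible, key=lambda c: c[1])

print(select_config(20.0)[0])  # a mid-sized width fits a 20 ms budget
print(select_config(2.0)[0])   # nothing fits, so fall back to the fastest
```

A runtime manager would re-run `select_config` whenever the available resources (and hence the latency budget) change, e.g. when a co-running workload starts.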
Related papers
- Deep Learning Inference on Heterogeneous Mobile Processors: Potentials and Pitfalls [22.49750818224266]
There is a growing demand to deploy computation-intensive deep learning (DL) models on resource-constrained mobile devices for real-time intelligent applications.
Mobile devices hold potential to accelerate DL inference via parallel execution across heterogeneous processors.
This paper presents a holistic empirical study to assess the capabilities and challenges associated with parallel DL inference on heterogeneous mobile processors.
arXiv Detail & Related papers (2024-05-03T04:47:23Z)
- Adaptive Resource Allocation for Virtualized Base Stations in O-RAN with Online Learning [60.17407932691429]
Open Radio Access Network systems, with their virtualized base stations (vBSs), offer operators the benefits of increased flexibility, reduced costs, vendor diversity, and interoperability.
We propose an online learning algorithm that balances effective throughput and vBS energy consumption, even in unforeseeable and "challenging" environments.
We prove the proposed solutions achieve sub-linear regret, yielding a zero average optimality gap even in challenging environments.
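The regret setting above can be illustrated with a generic online-learning baseline: each round, pick a configuration ("arm"), observe a reward, and update estimates. This is a decaying epsilon-greedy sketch, not the paper's algorithm, and the reward function is an invented stand-in for throughput-versus-energy payoff.

```python
import random

def eps_greedy(rewards_fn, n_arms, horizon, seed=0):
    """Decaying epsilon-greedy bandit: a generic online-learning baseline.

    Illustrates the regret setting only: per round, pick an arm, observe
    a reward, update a running-mean estimate of each arm.
    """
    rng = random.Random(seed)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    total = 0.0
    for t in range(1, horizon + 1):
        eps = min(1.0, n_arms / t)  # exploration decays over time
        if rng.random() < eps:
            arm = rng.randrange(n_arms)  # explore
        else:
            arm = max(range(n_arms), key=lambda a: means[a])  # exploit
        r = rewards_fn(arm, rng)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean
        total += r
    return total

# Two hypothetical vBS configurations: arm 1 has a higher mean payoff.
payoff = lambda arm, rng: rng.random() * (0.5 if arm == 0 else 1.0)
avg_reward = eps_greedy(payoff, 2, 2000, seed=1) / 2000
```

Because exploration decays, the per-round loss against the best fixed arm shrinks over time, which is the intuition behind sub-linear regret (the average optimality gap tends to zero).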
arXiv Detail & Related papers (2023-09-04T17:30:21Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs)
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
- Learnability with Time-Sharing Computational Resource Concerns [65.268245109828]
We present a theoretical framework that takes into account the influence of computational resources in learning theory.
This framework can be naturally applied to stream learning where the incoming data streams can be potentially endless.
It may also provide a theoretical perspective for the design of intelligent supercomputing operating systems.
arXiv Detail & Related papers (2023-05-03T15:54:23Z)
- Singularity: Planet-Scale, Preemptible, Elastic Scheduling of AI Workloads [12.117736592836506]
We present Singularity, Microsoft's globally distributed scheduling service for deep learning training and inference workloads.
At the heart of Singularity is a novel, workload-aware scheduler that can transparently preempt and elastically scale deep learning workloads.
We show that the resulting efficiency and reliability gains with Singularity are achieved with negligible impact on the steady-state performance.
arXiv Detail & Related papers (2022-02-16T04:02:10Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
Existing approaches, however, do not supply the procedures and pipelines needed for the actual deployment of machine learning capabilities in real production-grade systems.
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor frameworks and script-language engines.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- SINGA-Easy: An Easy-to-Use Framework for MultiModal Analysis [18.084628500554462]
We introduce SINGA-Easy, a new deep learning framework that provides distributed hyperparameter tuning at the training stage, dynamic computational cost control at the inference stage, and intuitive user interaction with multimedia content facilitated by model explanation.
Our experiments on the training and deployment of multi-modality data analysis applications show that the framework is both usable and adaptable to dynamic inference loads.
arXiv Detail & Related papers (2021-08-03T08:39:54Z)
- Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
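The early-exit idea above can be sketched as a simple runtime policy: run the network exit by exit, and stop as soon as a head is confident enough. The per-exit confidence values below are invented stand-ins for a real head's softmax confidence; this is not MESS's actual exit policy.

```python
# Hypothetical early-exit policy: stop at the first sufficiently
# confident exit, otherwise fall through to the final one.

def run_with_early_exit(confidences, threshold=0.9):
    """Return (exit_index, confidence) of the first exit whose confidence
    clears the threshold; otherwise use the final exit."""
    for i, conf in enumerate(confidences):
        if conf >= threshold:
            return i, conf
    return len(confidences) - 1, confidences[-1]

# An "easy" sample is handled by the first exit, saving the remaining
# depth; a "hard" sample runs the full network.
print(run_with_early_exit([0.95, 0.97, 0.99]))  # exits at index 0
print(run_with_early_exit([0.40, 0.70, 0.88]))  # falls through to index 2
```

Tuning the threshold (and, as the paper notes, the number and placement of exits) trades accuracy against the compute saved on easy samples.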
arXiv Detail & Related papers (2021-06-07T11:37:03Z)
- Clairvoyant Prefetching for Distributed Machine Learning I/O [9.490118207943192]
I/O is emerging as a major bottleneck for machine learning training, especially in distributed environments such as clouds and supercomputers.
We produce a novel machine learning I/O framework, HDMLP, to tackle the I/O bottleneck. HDMLP provides an easy-to-use, flexible, scalable solution that delivers better performance than state-of-the-art approaches.
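The core "clairvoyant" idea, prefetching along an access order that is known in advance, can be sketched with a background reader thread. This is a simplified illustration, not HDMLP's actual implementation.

```python
import queue
import threading

def prefetcher(read_fn, order, depth=4):
    """Read samples in the known access order ahead of the consumer,
    hiding I/O latency behind compute.

    A bounded queue of `depth` items applies backpressure; a None
    sentinel marks the end of the epoch.
    """
    buf = queue.Queue(maxsize=depth)

    def worker():
        for key in order:          # the access order is known in advance
            buf.put(read_fn(key))  # blocks when the buffer is full
        buf.put(None)              # sentinel: end of epoch

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = buf.get()
        if item is None:
            break
        yield item

# Usage with a toy read function standing in for storage I/O:
data = list(prefetcher(lambda k: k * 2, range(5)))
```

In training, knowing the shuffled epoch order up front is what lets the reader stay ahead of the consumer instead of reacting to requests.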
arXiv Detail & Related papers (2021-01-21T17:21:42Z)
- Toward Multiple Federated Learning Services Resource Sharing in Mobile Edge Networks [88.15736037284408]
We study a new model of multiple federated learning services at the multi-access edge computing server.
We propose a joint resource optimization and hyper-learning rate control problem, namely MS-FEDL.
Our simulation results demonstrate the convergence performance of our proposed algorithms.
arXiv Detail & Related papers (2020-11-25T01:29:41Z)
- AutoScale: Optimizing Energy Efficiency of End-to-End Edge Inference under Stochastic Variance [11.093360539563657]
This paper proposes AutoScale to enable accurate, energy-efficient deep learning inference at the edge.
AutoScale is an adaptive and lightweight execution-scaling engine built upon a custom-designed reinforcement learning algorithm.
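The execution-scaling decision can be sketched as learning which execution target (e.g. CPU, GPU, or cloud offload) minimises measured energy per inference. The UCB-style selector below is a hypothetical illustration, not AutoScale's actual RL formulation, and the energy figures in the usage example are invented.

```python
import math

class TargetSelector:
    """Pick an execution target to minimise energy per inference,
    balancing exploitation of the cheapest-known target against
    exploration of rarely-measured ones (UCB-style bonus)."""

    def __init__(self, targets):
        self.targets = targets
        self.n = {t: 0 for t in targets}        # times each target was used
        self.energy = {t: 0.0 for t in targets}  # running mean energy (mJ)
        self.t = 0

    def choose(self):
        self.t += 1
        for tgt in self.targets:  # measure each target at least once
            if self.n[tgt] == 0:
                return tgt
        # low estimated energy, minus a bonus for under-explored targets
        return min(self.targets,
                   key=lambda tgt: self.energy[tgt]
                   - math.sqrt(2 * math.log(self.t) / self.n[tgt]))

    def update(self, tgt, measured_mj):
        self.n[tgt] += 1
        self.energy[tgt] += (measured_mj - self.energy[tgt]) / self.n[tgt]

# Usage with invented per-target energy costs:
sel = TargetSelector(["cpu", "gpu", "cloud"])
costs = {"cpu": 5.0, "gpu": 2.0, "cloud": 8.0}
for _ in range(200):
    tgt = sel.choose()
    sel.update(tgt, costs[tgt])
```

The exploration bonus matters precisely under the stochastic variance the paper targets: a target that looked expensive once still gets re-measured occasionally.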
arXiv Detail & Related papers (2020-05-06T00:30:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.