Deep Learning Inference on Heterogeneous Mobile Processors: Potentials and Pitfalls
- URL: http://arxiv.org/abs/2405.01851v1
- Date: Fri, 3 May 2024 04:47:23 GMT
- Title: Deep Learning Inference on Heterogeneous Mobile Processors: Potentials and Pitfalls
- Authors: Sicong Liu, Wentao Zhou, Zimu Zhou, Bin Guo, Minfan Wang, Cheng Fang, Zheng Lin, Zhiwen Yu
- Abstract summary: There is a growing demand to deploy computation-intensive deep learning (DL) models on resource-constrained mobile devices for real-time intelligent applications.
Mobile devices hold potential to accelerate DL inference via parallel execution across heterogeneous processors.
This paper presents a holistic empirical study to assess the capabilities and challenges associated with parallel DL inference on heterogeneous mobile processors.
- Score: 22.49750818224266
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is a growing demand to deploy computation-intensive deep learning (DL) models on resource-constrained mobile devices for real-time intelligent applications. Equipped with a variety of processing units such as CPUs, GPUs, and NPUs, the mobile devices hold potential to accelerate DL inference via parallel execution across heterogeneous processors. Various efficient parallel methods have been explored to optimize computation distribution, achieve load balance, and minimize communication cost across processors. Yet their practical effectiveness in the dynamic and diverse real-world mobile environment is less explored. This paper presents a holistic empirical study to assess the capabilities and challenges associated with parallel DL inference on heterogeneous mobile processors. Through carefully designed experiments covering various DL models, mobile software/hardware environments, workload patterns, and resource availability, we identify limitations of existing techniques and highlight opportunities for cross-level optimization.
Related papers
- Contemporary Model Compression on Large Language Models Inference [7.307436175842646]
Large Language Models (LLMs) have revolutionized natural language processing by achieving state-of-the-art results across a variety of tasks.
The computational demands of LLM inference, including high memory consumption and slow processing speeds, pose significant challenges for real-world applications.
This survey explores techniques in model compression that address these challenges by reducing the size and computational requirements of LLMs.
arXiv Detail & Related papers (2024-09-03T15:35:01Z)
- On-Device Language Models: A Comprehensive Review [26.759861320845467]
This review examines the challenges of deploying computationally expensive large language models on resource-constrained devices.
It investigates on-device language models, their efficient architectures, and state-of-the-art compression techniques.
Case studies of on-device language models from major mobile manufacturers demonstrate real-world applications and potential benefits.
arXiv Detail & Related papers (2024-08-26T03:33:36Z)
- Machine Learning Insides OptVerse AI Solver: Design Principles and Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver.
We showcase our methods for generating complex SAT and MILP instances using generative models that mirror the multifaceted structures of real-world problems.
We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z)
- Enabling Resource-efficient AIoT System with Cross-level Optimization: A survey [20.360136850102833]
This survey aims to provide a broader optimization space for more flexible resource-performance tradeoffs.
By consolidating problems and techniques scattered over diverse levels, we aim to help readers understand their connections and stimulate further discussions.
arXiv Detail & Related papers (2023-09-27T08:04:24Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
This approach however does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- Energy-Efficient Multi-Orchestrator Mobile Edge Learning [54.28419430315478]
Mobile Edge Learning (MEL) is a collaborative learning paradigm that features distributed training of Machine Learning (ML) models over edge devices.
In MEL, possible coexistence of multiple learning tasks with different datasets may arise.
We propose lightweight algorithms that can achieve near-optimal performance and facilitate the trade-offs between energy consumption, accuracy, and solution complexity.
arXiv Detail & Related papers (2021-09-02T07:37:10Z)
- Reconfigurable Intelligent Surface Assisted Mobile Edge Computing with Heterogeneous Learning Tasks [53.1636151439562]
Mobile edge computing (MEC) provides a natural platform for AI applications.
We present an infrastructure to perform machine learning tasks at an MEC with the assistance of a reconfigurable intelligent surface (RIS).
Specifically, we minimize the learning error of all participating users by jointly optimizing transmit power of mobile users, beamforming vectors of the base station, and the phase-shift matrix of the RIS.
arXiv Detail & Related papers (2020-12-25T07:08:50Z)
- Scalable Deep-Learning-Accelerated Topology Optimization for Additively Manufactured Materials [4.221095652322005]
Topology optimization (TO) is a popular and powerful computational approach for designing novel structures, materials, and devices.
To address these issues, we propose a general scalable deep-learning (DL) based TO framework, referred to as SDL-TO.
Our framework accelerates TO by learning the iterative history data and simultaneously training on the mapping between the given design and its gradient.
arXiv Detail & Related papers (2020-11-28T17:38:31Z)
- Learning Centric Wireless Resource Allocation for Edge Computing: Algorithm and Experiment [15.577056429740951]
Edge intelligence is an emerging network architecture that integrates sensing, communication, and computing components and supports various machine learning applications.
Existing methods ignore two important facts: 1) different models have heterogeneous demands on training data; 2) there is a mismatch between the simulated environment and the real-world environment.
This paper proposes a learning-centric wireless resource allocation scheme that maximizes the worst learning performance of multiple tasks.
arXiv Detail & Related papers (2020-10-29T06:20:40Z)