OODIn: An Optimised On-Device Inference Framework for Heterogeneous Mobile Devices
- URL: http://arxiv.org/abs/2106.04723v1
- Date: Tue, 8 Jun 2021 22:38:18 GMT
- Title: OODIn: An Optimised On-Device Inference Framework for Heterogeneous Mobile Devices
- Authors: Stylianos I. Venieris, Ioannis Panopoulos and Iakovos S. Venieris
- Abstract summary: OODIn is a framework for the optimised deployment of deep learning apps across heterogeneous mobile devices.
It counteracts the variability in device resources and DL models by means of a highly parametrised multi-layer design.
It delivers up to 4.3x and 3.5x performance gains over highly optimised platform- and model-aware designs, respectively.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Radical progress in the field of deep learning (DL) has led to unprecedented
accuracy in diverse inference tasks. As such, deploying DL models across mobile
platforms is vital to enable the development and broad availability of the
next-generation intelligent apps. Nevertheless, the wide and optimised
deployment of DL models is currently hindered by the vast system heterogeneity
of mobile devices, the varying computational cost of different DL models and
the variability of performance needs across DL applications. This paper
proposes OODIn, a framework for the optimised deployment of DL apps across
heterogeneous mobile devices. OODIn comprises a novel DL-specific software
architecture together with an analytical framework for modelling DL
applications that: (1) counteract the variability in device resources and DL
models by means of a highly parametrised multi-layer design; and (2) perform a
principled optimisation of both model- and system-level parameters through a
multi-objective formulation, designed for DL inference apps, in order to adapt
the deployment to the user-specified performance requirements and device
capabilities. Quantitative evaluation shows that the proposed framework
consistently outperforms status-quo designs across heterogeneous devices and
delivers up to 4.3x and 3.5x performance gain over highly optimised platform-
and model-aware designs respectively, while effectively adapting execution to
dynamic changes in resource availability.
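The abstract describes selecting model- and system-level parameters through a multi-objective formulation driven by user-specified performance requirements and device capabilities. The Python sketch below illustrates only the general idea of constrained selection over pre-profiled deployment configurations; it is not OODIn's formulation or API. The configuration fields (`model`, `precision`, `processor`), the objective names, the `available` parameter used to mimic a change in resource availability, and all profile numbers are hypothetical placeholders, not measurements from the paper.

```python
"""Sketch: constrained multi-objective choice of an on-device inference config.

Hypothetical assumptions (not OODIn's actual method or API):
- every candidate configuration has already been profiled on the target device;
- the user names one objective to optimise and hard limits on the other metrics.
"""
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Config:
    model: str      # hypothetical model variant name, e.g. "net-small"
    precision: str  # numerical precision, e.g. "fp32" or "int8"
    processor: str  # compute unit, e.g. "cpu" or "gpu"


@dataclass(frozen=True)
class Metrics:
    latency_ms: float
    accuracy: float   # fraction in [0, 1]
    memory_mb: float


# Objective name -> score to minimise (accuracy is negated so "min" maximises it).
OBJECTIVES = {
    "min_latency": lambda m: m.latency_ms,
    "max_accuracy": lambda m: -m.accuracy,
    "min_memory": lambda m: m.memory_mb,
}


def select(profiles: Dict[Config, Metrics], objective: str,
           max_latency_ms: float = float("inf"), min_accuracy: float = 0.0,
           max_memory_mb: float = float("inf"),
           available: Optional[Set[str]] = None) -> Optional[Config]:
    """Return the feasible config that best meets `objective`, or None."""
    if objective not in OBJECTIVES:
        raise ValueError(f"unknown objective: {objective}")
    score = OBJECTIVES[objective]
    candidates = [
        c for c, m in profiles.items()
        # Skip compute units that are currently unavailable (e.g. a busy GPU).
        if (available is None or c.processor in available)
        and m.latency_ms <= max_latency_ms
        and m.accuracy >= min_accuracy
        and m.memory_mb <= max_memory_mb
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda c: score(profiles[c]))


def pareto_front(profiles: Dict[Config, Metrics]) -> List[Config]:
    """Configs not dominated on (latency, accuracy, memory)."""
    def dominates(a: Metrics, b: Metrics) -> bool:
        no_worse = (a.latency_ms <= b.latency_ms and a.accuracy >= b.accuracy
                    and a.memory_mb <= b.memory_mb)
        strictly_better = (a.latency_ms < b.latency_ms or a.accuracy > b.accuracy
                           or a.memory_mb < b.memory_mb)
        return no_worse and strictly_better

    return [c for c, m in profiles.items()
            if not any(dominates(o, m) for o in profiles.values())]


# Illustrative placeholder profiles, NOT results from the paper.
PROFILES = {
    Config("net-large", "fp32", "gpu"): Metrics(40.0, 0.76, 100.0),
    Config("net-large", "int8", "cpu"): Metrics(60.0, 0.75, 25.0),
    Config("net-small", "fp32", "gpu"): Metrics(12.0, 0.70, 20.0),
    Config("net-small", "int8", "cpu"): Metrics(15.0, 0.69, 5.0),
    Config("net-small", "fp32", "cpu"): Metrics(30.0, 0.70, 20.0),
}

if __name__ == "__main__":
    # Fastest config meeting an accuracy floor when only the CPU is free.
    print(select(PROFILES, "min_latency", min_accuracy=0.72, available={"cpu"}))
    # → Config(model='net-large', precision='int8', processor='cpu')
```

Re-running `select` with a smaller `available` set, for example after the GPU becomes busy, switches the choice to a CPU configuration; this is one simple way to picture adapting execution to changing resource availability. `pareto_front` lists the trade-off points among which a user-level objective would choose.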