Related papers: OODIn: An Optimised On-Device Inference Framework for Heterogeneous Mobile Devices

URL: http://arxiv.org/abs/2106.04723v1
Date: Tue, 8 Jun 2021 22:38:18 GMT
Title: OODIn: An Optimised On-Device Inference Framework for Heterogeneous Mobile Devices
Authors: Stylianos I. Venieris and Ioannis Panopoulos and Iakovos S. Venieris
Abstract summary: OODIn is a framework for the optimised deployment of deep learning apps across heterogeneous mobile devices. It counteracts the variability in device resources and DL models by means of a highly parametrised multi-layer design. It delivers up to 4.3x and 3.5x performance gain over highly optimised platform- and model-aware designs.
Score: 5.522962791793502
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Radical progress in the field of deep learning (DL) has led to unprecedented accuracy in diverse inference tasks. As such, deploying DL models across mobile platforms is vital to enable the development and broad availability of the next-generation intelligent apps. Nevertheless, the wide and optimised deployment of DL models is currently hindered by the vast system heterogeneity of mobile devices, the varying computational cost of different DL models and the variability of performance needs across DL applications. This paper proposes OODIn, a framework for the optimised deployment of DL apps across heterogeneous mobile devices. OODIn comprises a novel DL-specific software architecture together with an analytical framework for modelling DL applications that: (1) counteract the variability in device resources and DL models by means of a highly parametrised multi-layer design; and (2) perform a principled optimisation of both model- and system-level parameters through a multi-objective formulation, designed for DL inference apps, in order to adapt the deployment to the user-specified performance requirements and device capabilities. Quantitative evaluation shows that the proposed framework consistently outperforms status-quo designs across heterogeneous devices and delivers up to 4.3x and 3.5x performance gain over highly optimised platform- and model-aware designs respectively, while effectively adapting execution to dynamic changes in resource availability.

Related papers

Efficient Multi-Instance Generation with Janus-Pro-Dirven Prompt Parsing [53.295515505026096]
Janus-Pro-driven Prompt Parsing is a prompt-parsing module that bridges text understanding and layout generation. MIGLoRA is a parameter-efficient plug-in integrating Low-Rank Adaptation into UNet (SD1.5) and DiT (SD3) backbones. The proposed method achieves state-of-the-art performance on COCO and LVIS benchmarks while maintaining parameter efficiency.
arXiv Detail & Related papers (2025-03-27T00:59:14Z)
CrowdHMTware: A Cross-level Co-adaptation Middleware for Context-aware Mobile DL Deployment [19.229115339238803]
CrowdHMTware is a context-adaptive deep learning (DL) model deployment middleware for heterogeneous mobile devices. It establishes an automated adaptation loop between cross-level functional components, i.e. elastic inference, scalable offloading, and a model-adaptive engine. It can effectively scale DL model, offloading, and engine actions across diverse platforms and tasks.
arXiv Detail & Related papers (2025-03-06T07:52:20Z)
Optimize Incompatible Parameters through Compatibility-aware Knowledge Integration [104.52015641099828]
Existing research excels in removing incompatible parameters or merging the outputs of multiple different pretrained models. We propose Compatibility-aware Knowledge Integration (CKI), which consists of Deep Assessment and Deep Splicing. The integrated model can be used directly for inference or for further fine-tuning.
arXiv Detail & Related papers (2025-01-10T01:42:43Z)
AmoebaLLM: Constructing Any-Shape Large Language Models for Efficient and Instant Deployment [13.977849745488339]
AmoebaLLM is a novel framework designed to enable the instant derivation of large language models of arbitrary shapes. AmoebaLLM significantly facilitates rapid deployment tailored to various platforms and applications.
arXiv Detail & Related papers (2024-11-15T22:02:28Z)
Automatically Learning Hybrid Digital Twins of Dynamical Systems [56.69628749813084]
Digital Twins (DTs) simulate the states and temporal dynamics of real-world systems. DTs often struggle to generalize to unseen conditions in data-scarce settings. In this paper, we propose an evolutionary algorithm (HDTwinGen) to autonomously propose, evaluate, and optimize HDTwins.
arXiv Detail & Related papers (2024-10-31T07:28:22Z)
Vehicle Suspension Recommendation System: Multi-Fidelity Neural Network-based Mechanism Design Optimization [4.038368925548051]
Vehicle suspensions are designed to improve driving performance and ride comfort, but different types are available depending on the environment. The traditional design process is multi-step, gradually reducing the number of design candidates while performing costly analyses to meet target performance. Recently, AI models have been used to reduce the computational cost of FEA.
arXiv Detail & Related papers (2024-10-03T23:54:03Z)
Deep Learning Inference on Heterogeneous Mobile Processors: Potentials and Pitfalls [22.49750818224266]
There is a growing demand to deploy computation-intensive deep learning (DL) models on resource-constrained mobile devices for real-time intelligent applications. Mobile devices hold potential to accelerate DL inference via parallel execution across heterogeneous processors. This paper presents a holistic empirical study to assess the capabilities and challenges associated with parallel DL inference on heterogeneous mobile processors.
arXiv Detail & Related papers (2024-05-03T04:47:23Z)
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation [61.392147185793476]
We present a unified and versatile foundation model, namely, SEED-X. SEED-X is able to model multi-granularity visual semantics for comprehension and generation tasks. We hope that our work will inspire future research into what can be achieved by versatile multimodal foundation models in real-world applications.
arXiv Detail & Related papers (2024-04-22T17:56:09Z)
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion [58.15403987979496]
CREMA is a generalizable, highly efficient, and modular modality-fusion framework for video reasoning. We propose a novel progressive multimodal fusion design supported by a lightweight fusion module and modality-sequential training strategy. We validate our method on 7 video-language reasoning tasks assisted by diverse modalities, including VideoQA and Video-Audio/3D/Touch/Thermal QA.
arXiv Detail & Related papers (2024-02-08T18:27:22Z)
When Parameter-efficient Tuning Meets General-purpose Vision-language Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique. Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z)
Enabling Resource-efficient AIoT System with Cross-level Optimization: A survey [20.360136850102833]
This survey aims to provide a broader optimization space for more free resource-performance tradeoffs. By consolidating problems and techniques scattered over diverse levels, we aim to help readers understand their connections and stimulate further discussions.
arXiv Detail & Related papers (2023-09-27T08:04:24Z)
Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks. We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
Optimization-Inspired Learning with Architecture Augmentations and Control Mechanisms for Low-Level Vision [74.9260745577362]
This paper proposes a unified optimization-inspired learning framework to aggregate Generative, Discriminative, and Corrective (GDC) principles. We construct three propagative modules to effectively solve the optimization models with flexible combinations. Experiments across varied low-level vision tasks validate the efficacy and adaptability of GDC.
arXiv Detail & Related papers (2020-12-10T03:24:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.