Shared Mobile-Cloud Inference for Collaborative Intelligence
- URL: http://arxiv.org/abs/2002.00157v1
- Date: Sat, 1 Feb 2020 07:12:01 GMT
- Title: Shared Mobile-Cloud Inference for Collaborative Intelligence
- Authors: Mateen Ulhaq and Ivan V. Bajić
- Abstract summary: We present a shared mobile-cloud approach to neural model inference.
The strategy can improve inference latency, energy consumption, and network bandwidth usage.
Further performance gain can be achieved by compressing the feature tensor before its transmission.
- Score: 35.103437828235826
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As AI applications for mobile devices become more prevalent, there is an
increasing need for faster execution and lower energy consumption for neural
model inference. Historically, the models run on mobile devices have been
smaller and simpler in comparison to large state-of-the-art research models,
which can only run on the cloud. However, cloud-only inference has drawbacks
such as increased network bandwidth consumption and higher latency. In
addition, cloud-only inference requires the input data (images, audio) to be
fully transferred to the cloud, creating concerns about potential privacy
breaches. We demonstrate an alternative approach: shared mobile-cloud
inference. Partial inference is performed on the mobile in order to reduce the
dimensionality of the input data and arrive at a compact feature tensor, which
is a latent space representation of the input signal. The feature tensor is
then transmitted to the server for further inference. This strategy can improve
inference latency, energy consumption, and network bandwidth usage, as well as
provide privacy protection, because the original signal never leaves the
mobile. Further performance gain can be achieved by compressing the feature
tensor before its transmission.
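Below is a minimal sketch of the split-inference idea described in the abstract, written in PyTorch. The ResNet-50 backbone, the choice of split point, and the 8-bit quantizer standing in for feature-tensor compression are illustrative assumptions, not the authors' actual model or codec.
```python
# A minimal sketch, not the authors' exact pipeline: the ResNet-50 backbone,
# the split point, and the 8-bit quantizer standing in for feature-tensor
# compression are all illustrative assumptions.
import io
import torch
import torch.nn as nn
from torchvision.models import resnet50

model = resnet50(weights=None).eval()

# Mobile-side sub-model: runs partial inference up to an intermediate layer.
mobile_part = nn.Sequential(
    model.conv1, model.bn1, model.relu, model.maxpool,
    model.layer1, model.layer2, model.layer3,
)
# Cloud-side sub-model: completes inference from the received feature tensor.
cloud_part = nn.Sequential(
    model.layer4, model.avgpool, nn.Flatten(1), model.fc,
)

def mobile_inference(image: torch.Tensor) -> bytes:
    """Compute the latent feature tensor on-device and return a compressed payload."""
    with torch.no_grad():
        features = mobile_part(image)
    # Crude stand-in for feature compression: per-tensor 8-bit quantization.
    scale = features.abs().max().clamp(min=1e-8) / 127.0
    quantized = torch.round(features / scale).to(torch.int8)
    buffer = io.BytesIO()
    torch.save({"q": quantized, "scale": scale}, buffer)
    return buffer.getvalue()            # only this payload leaves the device

def cloud_inference(payload: bytes) -> torch.Tensor:
    """Dequantize the received features on the server and finish inference."""
    blob = torch.load(io.BytesIO(payload))
    features = blob["q"].to(torch.float32) * blob["scale"]
    with torch.no_grad():
        return cloud_part(features)     # class logits

logits = cloud_inference(mobile_inference(torch.randn(1, 3, 224, 224)))
```
Moving the split point deeper shifts more compute onto the device but typically shrinks the transmitted tensor; a real deployment would also entropy-code the quantized features rather than rely on plain quantization.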
Related papers
- Knowledge boosting during low-latency inference [20.617827647115874]
Models for low-latency, streaming applications could benefit from the knowledge capacity of larger models, but edge devices cannot run these models due to resource constraints.
We propose knowledge boosting, a novel technique that allows a large model to operate on time-delayed input during inference, while still boosting small model performance.
Our results show larger gains where the performance gap between the small and large models is wide, demonstrating a promising method for large-small model collaboration for low-latency applications.
arXiv Detail & Related papers (2024-07-09T22:04:23Z)
- Combining Cloud and Mobile Computing for Machine Learning [2.595189746033637]
We consider model segmentation as a solution to improve the user experience.
We show that the division not only reduces the wait time for users but can also be fine-tuned to optimize the workloads of the cloud.
arXiv Detail & Related papers (2024-01-20T06:14:22Z)
- Mobile-Cloud Inference for Collaborative Intelligence [3.04585143845864]
There is an increasing need for faster execution and lower energy consumption for deep learning model inference.
Historically, the models run on mobile devices have been smaller and simpler in comparison to large state-of-the-art research models, which can only run on the cloud.
Cloud-only inference has drawbacks such as increased network bandwidth consumption and higher latency.
There is an alternative approach: shared mobile-cloud inference.
arXiv Detail & Related papers (2023-06-24T14:22:53Z)
- Real-Time Image Demoireing on Mobile Devices [59.59997851375429]
We propose a dynamic demoireing acceleration method (DDA) for real-time deployment on mobile devices.
Our motivation stems from the simple yet universal fact that moiré patterns are often unevenly distributed across an image.
Our method drastically reduces inference time, enabling real-time image demoireing on mobile devices.
arXiv Detail & Related papers (2023-02-04T15:42:42Z)
- PriMask: Cascadable and Collusion-Resilient Data Masking for Mobile Cloud Inference [8.699639153183723]
A mobile device uses a secret small-scale neural network called MaskNet to mask the data before transmission.
PriMask significantly weakens the cloud's capability to recover the data or extract certain private attributes.
We apply PriMask to three mobile sensing applications with diverse modalities and complexities.
arXiv Detail & Related papers (2022-11-12T17:54:13Z)
- Over-the-Air Federated Learning with Privacy Protection via Correlated Additive Perturbations [57.20885629270732]
We consider privacy aspects of wireless federated learning with Over-the-Air (OtA) transmission of gradient updates from multiple users/agents to an edge server.
Traditional perturbation-based methods provide privacy protection while sacrificing the training accuracy.
In this work, we aim at minimizing privacy leakage to the adversary and the degradation of model accuracy at the edge server.
arXiv Detail & Related papers (2022-10-05T13:13:35Z)
- On-Device Training Under 256KB Memory [62.95579393237751]
We propose an algorithm-system co-design framework to make on-device training possible with only 256KB of memory.
Our framework is the first solution to enable tiny on-device training of convolutional neural networks under 256KB SRAM and 1MB Flash.
arXiv Detail & Related papers (2022-06-30T17:59:08Z)
- THE-X: Privacy-Preserving Transformer Inference with Homomorphic Encryption [112.02441503951297]
Privacy-preserving inference of transformer models is in demand among cloud service users.
We introduce THE-X, an approximation approach for transformers, which enables privacy-preserving inference of pre-trained models.
arXiv Detail & Related papers (2022-06-01T03:49:18Z)
- Auto-Split: A General Framework of Collaborative Edge-Cloud AI [49.750972428032355]
This paper describes the techniques and engineering practice behind Auto-Split, an edge-cloud collaborative prototype of Huawei Cloud.
To the best of our knowledge, there is no existing industry product that provides the capability of Deep Neural Network (DNN) splitting.
arXiv Detail & Related papers (2021-08-30T08:03:29Z)
- Runtime Deep Model Multiplexing for Reduced Latency and Energy Consumption Inference [6.896677899938492]
We propose a learning algorithm to design a light-weight neural multiplexer that calls the model that will consume the minimum compute resources for a successful inference.
Mobile devices can use the proposed algorithm to offload the hard inputs to the cloud while inferring the easy ones locally.
arXiv Detail & Related papers (2020-01-14T23:49:51Z)
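The multiplexing idea above lends itself to a short sketch of the routing logic. The gating network, the fixed threshold, and the locally simulated "cloud" model below are illustrative assumptions; a real system would train the multiplexer and reach the large model over the network.
```python
# A minimal sketch of input-dependent model multiplexing; the gating network,
# threshold, and stand-in "cloud" model are illustrative assumptions.
import torch
import torch.nn as nn

local_model = nn.Sequential(nn.Flatten(1), nn.Linear(3 * 32 * 32, 10))
cloud_model = nn.Sequential(                      # stands in for a large remote model;
    nn.Flatten(1), nn.Linear(3 * 32 * 32, 256),   # a real system would reach it over
    nn.ReLU(), nn.Linear(256, 10),                # the network, not call it locally
)
# Lightweight multiplexer: scores how likely the local model is to suffice.
multiplexer = nn.Sequential(nn.Flatten(1), nn.Linear(3 * 32 * 32, 1), nn.Sigmoid())

def run_inference(x: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    with torch.no_grad():
        easy_score = multiplexer(x).item()        # learned "easiness" estimate
        if easy_score >= threshold:
            return local_model(x)                 # easy input: infer on-device
        return cloud_model(x)                     # hard input: offload to the cloud

logits = run_inference(torch.randn(1, 3, 32, 32))
```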
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.