Combining Cloud and Mobile Computing for Machine Learning
- URL: http://arxiv.org/abs/2402.04880v2
- Date: Fri, 23 Feb 2024 22:17:22 GMT
- Title: Combining Cloud and Mobile Computing for Machine Learning
- Authors: Ruiqi Xu and Tianchi Zhang
- Abstract summary: We consider model segmentation as a way to improve the user experience.
We show that the division not only reduces the wait time for users but can also be fine-tuned to optimize the workloads of the cloud.
- Score: 2.595189746033637
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although the computing power of mobile devices is increasing, machine
learning models are also growing in size. This trend creates problems for
mobile devices due to limitations like their memory capacity and battery life.
While many services, like ChatGPT and Midjourney, run all the inferences in the
cloud, we believe a flexible and fine-grained task distribution is more
desirable. In this work, we consider model segmentation as a way to improve
the user experience, dividing the computation between mobile devices
and the cloud in a way that offloads the compute-heavy portion of the model
while minimizing the data transfer required. We show that the division not only
reduces the wait time for users but can also be fine-tuned to optimize the
workloads of the cloud. To achieve that, we design a scheduler that collects
information about network quality, client device capability, and job
requirements, making decisions to achieve consistent performance across a range
of devices while reducing the work the cloud needs to perform.
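The abstract does not spell out how the scheduler chooses where to cut the model, but the idea of offloading the compute-heavy tail while minimizing transferred data can be sketched as a latency-driven search over candidate split layers. The sketch below is illustrative only: the `LayerProfile` fields, `choose_split_point`, and the cost model are assumptions, not the paper's actual scheduler.

```python
from dataclasses import dataclass

@dataclass
class LayerProfile:
    device_ms: float      # measured latency of this layer on the client device
    cloud_ms: float       # measured latency of this layer in the cloud
    output_bytes: int     # size of the activation this layer produces

def choose_split_point(layers: list[LayerProfile],
                       input_bytes: int,
                       uplink_bytes_per_ms: float,
                       cloud_queue_ms: float = 0.0) -> tuple[int, float]:
    """Pick the cut that minimizes estimated end-to-end latency.

    Layers [0, split) run on the device, layers [split, n) run in the cloud;
    split == 0 is cloud-only, split == n is fully on-device.
    """
    n = len(layers)
    best_split, best_ms = 0, float("inf")
    for split in range(n + 1):
        device_ms = sum(l.device_ms for l in layers[:split])
        cloud_ms = sum(l.cloud_ms for l in layers[split:])
        if split == n:                       # nothing is sent to the cloud
            transfer_ms, queue_ms = 0.0, 0.0
        else:                                # send the activation at the cut (or the raw input)
            sent = input_bytes if split == 0 else layers[split - 1].output_bytes
            transfer_ms = sent / uplink_bytes_per_ms
            queue_ms = cloud_queue_ms
        total = device_ms + transfer_ms + cloud_ms + queue_ms
        if total < best_ms:
            best_split, best_ms = split, total
    return best_split, best_ms
```

A scheduler in this spirit could rerun the search whenever the measured uplink bandwidth or the cloud's queue length changes, which is one way to aim for consistent latency across heterogeneous devices while trimming the work the cloud performs.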
Related papers
- Managing Bandwidth: The Key to Cloud-Assisted Autonomous Driving [73.55745551827229]
We argue that we can, and must, rely on the cloud for real-time control systems like self-driving cars.
We identify an opportunity to offload parts of time-sensitive and latency-critical compute to the cloud.
arXiv Detail & Related papers (2024-10-21T17:32:36Z) - Efficient Asynchronous Federated Learning with Sparsification and Quantization [55.6801207905772]
Federated Learning (FL) is attracting growing attention as a way to collaboratively train a machine learning model without transferring raw data.
FL generally relies on a parameter server and a large number of edge devices throughout model training.
We propose TEASQ-Fed, which lets edge devices participate asynchronously in the training process by actively applying for tasks.
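The summary names sparsification and quantization but not their exact form; a generic top-k sparsification plus 8-bit quantization of a client update, sketched below with NumPy, illustrates the kind of compression an asynchronous FL client could apply before uploading. All names here are illustrative, not TEASQ-Fed's actual procedure.

```python
import numpy as np

def sparsify_and_quantize(update: np.ndarray, k_ratio: float = 0.01):
    """Keep only the top-k largest-magnitude entries of a client update and
    store them as 8-bit integers plus one float scale."""
    flat = update.astype(np.float32).ravel()
    k = max(1, int(k_ratio * flat.size))
    idx = np.argpartition(np.abs(flat), -k)[-k:]     # positions of the k largest entries
    vals = flat[idx]
    max_abs = float(np.abs(vals).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.round(vals / scale).astype(np.int8)       # uniform 8-bit quantization
    return idx, q, scale

def reconstruct(idx, q, scale, shape):
    """Server-side reconstruction of the sparse, quantized update."""
    out = np.zeros(int(np.prod(shape)), dtype=np.float32)
    out[idx] = q.astype(np.float32) * scale
    return out.reshape(shape)
```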
arXiv Detail & Related papers (2023-12-23T07:47:07Z) - ECLM: Efficient Edge-Cloud Collaborative Learning with Continuous Environment Adaptation [47.35179593006409]
We propose ECLM, an edge-cloud collaborative learning framework for rapid model adaptation for dynamic edge environments.
We show that ECLM significantly improves model performance (e.g., 18.89% accuracy increase) and resource efficiency (e.g., 7.12x communication cost reduction) when adapting models to dynamic edge environments.
arXiv Detail & Related papers (2023-11-18T14:10:09Z) - Mobile-Cloud Inference for Collaborative Intelligence [3.04585143845864]
There is an increasing need for faster execution and lower energy consumption for deep learning model inference.
Historically, the models run on mobile devices have been smaller and simpler in comparison to large state-of-the-art research models, which can only run on the cloud.
Cloud-only inference has drawbacks such as increased network bandwidth consumption and higher latency.
There is an alternative approach: shared mobile-cloud inference.
arXiv Detail & Related papers (2023-06-24T14:22:53Z) - MetaNetwork: A Task-agnostic Network Parameters Generation Framework for Improving Device Model Generalization [65.02542875281233]
We propose a novel task-agnostic framework, named MetaNetwork, for generating adaptive device model parameters from the cloud without on-device training.
The MetaGenerator is designed to learn a mapping function from samples to model parameters, and it can generate and deliver the adaptive parameters to the device based on samples uploaded from the device to the cloud.
The MetaStabilizer aims to reduce the oscillation of the MetaGenerator, accelerate the convergence and improve the model performance during both training and inference.
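The summary only names the two components; a toy, hypernetwork-style sketch of the MetaGenerator idea (mapping a batch of uploaded sample features to the parameters of a small adapter layer) could look like the following PyTorch module. The sizes and module names are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ToyMetaGenerator(nn.Module):
    """Maps a batch of device sample features to the weights of a small linear
    adapter, roughly in the spirit of a hypernetwork; not the paper's design."""
    def __init__(self, feat_dim: int = 64, adapter_in: int = 64, adapter_out: int = 10):
        super().__init__()
        self.adapter_in, self.adapter_out = adapter_in, adapter_out
        n_params = adapter_in * adapter_out + adapter_out        # adapter weight + bias
        self.encoder = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU())
        self.head = nn.Linear(128, n_params)

    def forward(self, samples: torch.Tensor):
        # samples: (batch, feat_dim) features uploaded from one device
        summary = self.encoder(samples).mean(dim=0)              # one summary vector per device
        flat = self.head(summary)
        w = flat[: self.adapter_in * self.adapter_out].view(self.adapter_out, self.adapter_in)
        b = flat[self.adapter_in * self.adapter_out:]
        return w, b                                              # parameters sent back to the device
```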
arXiv Detail & Related papers (2022-09-12T13:26:26Z) - On-Device Training Under 256KB Memory [62.95579393237751]
We propose an algorithm-system co-design framework to make on-device training possible with only 256KB of memory.
Our framework is the first solution to enable tiny on-device training of convolutional neural networks under 256KB SRAM and 1MB Flash.
arXiv Detail & Related papers (2022-06-30T17:59:08Z) - Optimizing Neural Network for Computer Vision task in Edge Device [0.0]
We deploy a convolutional neural network on the edge device itself.
The computational expense on the edge device is reduced by lowering the floating-point precision of the model's parameters.
This enables the edge device to run neural-network predictions entirely on its own.
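As a hedged illustration of trading floating-point precision for speed and memory, the snippet below applies two standard PyTorch options, fp16 casting and post-training dynamic int8 quantization, to a placeholder model; the model itself and the choice of quantized layers are assumptions, not the paper's setup.

```python
import copy
import torch
import torch.nn as nn

# Placeholder network standing in for the paper's CNN; the precision changes are the point.
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Flatten(),
                      nn.Linear(16 * 30 * 30, 10))
model.eval()

# Option 1: cast parameters from fp32 to fp16, halving their memory footprint.
fp16_model = copy.deepcopy(model).half()

# Option 2: post-training dynamic quantization of the linear layers to 8-bit integers.
int8_model = torch.quantization.quantize_dynamic(
    copy.deepcopy(model), {nn.Linear}, dtype=torch.qint8)
```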
arXiv Detail & Related papers (2021-10-02T12:25:18Z) - Device-Cloud Collaborative Learning for Recommendation [50.01289274123047]
We propose a novel MetaPatch learning approach on the device side to efficiently achieve "thousands of people with thousands of models" given a centralized cloud model.
With billions of updated personalized device models, we propose a "model-over-models" distillation algorithm, namely MoMoDistill, to update the centralized cloud model.
arXiv Detail & Related papers (2021-04-14T05:06:59Z) - Shared Mobile-Cloud Inference for Collaborative Intelligence [35.103437828235826]
We present a shared mobile-cloud approach to neural model inference.
The strategy can improve inference latency, energy consumption, and network bandwidth usage.
Further performance gain can be achieved by compressing the feature tensor before its transmission.
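A minimal sketch of the transmission step: quantize the intermediate feature tensor to 8 bits and deflate it before upload, then reconstruct it on the cloud side. The quantize-plus-zlib choice here is purely illustrative; the paper's actual compression scheme is not given in this summary.

```python
import zlib
import numpy as np

def compress_features(feat: np.ndarray) -> bytes:
    """Quantize an intermediate activation to uint8 and deflate it before upload."""
    lo, hi = float(feat.min()), float(feat.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    q = np.round((feat - lo) / scale).astype(np.uint8)
    header = np.array([lo, scale], dtype=np.float32).tobytes()   # 8-byte header
    return header + zlib.compress(q.tobytes())

def decompress_features(blob: bytes, shape) -> np.ndarray:
    """Cloud-side (lossy) reconstruction of the feature tensor."""
    lo, scale = np.frombuffer(blob[:8], dtype=np.float32)
    q = np.frombuffer(zlib.decompress(blob[8:]), dtype=np.uint8).reshape(shape)
    return q.astype(np.float32) * float(scale) + float(lo)
```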
arXiv Detail & Related papers (2020-02-01T07:12:01Z) - Runtime Deep Model Multiplexing for Reduced Latency and Energy Consumption Inference [6.896677899938492]
We propose a learning algorithm to design a light-weight neural multiplexer that calls the model that will consume the minimum compute resources for a successful inference.
Mobile devices can use the proposed algorithm to offload the hard inputs to the cloud while inferring the easy ones locally.
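The multiplexer in the paper is learned; a much simpler stand-in that captures the easy/hard offloading decision is a confidence threshold on the local model's softmax output, as sketched below. The function names, the `cloud_predict` callback, and the threshold value are assumptions.

```python
import torch

def infer_with_offload(x, local_model, cloud_predict, confidence_threshold: float = 0.8):
    """Answer locally when the lightweight model is confident; otherwise offload.

    x is a single input with batch dimension 1, and `cloud_predict` stands in
    for an RPC to the larger server-side model.
    """
    local_model.eval()
    with torch.no_grad():
        probs = torch.softmax(local_model(x), dim=-1)
        confidence, label = probs.max(dim=-1)
    if confidence.item() >= confidence_threshold:
        return label.item()          # "easy" input: keep it on the device
    return cloud_predict(x)          # "hard" input: send it to the cloud
```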
arXiv Detail & Related papers (2020-01-14T23:49:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.