Priority-Aware Model-Distributed Inference at Edge Networks
- URL: http://arxiv.org/abs/2412.12371v1
- Date: Mon, 16 Dec 2024 22:01:55 GMT
- Title: Priority-Aware Model-Distributed Inference at Edge Networks
- Authors: Teng Li, Hulya Seferoglu
- Abstract summary: Distributed inference techniques can be broadly classified into data-distributed and model-distributed schemes. In data-distributed inference (DDI), each worker carries the entire Machine Learning (ML) model but processes only a subset of the data. An emerging paradigm is model-distributed inference (MDI), where each worker carries only a subset of ML layers.
- Score: 6.97067164616875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distributed inference techniques can be broadly classified into data-distributed and model-distributed schemes. In data-distributed inference (DDI), each worker carries the entire Machine Learning (ML) model but processes only a subset of the data. However, feeding the data to workers results in high communication costs, especially when the data is large. An emerging paradigm is model-distributed inference (MDI), where each worker carries only a subset of ML layers. In MDI, a source device that has data processes a few layers of the ML model and sends the output to a neighboring device, i.e., offloads the remaining layers. This process ends when all layers have been processed in a distributed manner. In this paper, we investigate the design and development of MDI when multiple data sources co-exist. We consider that each data source has a different importance and, hence, a priority. We formulate and solve a priority-aware model allocation optimization problem. Based on the structure of the optimal solution, we design a practical Priority-Aware Model-Distributed Inference (PA-MDI) algorithm that determines model allocation and distribution over devices by taking into account the priorities of different sources. Experiments were conducted on a real-life testbed of NVIDIA Jetson Xavier and Nano edge devices, as well as on the Colosseum testbed, with ResNet-50, ResNet-56, and GPT-2 models. The experimental results show that PA-MDI performs priority-aware model allocation successfully while reducing inference time compared to the baselines.
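To make the MDI pipeline concrete, here is a minimal PyTorch sketch of the core idea: a model's layers are split into contiguous stages pinned to different workers, activations are forwarded stage to stage, and higher-priority sources are served first. The `Stage` class, the even `partition` heuristic, and the greedy priority ordering are illustrative assumptions for this sketch only, not the paper's PA-MDI algorithm or its model-allocation optimization.

```python
# Minimal sketch of priority-aware model-distributed inference (MDI).
# Hypothetical names throughout: `Stage`, `partition`, and the greedy
# priority ordering are assumptions, not the paper's PA-MDI algorithm.
import math

import torch
import torch.nn as nn


class Stage(nn.Module):
    """A contiguous slice of model layers hosted on one device."""

    def __init__(self, layers, device):
        super().__init__()
        self.block = nn.Sequential(*layers).to(device)
        self.device = device

    def forward(self, x):
        # Move the incoming activation to this worker, run its layers.
        return self.block(x.to(self.device))


def partition(layers, num_workers):
    """Split layers into contiguous chunks, one per worker (even split)."""
    per_stage = math.ceil(len(layers) / num_workers)
    return [layers[i:i + per_stage] for i in range(0, len(layers), per_stage)]


def mdi_inference(stages, x):
    """Run input x through the pipeline of stages (source -> neighbors)."""
    for stage in stages:
        x = stage(x)
    return x


if __name__ == "__main__":
    # Toy model: a stack of linear layers standing in for ResNet/GPT-2 blocks.
    layers = [nn.Linear(64, 64) for _ in range(8)]
    devices = ["cpu", "cpu"]  # stand-ins for two edge workers
    stages = [Stage(chunk, dev)
              for chunk, dev in zip(partition(layers, len(devices)), devices)]

    # Two sources with different priorities: serve higher priority first.
    sources = sorted(
        [{"name": "source_A", "priority": 2, "data": torch.randn(1, 64)},
         {"name": "source_B", "priority": 1, "data": torch.randn(1, 64)}],
        key=lambda s: -s["priority"])
    with torch.no_grad():
        for s in sources:
            out = mdi_inference(stages, s["data"])
            print(s["name"], out.shape)
```

In the paper's setting, the stage boundaries would come from solving the priority-aware allocation problem rather than from an even split, and each stage would run on a separate edge device (e.g., a Jetson Xavier or Nano) with activations sent over the network.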
Related papers
- Intention-Conditioned Flow Occupancy Models [69.79049994662591]
Large-scale pre-training has fundamentally changed how machine learning research is done today. Applying this same framework to reinforcement learning is appealing because it offers compelling avenues for addressing core challenges in RL. Recent advances in generative AI have provided new tools for modeling highly complex distributions.
arXiv Detail & Related papers (2025-06-10T15:27:46Z)
- Federated Learning for Diffusion Models [12.46092849473786]
Diffusion models are powerful generative models that can produce highly realistic samples for various tasks.
We propose FedDDPM, Federated Learning with Denoising Diffusion Probabilistic Models.
We provide a rigorous convergence analysis of FedDDPM and propose an enhanced algorithm, FedDDPM+, to reduce training overheads.
arXiv Detail & Related papers (2025-03-09T03:41:10Z)
- Zero-shot Outlier Detection via Prior-data Fitted Networks: Model Selection Bygone! [28.823740273813296]
Outlier detection (OD) has numerous applications in environmental monitoring, cybersecurity, finance, and medicine.
Since OD is an inherently unsupervised task, model selection is a key bottleneck in the absence of label supervision.
We present FoMo-0D for zero-shot OD, exploring a transformative new direction that bypasses the hurdle of model selection altogether.
arXiv Detail & Related papers (2024-09-09T14:41:24Z)
- Early-Exit meets Model-Distributed Inference at Edge Networks [17.03578629673371]
In data-distributed inference, each worker carries the entire deep neural network (DNN) model but processes only a subset of the data.
An emerging paradigm is model-distributed inference (MDI), where each worker carries only a subset of DNN layers.
We design MDI-Exit, a framework that adaptively determines early-exit and offloading policies as well as data admission at the source.
arXiv Detail & Related papers (2024-08-08T11:53:32Z)
- Everything to the Synthetic: Diffusion-driven Test-time Adaptation via Synthetic-Domain Alignment [81.78901060731269]
Test-time adaptation (TTA) aims to improve the performance of source-domain pre-trained models on previously unseen, shifted target domains. Traditional TTA methods primarily adapt model weights based on target data streams, making model performance sensitive to the amount and order of target data. The recently proposed diffusion-driven TTA methods mitigate this by adapting model inputs instead of weights, where an unconditional diffusion model, trained on the source domain, transforms target-domain data into a synthetic domain that is expected to approximate the source domain.
arXiv Detail & Related papers (2024-06-06T17:39:09Z)
- Towards Reliable AI Model Deployments: Multiple Input Mixup for Out-of-Distribution Detection [4.985768723667418]
We propose a novel and simple method to solve the Out-of-Distribution (OOD) detection problem.
Our method improves OOD detection performance with only a single epoch of fine-tuning.
It does not require training the model from scratch and can simply be attached to the classifier.
arXiv Detail & Related papers (2023-12-24T15:31:51Z)
- Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning [101.66860222415512]
Multi-Task Diffusion Model (MTDiff) is a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis.
For generative planning, we find MTDiff outperforms state-of-the-art algorithms across 50 tasks on Meta-World and 8 maps on Maze2D.
arXiv Detail & Related papers (2023-05-29T05:20:38Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- Variant Parallelism: Lightweight Deep Convolutional Models for Distributed Inference on IoT Devices [0.0]
Two major techniques are commonly used to meet real-time inference requirements when distributing models across resource-constrained IoT devices.
We propose variant parallelism (VP), an ensemble-based deep learning distribution method where different variants of a main model are generated and can be deployed on separate machines.
Our results demonstrate that our models can have 5.8-7.1x fewer parameters, 4.3-31x fewer multiply-accumulate operations (MACs), and 2.5-13.2x lower response time on atomic inputs compared to MobileNetV2.
arXiv Detail & Related papers (2022-10-15T20:52:28Z)
- Back to the Source: Diffusion-Driven Test-Time Adaptation [77.4229736436935]
Test-time adaptation harnesses test inputs to improve accuracy of a model trained on source data when tested on shifted target data.
We instead update the target data by projecting all test inputs toward the source domain with a generative diffusion model.
arXiv Detail & Related papers (2022-07-07T17:14:10Z)
- How to Learn when Data Gradually Reacts to Your Model [10.074466859579571]
We propose a new algorithm, Stateful Performative Gradient Descent (Stateful PerfGD), for minimizing the performative loss even in the presence of these effects.
Our experiments confirm that Stateful PerfGD substantially outperforms previous state-of-the-art methods.
arXiv Detail & Related papers (2021-12-13T22:05:26Z)
- Contextual-Bandit Anomaly Detection for IoT Data in Distributed Hierarchical Edge Computing [65.78881372074983]
IoT devices can hardly afford complex deep neural network (DNN) models, and offloading anomaly detection tasks to the cloud incurs long delays.
We propose and build a demo for an adaptive anomaly detection approach for distributed hierarchical edge computing (HEC) systems.
We show that our proposed approach significantly reduces detection delay without sacrificing accuracy, as compared to offloading detection tasks to the cloud.
arXiv Detail & Related papers (2020-04-15T06:13:33Z)
- Do We Really Need to Access the Source Data? Source Hypothesis Transfer for Unsupervised Domain Adaptation [102.67010690592011]
Unsupervised domain adaptation (UDA) aims to leverage the knowledge learned from a labeled source dataset to solve similar tasks in a new unlabeled domain.
Prior UDA methods typically require access to the source data when adapting the model.
This work tackles a practical setting where only a trained source model is available, and investigates how such a model can be effectively utilized to solve UDA problems without the source data.
arXiv Detail & Related papers (2020-02-20T03:13:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.