Early-Exit meets Model-Distributed Inference at Edge Networks
- URL: http://arxiv.org/abs/2408.05247v1
- Date: Thu, 8 Aug 2024 11:53:32 GMT
- Title: Early-Exit meets Model-Distributed Inference at Edge Networks
- Authors: Marco Colocrese, Erdem Koyuncu, Hulya Seferoglu
- Abstract summary: In data-distributed inference, each worker carries the entire deep neural network (DNN) model but processes only a subset of the data.
An emerging paradigm is model-distributed inference (MDI), where each worker carries only a subset of DNN layers.
We design a framework MDI-Exit that adaptively determines early-exit and offloading policies as well as data admission at the source.
- Score: 17.03578629673371
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Distributed inference techniques can be broadly classified into data-distributed and model-distributed schemes. In data-distributed inference (DDI), each worker carries the entire deep neural network (DNN) model but processes only a subset of the data. However, feeding the data to workers results in high communication costs, especially when the data is large. An emerging paradigm is model-distributed inference (MDI), where each worker carries only a subset of DNN layers. In MDI, a source device that has data processes a few layers of the DNN and sends the output to a neighboring device, i.e., offloads the rest of the layers. This process ends when all layers are processed in a distributed manner. In this paper, we investigate the design and development of MDI with early-exit, which advocates that not all layers of a model need to be processed for some data to reach the desired accuracy, i.e., we can exit the model without processing all the layers once the target accuracy is reached. We design a framework, MDI-Exit, that adaptively determines early-exit and offloading policies as well as data admission at the source. Experimental results on a real-life testbed of NVIDIA Nano edge devices show that MDI-Exit processes more data when accuracy is fixed and achieves higher accuracy for a fixed data rate.
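The process-then-exit-or-offload loop described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual MDI-Exit implementation; all names (`Worker`, `exit_head`, `confidence_threshold`) are assumptions, and the real framework adapts these policies at run time.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

class Worker:
    """One edge device holding a contiguous subset of the DNN's layers."""

    def __init__(self, layers, exit_head=None, next_worker=None):
        self.layers = layers          # callables: activation -> activation
        self.exit_head = exit_head    # callable: activation -> class logits
        self.next_worker = next_worker

    def infer(self, x, confidence_threshold=0.9):
        # Process the locally held layers.
        for layer in self.layers:
            x = layer(x)
        # Early exit: stop here if the local exit head is confident enough.
        if self.exit_head is not None:
            probs = softmax(self.exit_head(x))
            if max(probs) >= confidence_threshold:
                return ("exit", probs.index(max(probs)))
        # Otherwise offload the intermediate activation to the next device.
        if self.next_worker is not None:
            return self.next_worker.infer(x, confidence_threshold)
        return ("final", x)
```

In a chain of two such workers, a source whose exit head is unsure of its prediction offloads the intermediate activation, and the downstream device either exits with a confident prediction or finishes the remaining layers.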
Related papers
- Impugan: Learning Conditional Generative Models for Robust Data Imputation [0.21847754147782886]
Impugan is a conditional Generative Adversarial Network (cGAN) for imputing missing values and integrating heterogeneous datasets.
During inference, the generator reconstructs missing entries from available features, and the discriminator enforces realism by distinguishing true from imputed data.
arXiv Detail & Related papers (2025-12-05T18:46:33Z) - Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents [85.02904078131682]
We introduce the agent data protocol (ADP), a light-weight representation language that serves as an "interlingua" between agent datasets.
ADP is expressive enough to capture a large variety of tasks, including API/tool use, browsing, coding, software engineering, and general agentic tasks.
All code and data are released publicly, in the hope that ADP could help lower the barrier to standardized, scalable, and reproducible agent training.
arXiv Detail & Related papers (2025-10-28T17:53:13Z) - Priority-Aware Model-Distributed Inference at Edge Networks [6.97067164616875]
Distributed inference techniques can be broadly classified into data-distributed and model-distributed schemes.
In data-distributed inference (DDI), each worker carries the entire Machine Learning (ML) model but processes only a subset of the data.
An emerging paradigm is model-distributed inference (MDI), where each worker carries only a subset of ML layers.
arXiv Detail & Related papers (2024-12-16T22:01:55Z) - Zero-shot Outlier Detection via Prior-data Fitted Networks: Model Selection Bygone! [28.823740273813296]
Outlier detection (OD) has numerous applications in environmental monitoring, cybersecurity, finance, and medicine.
Being an inherently unsupervised task, model selection is a key bottleneck for OD without label supervision.
We present FoMo-0D for zero-shot OD, exploring a transformative new direction that bypasses the hurdle of model selection altogether.
arXiv Detail & Related papers (2024-09-09T14:41:24Z) - Can LLMs Separate Instructions From Data? And What Do We Even Mean By That? [60.50127555651554]
Large Language Models (LLMs) show impressive results in numerous practical applications, but they lack essential safety features.
This makes them vulnerable to manipulations such as indirect prompt injections and generally unsuitable for safety-critical tasks.
We introduce a formal measure for instruction-data separation and an empirical variant that is calculable from a model's outputs.
arXiv Detail & Related papers (2024-03-11T15:48:56Z) - Inferring Data Preconditions from Deep Learning Models for Trustworthy Prediction in Deployment [25.527665632625627]
It is important to reason about the trustworthiness of the model's predictions with unseen data during deployment.
Existing methods for specifying and verifying traditional software are insufficient for this task.
We propose a novel technique that uses rules derived from neural network computations to infer data preconditions.
arXiv Detail & Related papers (2024-01-26T03:47:18Z) - Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z) - On Calibrating Diffusion Probabilistic Models [78.75538484265292]
Diffusion probabilistic models (DPMs) have achieved promising results in diverse generative tasks.
We propose a simple way for calibrating an arbitrary pretrained DPM, with which the score matching loss can be reduced and the lower bounds of model likelihood can be increased.
Our calibration method is performed only once and the resulting models can be used repeatedly for sampling.
arXiv Detail & Related papers (2023-02-21T14:14:40Z) - Back to the Source: Diffusion-Driven Test-Time Adaptation [77.4229736436935]
Test-time adaptation harnesses test inputs to improve accuracy of a model trained on source data when tested on shifted target data.
We instead update the target data, by projecting all test inputs toward the source domain with a generative diffusion model.
arXiv Detail & Related papers (2022-07-07T17:14:10Z) - Joint Distribution across Representation Space for Out-of-Distribution Detection [16.96466730536722]
We present a novel outlook on in-distribution data in a generative manner, which takes their latent features generated from each hidden layer as a joint distribution across representation spaces.
We first construct the Gaussian Mixture Model (GMM) based on in-distribution latent features for each hidden layer, and then connect GMMs via the transition probabilities of the inference traces.
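The layer-wise density idea above can be sketched in simplified form. This hypothetical illustration models each hidden layer's in-distribution features with a single diagonal Gaussian (a one-component stand-in for the paper's per-layer GMMs and transition probabilities) and sums per-layer log-likelihoods along the inference trace; all function names are assumptions.

```python
import numpy as np

def fit_layer_gaussian(features):
    """features: (n_samples, dim) in-distribution activations of one layer."""
    mu = features.mean(axis=0)
    var = features.var(axis=0) + 1e-6  # avoid division by zero
    return mu, var

def layer_log_likelihood(x, mu, var):
    """Diagonal-Gaussian log-density of one layer's activation x."""
    return -0.5 * float(np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var))

def trace_score(layer_activations, layer_models):
    """Sum per-layer log-likelihoods along one inference trace;
    a low score suggests the input is out-of-distribution."""
    return sum(layer_log_likelihood(x, mu, var)
               for x, (mu, var) in zip(layer_activations, layer_models))
```

An input whose activations fall near the fitted in-distribution density scores higher than one far from it, which is the signal thresholded for detection.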
arXiv Detail & Related papers (2021-03-23T06:39:29Z) - Data from Model: Extracting Data from Non-robust and Robust Models [83.60161052867534]
This work explores the reverse process of generating data from a model, attempting to reveal the relationship between the data and the model.
We repeat the process of Data to Model (DtM) and Data from Model (DfM) in sequence and explore the loss of feature mapping information.
Our results show that the accuracy drop is limited even after multiple sequences of DtM and DfM, especially for robust models.
arXiv Detail & Related papers (2020-07-13T05:27:48Z) - Contextual-Bandit Anomaly Detection for IoT Data in Distributed Hierarchical Edge Computing [65.78881372074983]
IoT devices can hardly afford complex deep neural network (DNN) models, and offloading anomaly detection tasks to the cloud incurs long delays.
We propose and build a demo for an adaptive anomaly detection approach for distributed hierarchical edge computing (HEC) systems.
We show that our proposed approach significantly reduces detection delay without sacrificing accuracy, as compared to offloading detection tasks to the cloud.
arXiv Detail & Related papers (2020-04-15T06:13:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.