Dynamic Compression Ratio Selection for Edge Inference Systems with Hard
Deadlines
- URL: http://arxiv.org/abs/2005.12235v1
- Date: Mon, 25 May 2020 17:11:53 GMT
- Title: Dynamic Compression Ratio Selection for Edge Inference Systems with Hard
Deadlines
- Authors: Xiufeng Huang, Sheng Zhou
- Abstract summary: We propose a dynamic compression ratio selection scheme for edge inference system with hard deadlines.
Information augmentation that retransmits less compressed data of task with erroneous inference is proposed to enhance the accuracy performance.
Considering the wireless transmission errors, we further design a retransmission scheme to reduce performance degradation due to packet losses.
- Score: 9.585931043664363
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Implementing machine learning algorithms on Internet of Things (IoT) devices
has become essential for emerging applications such as autonomous driving and
environment monitoring. However, limited computation capability and energy
budgets make it difficult to run complex machine learning algorithms on IoT
devices, especially when a latency deadline exists. One solution is to offload
the computation-intensive tasks to an edge server. However, wireless uploading
of the raw data is time-consuming and may lead to deadline violations. To
reduce the communication cost, lossy data compression can be exploited for
inference tasks, but it may yield more erroneous inference results.
In this paper, we propose a dynamic compression ratio selection scheme for edge
inference systems with hard deadlines. The key idea is to balance the tradeoff
between communication cost and inference accuracy. By dynamically selecting the
optimal compression ratio according to the remaining deadline budgets of queued
tasks, more tasks can be completed on time with correct inference under limited
communication resources. Furthermore, information augmentation, which
retransmits less compressed data for tasks with erroneous inference results, is
proposed to enhance accuracy. Since the correctness of an inference is often
unknown, we use uncertainty to estimate its confidence and, based on that,
jointly optimize the information augmentation and compression ratio selection.
Lastly, considering wireless transmission errors, we further design a
retransmission scheme to reduce the performance degradation caused by packet
losses. Simulation results show the performance of the proposed schemes under
different deadlines and task arrival rates.
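The selection logic described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration (not the paper's actual algorithm): it assumes a discrete set of compression levels with known payload sizes and expected accuracies, picks the most accurate level whose upload still fits the remaining deadline budget, and gates information augmentation on an assumed confidence threshold. All names (`Level`, `select_level`, `should_augment`) and the linear transmission-time model are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Level:
    """One candidate compression level (hypothetical parameters)."""
    ratio: float     # compression ratio (higher = smaller payload)
    bits: float      # payload size after compression, in bits
    accuracy: float  # expected inference accuracy at this ratio

def select_level(levels: List[Level], deadline_s: float,
                 rate_bps: float) -> Optional[Level]:
    """Pick the most accurate level whose upload meets the deadline.

    Returns None if even the smallest payload misses the deadline,
    i.e. the task cannot be served without violating the hard deadline.
    Transmission time is modeled simply as bits / rate.
    """
    feasible = [l for l in levels if l.bits / rate_bps <= deadline_s]
    if not feasible:
        return None
    return max(feasible, key=lambda l: l.accuracy)

def should_augment(confidence: float, threshold: float,
                   remaining_s: float, extra_bits: float,
                   rate_bps: float) -> bool:
    """Retransmit less-compressed data (information augmentation) only
    when the inference looks uncertain AND the extra upload still fits
    the remaining deadline budget."""
    return confidence < threshold and extra_bits / rate_bps <= remaining_s
```

Under this sketch, a heavily compressed task that finishes early but with low confidence would trigger augmentation, while a confident result or an exhausted deadline budget would not.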
Related papers
- Towards Resource-Efficient Federated Learning in Industrial IoT for Multivariate Time Series Analysis [50.18156030818883]
Anomalies and missing data constitute a thorny problem in industrial applications.
Deep learning enabled anomaly detection has emerged as a critical direction.
The data collected on edge devices contain private user information.
arXiv Detail & Related papers (2024-11-06T15:38:31Z) - Progressive Neural Compression for Adaptive Image Offloading under
Timing Constraints [9.903309560890317]
It is important to develop an adaptive approach that maximizes the inference performance of machine learning applications under timing constraints.
In this paper, we use image classification as our target application and propose progressive neural compression (PNC) as an efficient solution to this problem.
We demonstrate the benefits of PNC over state-of-the-art neural compression approaches and traditional compression methods on a testbed.
arXiv Detail & Related papers (2023-10-08T22:58:31Z) - A Robust Adaptive Workload Orchestration in Pure Edge Computing [0.0]
Mobility and limited computational capacity of edge devices pose challenges in supporting urgent and computationally intensive tasks.
It is essential to ensure that edge nodes complete as many latency-sensitive tasks as possible.
We propose a Robust Adaptive Workload Orchestration (R-AdWOrch) model to minimize deadline misses and data loss.
arXiv Detail & Related papers (2023-08-15T20:04:18Z) - Analysis and Optimization of Wireless Federated Learning with Data
Heterogeneity [72.85248553787538]
This paper focuses on performance analysis and optimization for wireless FL, considering data heterogeneity, combined with wireless resource allocation.
We formulate the loss function minimization problem, under constraints on long-term energy consumption and latency, and jointly optimize client scheduling, resource allocation, and the number of local training epochs (CRE).
Experiments on real-world datasets demonstrate that the proposed algorithm outperforms other benchmarks in terms of the learning accuracy and energy consumption.
arXiv Detail & Related papers (2023-08-04T04:18:01Z) - Online Learning for Adaptive Probing and Scheduling in Dense WLANs [4.585894579981477]
Existing solutions to network scheduling assume that the instantaneous link rates are completely known before a scheduling decision is made.
We develop an approximation algorithm with guaranteed performance when the probing decision is non-adaptive.
We extend our solutions to the online setting with unknown link rate distributions and develop a contextual-bandit based algorithm.
arXiv Detail & Related papers (2022-12-27T19:12:17Z) - Deep Reinforcement Learning for Trajectory Path Planning and Distributed
Inference in Resource-Constrained UAV Swarms [6.649753747542209]
This work aims to design a model for distributed collaborative inference requests and path planning in a UAV swarm.
The formulated problem is NP-hard, so finding the optimal solution is quite complex.
We conduct extensive simulations and compare our results to state-of-the-art studies, demonstrating that our model outperforms the competing models.
arXiv Detail & Related papers (2022-12-21T17:16:42Z) - An Intelligent Deterministic Scheduling Method for Ultra-Low Latency
Communication in Edge Enabled Industrial Internet of Things [19.277349546331557]
Time-Sensitive Networking (TSN) has recently been studied to realize low-latency communication via deterministic scheduling.
A non-collision-theory-based deterministic scheduling (NDS) method is proposed to achieve ultra-low-latency communication for time-sensitive flows.
Experiment results demonstrate that NDS/DQS can well support deterministic ultra-low latency services and guarantee efficient bandwidth utilization.
arXiv Detail & Related papers (2022-07-17T16:52:51Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, soft actor-critic for discrete actions (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on a latency- and accuracy-aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - Dynamic Network-Assisted D2D-Aided Coded Distributed Learning [59.29409589861241]
We propose a novel device-to-device (D2D)-aided coded federated learning method (D2D-CFL) for load balancing across devices.
We derive an optimal compression rate for achieving minimum processing time and establish its connection with the convergence time.
Our proposed method is beneficial for real-time collaborative applications, where the users continuously generate training data.
arXiv Detail & Related papers (2021-11-26T18:44:59Z) - Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
arXiv Detail & Related papers (2021-06-07T11:37:03Z) - Energy-Efficient Model Compression and Splitting for Collaborative
Inference Over Time-Varying Channels [52.60092598312894]
We propose a technique to reduce the total energy bill at the edge device by utilizing model compression and time-varying model split between the edge and remote nodes.
Our proposed solution results in minimal energy consumption and CO$_2$ emission compared to the considered baselines.
arXiv Detail & Related papers (2021-06-02T07:36:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.