Dynamic Compression Ratio Selection for Edge Inference Systems with Hard
Deadlines
- URL: http://arxiv.org/abs/2005.12235v1
- Date: Mon, 25 May 2020 17:11:53 GMT
- Title: Dynamic Compression Ratio Selection for Edge Inference Systems with Hard
Deadlines
- Authors: Xiufeng Huang, Sheng Zhou
- Abstract summary: We propose a dynamic compression ratio selection scheme for edge inference system with hard deadlines.
Information augmentation that retransmits less compressed data of task with erroneous inference is proposed to enhance the accuracy performance.
Considering the wireless transmission errors, we further design a retransmission scheme to reduce performance degradation due to packet losses.
- Score: 9.585931043664363
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Implementing machine learning algorithms on Internet of Things (IoT) devices
has become essential for emerging applications such as autonomous driving and
environment monitoring. However, limited computation capability and energy
budgets make it difficult to run complex machine learning algorithms on IoT
devices, especially when a latency deadline exists. One solution is to offload
the computation-intensive tasks to an edge server. However, wireless uploading
of the raw data is time-consuming and may lead to deadline violations. To
reduce the communication cost, lossy data compression can be exploited for
inference tasks, but it may yield more erroneous inference results.
In this paper, we propose a dynamic compression ratio selection scheme for edge
inference systems with hard deadlines. The key idea is to balance the tradeoff
between communication cost and inference accuracy. By dynamically selecting the
optimal compression ratio according to the remaining deadline budgets of queued
tasks, more tasks can be completed on time with correct inference under limited
communication resources. Furthermore, information augmentation, which
retransmits less compressed data for tasks with erroneous inference results, is
proposed to enhance accuracy. Since the correctness of an inference is often
unknown, we use uncertainty to estimate its confidence and, based on that,
jointly optimize the information augmentation and compression ratio selection.
Lastly, considering wireless transmission errors, we further design a
retransmission scheme to reduce the performance degradation caused by packet
losses. Simulation results show the performance of the proposed schemes under
different deadlines and task arrival rates.
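The selection logic described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration (not the paper's actual algorithm): it assumes a discrete set of compression levels with known payload sizes and expected accuracies, picks the most accurate level whose upload still fits the remaining deadline budget, and gates information augmentation on an assumed confidence threshold. All names (`Level`, `select_level`, `should_augment`) and the linear transmission-time model are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Level:
    """One candidate compression level (hypothetical parameters)."""
    ratio: float     # compression ratio (higher = smaller payload)
    bits: float      # payload size after compression, in bits
    accuracy: float  # expected inference accuracy at this ratio

def select_level(levels: List[Level], deadline_s: float,
                 rate_bps: float) -> Optional[Level]:
    """Pick the most accurate level whose upload meets the deadline.

    Returns None if even the smallest payload misses the deadline,
    i.e. the task cannot be served without violating the hard deadline.
    Transmission time is modeled simply as bits / rate.
    """
    feasible = [l for l in levels if l.bits / rate_bps <= deadline_s]
    if not feasible:
        return None
    return max(feasible, key=lambda l: l.accuracy)

def should_augment(confidence: float, threshold: float,
                   remaining_s: float, extra_bits: float,
                   rate_bps: float) -> bool:
    """Retransmit less-compressed data (information augmentation) only
    when the inference looks uncertain AND the extra upload still fits
    the remaining deadline budget."""
    return confidence < threshold and extra_bits / rate_bps <= remaining_s
```

Under this sketch, a heavily compressed task that finishes early but with low confidence would trigger augmentation, while a confident result or an exhausted deadline budget would not.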
Related papers
- Towards Resource-Efficient Federated Learning in Industrial IoT for Multivariate Time Series Analysis [50.18156030818883]
Anomalies and missing data constitute a thorny problem in industrial applications.
Deep learning enabled anomaly detection has emerged as a critical direction.
The data collected on edge devices contain private user information.
arXiv Detail & Related papers (2024-11-06T15:38:31Z) - Progressive Neural Compression for Adaptive Image Offloading under
Timing Constraints [9.903309560890317]
It is important to develop an adaptive approach that maximizes the inference performance of machine learning applications under timing constraints.
In this paper, we use image classification as our target application and propose progressive neural compression (PNC) as an efficient solution to this problem.
We demonstrate the benefits of PNC over state-of-the-art neural compression approaches and traditional compression methods on a testbed.
arXiv Detail & Related papers (2023-10-08T22:58:31Z) - A Robust Adaptive Workload Orchestration in Pure Edge Computing [0.0]
Mobility and limited computational capacity of edge devices pose challenges in supporting urgent and computationally intensive tasks.
It is essential to ensure that edge nodes complete as many latency-sensitive tasks as possible.
We propose a Robust Adaptive Workload Orchestration (R-AdWOrch) model to minimize deadline misses and data loss.
arXiv Detail & Related papers (2023-08-15T20:04:18Z) - Analysis and Optimization of Wireless Federated Learning with Data
Heterogeneity [72.85248553787538]
This paper focuses on performance analysis and optimization for wireless FL, considering data heterogeneity, combined with wireless resource allocation.
We formulate the loss function minimization problem, under constraints on long-term energy consumption and latency, and jointly optimize client scheduling, resource allocation, and the number of local training epochs (CRE).
Experiments on real-world datasets demonstrate that the proposed algorithm outperforms other benchmarks in terms of the learning accuracy and energy consumption.
arXiv Detail & Related papers (2023-08-04T04:18:01Z) - Online Learning for Adaptive Probing and Scheduling in Dense WLANs [4.585894579981477]
Existing solutions to network scheduling assume that the instantaneous link rates are completely known before a scheduling decision is made.
We develop an approximation algorithm with guaranteed performance when the probing decision is non-adaptive.
We extend our solutions to the online setting with unknown link rate distributions and develop a contextual-bandit based algorithm.
arXiv Detail & Related papers (2022-12-27T19:12:17Z) - Deep Reinforcement Learning for Trajectory Path Planning and Distributed
Inference in Resource-Constrained UAV Swarms [6.649753747542209]
This work aims to design a model for distributed collaborative inference requests and path planning in a UAV swarm.
The formulated problem is NP-hard, so finding the optimal solution is quite complex.
We conduct extensive simulations and compare our results to state-of-the-art studies, demonstrating that our model outperforms the competing models.
arXiv Detail & Related papers (2022-12-21T17:16:42Z) - An Intelligent Deterministic Scheduling Method for Ultra-Low Latency
Communication in Edge Enabled Industrial Internet of Things [19.277349546331557]
Time-Sensitive Networking (TSN) has recently been studied to realize low-latency communication via deterministic scheduling.
A non-collision-theory-based deterministic scheduling (NDS) method is proposed to achieve ultra-low-latency communication for time-sensitive flows.
Experiment results demonstrate that NDS/DQS can well support deterministic ultra-low latency services and guarantee efficient bandwidth utilization.
arXiv Detail & Related papers (2022-07-17T16:52:51Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, soft actor-critic for discrete actions (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on a latency- and accuracy-aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - Dynamic Network-Assisted D2D-Aided Coded Distributed Learning [59.29409589861241]
We propose a novel device-to-device (D2D)-aided coded federated learning method (D2D-CFL) for load balancing across devices.
We derive an optimal compression rate for achieving minimum processing time and establish its connection with the convergence time.
Our proposed method is beneficial for real-time collaborative applications, where the users continuously generate training data.
arXiv Detail & Related papers (2021-11-26T18:44:59Z) - Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
arXiv Detail & Related papers (2021-06-07T11:37:03Z) - Energy-Efficient Model Compression and Splitting for Collaborative
Inference Over Time-Varying Channels [52.60092598312894]
We propose a technique to reduce the total energy bill at the edge device by utilizing model compression and time-varying model split between the edge and remote nodes.
Our proposed solution results in minimal energy consumption and CO$_2$ emission compared to the considered baselines.
arXiv Detail & Related papers (2021-06-02T07:36:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.