EdgeSight: Enabling Modeless and Cost-Efficient Inference at the Edge
- URL: http://arxiv.org/abs/2405.19213v2
- Date: Wed, 15 Jan 2025 04:17:38 GMT
- Title: EdgeSight: Enabling Modeless and Cost-Efficient Inference at the Edge
- Authors: ChonLam Lao, Jiaqi Gao, Ganesh Ananthanarayanan, Aditya Akella, Minlan Yu,
- Abstract summary: We propose EdgeSight, a system that provides cost-efficient modeless inference at the edge.<n>Our experimental results show that EdgeSight outperforms existing systems by up to 1.6x in P99 latency for modeless services.<n>Our FPGA prototype demonstrates similar performance at certain accuracy levels, with a power consumption reduction of up to 3.34x.
- Score: 10.110832890670997
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional ML inference is evolving toward modeless inference, which abstracts the complexity of model selection from users, allowing the system to automatically choose the most appropriate model for each request based on accuracy and resource requirements. While prior studies have focused on modeless inference within data centers, this paper tackles the pressing need for cost-efficient modeless inference at the edge -- particularly within its unique constraints of limited device memory, volatile network conditions, and restricted power consumption. To overcome these challenges, we propose EdgeSight, a system that provides cost-efficient EdgeSight serving for diverse DNNs at the edge. EdgeSight employs an edge-data center (edge-DC) architecture, utilizing confidence scaling to reduce the number of model options while meeting diverse accuracy requirements. Additionally, it supports lossy inference in volatile network environments. Our experimental results show that EdgeSight outperforms existing systems by up to 1.6x in P99 latency for modeless services. Furthermore, our FPGA prototype demonstrates similar performance at certain accuracy levels, with a power consumption reduction of up to 3.34x.
Related papers
- Noise Resilient Over-The-Air Federated Learning In Heterogeneous Wireless Networks [2.2530496464901106]
In 6G wireless networks, Artificial Intelligence (AI)-driven applications demand the adoption of Federated Learning (FL)
Traditional OTA-FL techniques suffer due to the joint effects of Additive White Gaussian Noise (AWGN) at the server, fading, and both data and system heterogeneity at the participating edge devices.
We propose the novel Noise Resilient Over-the-Air Federated Learning (NoROTA-FL) framework to jointly tackle these challenges.
arXiv Detail & Related papers (2025-03-25T11:04:00Z) - LeanTTA: A Backpropagation-Free and Stateless Approach to Quantized Test-Time Adaptation on Edge Devices [13.355021314836852]
We present LeanTTA, a novel backpropagation-free and stateless framework for quantized test-time adaptation tailored to edge devices.
Our approach minimizes computational costs by dynamically updating normalization statistics without backpropagation.
We validate our framework across sensor modalities, demonstrating significant improvements over state-of-the-art TTA methods.
arXiv Detail & Related papers (2025-03-20T06:27:09Z) - Visual Fourier Prompt Tuning [63.66866445034855]
We propose the Visual Fourier Prompt Tuning (VFPT) method as a general and effective solution for adapting large-scale transformer-based models.
Our approach incorporates the Fast Fourier Transform into prompt embeddings and harmoniously considers both spatial and frequency domain information.
Our results demonstrate that our approach outperforms current state-of-the-art baselines on two benchmarks.
arXiv Detail & Related papers (2024-11-02T18:18:35Z) - Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - HRVMamba: High-Resolution Visual State Space Model for Dense Prediction [60.80423207808076]
State Space Models (SSMs) with efficient hardware-aware designs have demonstrated significant potential in computer vision tasks.
These models have been constrained by three key challenges: insufficient inductive bias, long-range forgetting, and low-resolution output representation.
We introduce the Dynamic Visual State Space (DVSS) block, which employs deformable convolution to mitigate the long-range forgetting problem.
We also introduce High-Resolution Visual State Space Model (HRVMamba) based on the DVSS block, which preserves high-resolution representations throughout the entire process.
arXiv Detail & Related papers (2024-10-04T06:19:29Z) - A Green Multi-Attribute Client Selection for Over-The-Air Federated Learning: A Grey-Wolf-Optimizer Approach [5.277822313069301]
Over-the-air (OTA) FL was introduced to tackle these challenges by disseminating model updates without direct device-to-device connections or centralized servers.
OTA-FL brought forth limitations associated with heightened energy consumption and network latency.
We propose a multi-attribute client selection framework employing the grey wolf (GWO) to strategically control the number of participants in each round.
arXiv Detail & Related papers (2024-09-16T20:03:57Z) - LAECIPS: Large Vision Model Assisted Adaptive Edge-Cloud Collaboration for IoT-based Perception System [24.84622024011103]
Edge-cloud collaboration with large-small model co-inference offers a promising approach to achieving high inference accuracy and low latency.
Existing edge-cloud collaboration methods are tightly coupled with the model architecture and cannot adapt to the dynamic data drifts in heterogeneous IoT environments.
In LAECIPS, both the large vision model on the cloud and the lightweight model on the edge are plug-and-play. We design an edge-cloud collaboration strategy based on hard input mining, optimized for both high accuracy and low latency.
arXiv Detail & Related papers (2024-04-16T12:12:06Z) - HSViT: Horizontally Scalable Vision Transformer [16.46308352393693]
Vision Transformer (ViT) requires pre-training on large-scale datasets to perform well.
This paper introduces a novel horizontally scalable vision transformer (HSViT) scheme.
HSViT achieves up to 10% higher top-1 accuracy than state-of-the-art schemes on small datasets.
arXiv Detail & Related papers (2024-04-08T04:53:29Z) - Towards Robust and Efficient Cloud-Edge Elastic Model Adaptation via Selective Entropy Distillation [56.79064699832383]
We establish a Cloud-Edge Elastic Model Adaptation (CEMA) paradigm in which the edge models only need to perform forward propagation.
In our CEMA, to reduce the communication burden, we devise two criteria to exclude unnecessary samples from uploading to the cloud.
arXiv Detail & Related papers (2024-02-27T08:47:19Z) - Adaptive Resource Allocation for Virtualized Base Stations in O-RAN with
Online Learning [60.17407932691429]
Open Radio Access Network systems, with their base stations (vBSs), offer operators the benefits of increased flexibility, reduced costs, vendor diversity, and interoperability.
We propose an online learning algorithm that balances the effective throughput and vBS energy consumption, even under unforeseeable and "challenging'' environments.
We prove the proposed solutions achieve sub-linear regret, providing zero average optimality gap even in challenging environments.
arXiv Detail & Related papers (2023-09-04T17:30:21Z) - DVFO: Learning-Based DVFS for Energy-Efficient Edge-Cloud Collaborative
Inference [12.095934624748686]
We propose DVFO, a novel DVFS-enabled edge-cloud collaborative inference framework.
It automatically co-optimizes the CPU, GPU and memory frequencies of edge devices, and the feature maps to be offloaded to cloud servers.
It significantly reduces the energy consumption by 33% on average, compared to state-of-the-art schemes.
arXiv Detail & Related papers (2023-06-02T07:00:42Z) - LLHR: Low Latency and High Reliability CNN Distributed Inference for
Resource-Constrained UAV Swarms [2.320417845168326]
Unmanned Aerial Vehicles (UAVs) have shown impressive performance in many critical applications, such as surveillance, search and rescue operations, environmental monitoring, etc.
One promising solution is to divide the inference requests into subtasks that can be distributed among UAVs in a swarm based on the available resources.
Our system model deals with real-time requests, aiming to find the optimal transmission power that guarantees higher reliability and low latency.
arXiv Detail & Related papers (2023-05-25T08:47:16Z) - Energy-efficient Task Adaptation for NLP Edge Inference Leveraging
Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z) - GOHSP: A Unified Framework of Graph and Optimization-based Heterogeneous
Structured Pruning for Vision Transformer [76.2625311630021]
Vision transformers (ViTs) have shown very impressive empirical performance in various computer vision tasks.
To mitigate this challenging problem, structured pruning is a promising solution to compress model size and enable practical efficiency.
We propose GOHSP, a unified framework of Graph and Optimization-based Structured Pruning for ViT models.
arXiv Detail & Related papers (2023-01-13T00:40:24Z) - DeepFT: Fault-Tolerant Edge Computing using a Self-Supervised Deep
Surrogate Model [12.335763358698564]
We propose DeepFT to proactively avoid system overloads and their adverse effects.
DeepFT uses a deep surrogate model to accurately predict and diagnose faults in the system.
It offers a highly scalable solution as the model size scales by only 3 and 1 percent per unit increase in the number of active tasks and hosts.
arXiv Detail & Related papers (2022-12-02T16:51:58Z) - Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural
Networks on Edge NPUs [74.83613252825754]
"smart ecosystems" are being formed where sensing happens concurrently rather than standalone.
This is shifting the on-device inference paradigm towards deploying neural processing units (NPUs) at the edge.
We propose a novel early-exit scheduling that allows preemption at run time to account for the dynamicity introduced by the arrival and exiting processes.
arXiv Detail & Related papers (2022-09-27T15:04:01Z) - Computational Intelligence and Deep Learning for Next-Generation
Edge-Enabled Industrial IoT [51.68933585002123]
We investigate how to deploy computational intelligence and deep learning (DL) in edge-enabled industrial IoT networks.
In this paper, we propose a novel multi-exit-based federated edge learning (ME-FEEL) framework.
In particular, the proposed ME-FEEL can achieve an accuracy gain up to 32.7% in the industrial IoT networks with the severely limited resources.
arXiv Detail & Related papers (2021-10-28T08:14:57Z) - Adaptive Dynamic Pruning for Non-IID Federated Learning [3.8666113275834335]
Federated Learning(FL) has emerged as a new paradigm of training machine learning models without sacrificing data security and privacy.
We present an adaptive pruning scheme for edge devices in an FL system, which applies dataset-aware dynamic pruning for inference acceleration on Non-IID datasets.
arXiv Detail & Related papers (2021-06-13T05:27:43Z) - Learning to Solve the AC-OPF using Sensitivity-Informed Deep Neural
Networks [52.32646357164739]
We propose a deep neural network (DNN) to solve the solutions of the optimal power flow (ACOPF)
The proposed SIDNN is compatible with a broad range of OPF schemes.
It can be seamlessly integrated in other learning-to-OPF schemes.
arXiv Detail & Related papers (2021-03-27T00:45:23Z) - A Privacy-Preserving-Oriented DNN Pruning and Mobile Acceleration
Framework [56.57225686288006]
Weight pruning of deep neural networks (DNNs) has been proposed to satisfy the limited storage and computing capability of mobile edge devices.
Previous pruning methods mainly focus on reducing the model size and/or improving performance without considering the privacy of user data.
We propose a privacy-preserving-oriented pruning and mobile acceleration framework that does not require the private training dataset.
arXiv Detail & Related papers (2020-03-13T23:52:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.