Related papers: Optimizing Edge AI: A Comprehensive Survey on Data, Model, and System Strategies

Optimizing Edge AI: A Comprehensive Survey on Data, Model, and System Strategies

URL: http://arxiv.org/abs/2501.03265v1
Date: Sat, 04 Jan 2025 06:17:48 GMT
Title: Optimizing Edge AI: A Comprehensive Survey on Data, Model, and System Strategies
Authors: Xubin Wang, Weijia Jia,
Abstract summary: 5G and edge computing hardware has brought about a significant shift in artificial intelligence.<n> deploying state-of-the-art AI models on resource-constrained edge devices faces significant challenges.<n>This paper presents an optimization triad for efficient and reliable edge AI deployment.
Score: 14.115655986504411
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The emergence of 5G and edge computing hardware has brought about a significant shift in artificial intelligence, with edge AI becoming a crucial technology for enabling intelligent applications. With the growing amount of data generated and stored on edge devices, deploying AI models for local processing and inference has become increasingly necessary. However, deploying state-of-the-art AI models on resource-constrained edge devices faces significant challenges that must be addressed. This paper presents an optimization triad for efficient and reliable edge AI deployment, including data, model, and system optimization. First, we discuss optimizing data through data cleaning, compression, and augmentation to make it more suitable for edge deployment. Second, we explore model design and compression methods at the model level, such as pruning, quantization, and knowledge distillation. Finally, we introduce system optimization techniques like framework support and hardware acceleration to accelerate edge AI workflows. Based on an in-depth analysis of various application scenarios and deployment challenges of edge AI, this paper proposes an optimization paradigm based on the data-model-system triad to enable a whole set of solutions to effectively transfer ML models, which are initially trained in the cloud, to various edge devices for supporting multiple scenarios.

Related papers

Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey [59.52058740470727]
Edge-cloud collaborative computing (ECCC) has emerged as a pivotal paradigm for addressing the computational demands of modern intelligent applications.<n>Recent advancements in AI, particularly deep learning and large language models (LLMs), have dramatically enhanced the capabilities of these distributed systems.<n>This survey provides a structured tutorial on fundamental architectures, enabling technologies, and emerging applications.
arXiv Detail & Related papers (2025-05-03T13:55:38Z)
Empowering Edge Intelligence: A Comprehensive Survey on On-Device AI Models [16.16798813072285]
The rapid advancement of artificial intelligence (AI) technologies has led to an increasing deployment of AI models on edge and terminal devices. This survey comprehensively explores the current state, technical challenges, and future trends of on-device AI models.
arXiv Detail & Related papers (2025-03-08T02:59:51Z)
On Accelerating Edge AI: Optimizing Resource-Constrained Environments [1.7355861031903428]
Resource-constrained edge deployments demand AI solutions that balance high performance with stringent compute, memory, and energy limitations. We present a comprehensive overview of the primary strategies for accelerating deep learning models under such constraints.
arXiv Detail & Related papers (2025-01-25T01:37:03Z)
A Survey on Inference Optimization Techniques for Mixture of Experts Models [50.40325411764262]
MoE models offer enhanced model capacity and computational efficiency through conditional computation.<n>Deployment and inference of MoE models present substantial challenges in terms of computational resources, latency, and energy efficiency.<n>This survey systematically analyzes the current landscape of inference optimization techniques for MoE models across the entire system stack.
arXiv Detail & Related papers (2024-12-18T14:11:15Z)
Two-Timescale Model Caching and Resource Allocation for Edge-Enabled AI-Generated Content Services [55.0337199834612]
Generative AI (GenAI) has emerged as a transformative technology, enabling customized and personalized AI-generated content (AIGC) services. These services require executing GenAI models with billions of parameters, posing significant obstacles to resource-limited wireless edge. We introduce the formulation of joint model caching and resource allocation for AIGC services to balance a trade-off between AIGC quality and latency metrics.
arXiv Detail & Related papers (2024-11-03T07:01:13Z)
Profiling AI Models: Towards Efficient Computation Offloading in Heterogeneous Edge AI Systems [0.2357055571094446]
We propose a research roadmap focused on profiling AI models, capturing data about model types and underlying hardware to predict resource utilisation and task completion time. Experiments with over 3,000 runs show promise in optimising resource allocation and enhancing Edge AI performance.
arXiv Detail & Related papers (2024-10-30T16:07:14Z)
Resource-Efficient Generative AI Model Deployment in Mobile Edge Networks [15.958822667638405]
The scarcity of available resources on the edge pose significant challenges in deploying generative AI models. We present a collaborative edge-cloud framework aiming to properly manage generative AI model deployment on the edge.
arXiv Detail & Related papers (2024-09-09T03:17:28Z)
XEdgeAI: A Human-centered Industrial Inspection Framework with Data-centric Explainable Edge AI Approach [2.0209172586699173]
This paper introduces a novel XAI-integrated Visual Quality Inspection framework. Our framework incorporates XAI and the Large Vision Language Model to deliver human-centered interpretability. This approach paves the way for the broader adoption of reliable and interpretable AI tools in critical industrial applications.
arXiv Detail & Related papers (2024-07-16T14:30:24Z)
Inference Optimization of Foundation Models on AI Accelerators [68.24450520773688]
Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI. As the number of model parameters reaches to hundreds of billions, their deployment incurs prohibitive inference costs and high latency in real-world scenarios. This tutorial offers a comprehensive discussion on complementary inference optimization techniques using AI accelerators.
arXiv Detail & Related papers (2024-07-12T09:24:34Z)
Automatic AI Model Selection for Wireless Systems: Online Learning via Digital Twinning [50.332027356848094]
AI-based applications are deployed at intelligent controllers to carry out functionalities like scheduling or power control. The mapping between context and AI model parameters is ideally done in a zero-shot fashion. This paper introduces a general methodology for the online optimization of AMS mappings.
arXiv Detail & Related papers (2024-06-22T11:17:50Z)
GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning [0.0]
GISTEmbed is a novel strategy that enhances in-batch negative selection during contrastive training through a guide model. Benchmarked against the Massive Text Embedding Benchmark (MTEB), GISTEmbed showcases consistent performance improvements across various model sizes.
arXiv Detail & Related papers (2024-02-26T18:55:15Z)
Machine Learning Insides OptVerse AI Solver: Design Principles and Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver. We showcase our methods for generating complex SAT and MILP instances utilizing generative models that mirror multifaceted structures of real-world problem. We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z)
Reconfigurable Intelligent Surface Assisted Mobile Edge Computing with Heterogeneous Learning Tasks [53.1636151439562]
Mobile edge computing (MEC) provides a natural platform for AI applications. We present an infrastructure to perform machine learning tasks at an MEC with the assistance of a reconfigurable intelligent surface (RIS) Specifically, we minimize the learning error of all participating users by jointly optimizing transmit power of mobile users, beamforming vectors of the base station, and the phase-shift matrix of the RIS.
arXiv Detail & Related papers (2020-12-25T07:08:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.