On the Sustainability of AI Inferences in the Edge
- URL: http://arxiv.org/abs/2507.23093v1
- Date: Wed, 30 Jul 2025 20:47:22 GMT
- Title: On the Sustainability of AI Inferences in the Edge
- Authors: Ghazal Sobhani, Md. Monzurul Amin Ifath, Tushar Sharma, Israat Haque
- Abstract summary: Edge devices perform AI inferences to support latency-critical applications. There is no study on their performance and energy usage for informed decision-making. We analyze trade-offs among model F1 score, inference time, inference power, and memory usage.
- Score: 3.71486243189764
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The proliferation of the Internet of Things (IoT) and its cutting-edge AI-enabled applications (e.g., autonomous vehicles and smart industries) combine two paradigms: data-driven systems and their deployment on the edge. Usually, edge devices perform inferences to support latency-critical applications. In addition to the performance of these resource-constrained edge devices, their energy usage is a critical factor in adopting and deploying edge applications. Examples of such devices include the Raspberry Pi (RPi), Intel Neural Compute Stick (INCS), NVIDIA Jetson Nano (NJn), and Google Coral USB (GCU). Despite their adoption in edge deployments for AI inference, there is no study of their performance and energy usage to inform device and model selection for the demands of specific applications. This study fills the gap by rigorously characterizing the performance of traditional machine learning models, neural networks, and large language models on the above edge devices. Specifically, we analyze trade-offs among model F1 score, inference time, inference power, and memory usage. Hardware and framework optimization, along with external parameter tuning of AI models, can balance model performance and resource usage to realize practical edge AI deployments.
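At its core, the trade-off analysis described above reduces to repeated timed inference runs with resource sampling on each device. As a rough illustration only (not the paper's actual measurement harness), the following minimal Python sketch times a model callable and records peak Python-level memory; `model_fn` and `inputs` are hypothetical stand-ins for any framework's predict call, and on-device power draw would come from an external meter rather than from software.

```python
import time
import tracemalloc

def profile_inference(model_fn, inputs, runs=50):
    """Measure mean latency and peak Python-heap memory for a model call.

    model_fn/inputs are placeholders for any framework's predict call.
    Note: tracemalloc sees only Python allocations, not native tensor
    buffers, and power must be read from an external meter.
    """
    for _ in range(5):  # warm-up so caches/JIT compilation do not skew timing
        model_fn(inputs)

    tracemalloc.start()
    start = time.perf_counter()
    for _ in range(runs):
        model_fn(inputs)
    elapsed = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return {
        "mean_latency_ms": 1000.0 * elapsed / runs,
        "peak_python_mem_mb": peak_bytes / 1e6,
    }
```

Averaging over many runs and discarding warm-up iterations matters on edge boards, where thermal throttling and first-call initialization can dominate single-shot timings.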
Related papers
- Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey [59.52058740470727]
Edge-cloud collaborative computing (ECCC) has emerged as a pivotal paradigm for addressing the computational demands of modern intelligent applications. Recent advancements in AI, particularly deep learning and large language models (LLMs), have dramatically enhanced the capabilities of these distributed systems. This survey provides a structured tutorial on fundamental architectures, enabling technologies, and emerging applications.
arXiv Detail & Related papers (2025-05-03T13:55:38Z)
- Profiling AI Models: Towards Efficient Computation Offloading in Heterogeneous Edge AI Systems [0.2357055571094446]
We propose a research roadmap focused on profiling AI models, capturing data about model types and underlying hardware to predict resource utilisation and task completion time.
Experiments with over 3,000 runs show promise in optimising resource allocation and enhancing Edge AI performance.
arXiv Detail & Related papers (2024-10-30T16:07:14Z)
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the real-time demands of IoVT systems by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Green Edge AI: A Contemporary Survey [46.11332733210337]
The transformative power of AI is derived from the utilization of deep neural networks (DNNs).
Deep learning (DL) is increasingly being transitioned to wireless edge networks in proximity to end-user devices (EUDs).
Despite its potential, edge AI faces substantial challenges, mostly due to the dichotomy between the resource limitations of wireless edge networks and the resource-intensive nature of DL.
arXiv Detail & Related papers (2023-12-01T04:04:37Z)
- Large Language Models Empowered Autonomous Edge AI for Connected Intelligence [51.269276328087855]
Edge artificial intelligence (Edge AI) is a promising solution to achieve connected intelligence.
This article presents a vision of autonomous edge AI systems that automatically organize, adapt, and optimize themselves to meet users' diverse requirements.
arXiv Detail & Related papers (2023-07-06T05:16:55Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- EPAM: A Predictive Energy Model for Mobile AI [6.451060076703027]
We introduce a comprehensive study of mobile AI applications considering different deep neural network (DNN) models and processing sources.
We measure the latency, energy consumption, and memory usage of all the models using four processing sources.
Our study highlights important insights, such as how mobile AI behaves in different applications (vision and non-vision) using CPU, GPU, and NNAPI.
arXiv Detail & Related papers (2023-03-02T09:11:23Z)
- Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs [74.83613252825754]
"smart ecosystems" are being formed where sensing happens concurrently rather than standalone.
This is shifting the on-device inference paradigm towards deploying neural processing units (NPUs) at the edge.
We propose a novel early-exit scheduling that allows preemption at run time to account for the dynamicity introduced by the arrival and exiting processes (a toy early-exit sketch appears after this list).
arXiv Detail & Related papers (2022-09-27T15:04:01Z)
- How to Reach Real-Time AI on Consumer Devices? Solutions for Programmable and Custom Architectures [7.085772863979686]
Deep neural networks (DNNs) have led to large strides in various Artificial Intelligence (AI) inference tasks, such as object and speech recognition.
However, deploying such AI models across commodity devices faces significant challenges.
We present techniques for achieving real-time performance following a cross-stack approach.
arXiv Detail & Related papers (2021-06-21T11:23:12Z)
- Towards AIOps in Edge Computing Environments [60.27785717687999]
This paper describes the system design of an AIOps platform which is applicable in heterogeneous, distributed environments.
It is feasible to collect metrics with a high frequency and simultaneously run specific anomaly detection algorithms directly on edge devices (a lightweight detector sketch appears after this list).
arXiv Detail & Related papers (2021-02-12T09:33:00Z)
- Reliable Fleet Analytics for Edge IoT Solutions [0.0]
We propose a framework for facilitating machine learning at the edge for AIoT applications.
The contribution is an architecture that includes services, tools, and methods for delivering fleet analytics at scale.
We present a preliminary validation of the framework through experiments with IoT devices in rooms across a university campus.
arXiv Detail & Related papers (2021-01-12T11:28:43Z)
- Cloud2Edge Elastic AI Framework for Prototyping and Deployment of AI Inference Engines in Autonomous Vehicles [1.688204090869186]
This paper proposes a novel framework for developing AI Inference Engines for autonomous driving applications based on deep learning modules.
We introduce a simple yet elegant solution for the AI components development cycle, where prototyping takes place in the cloud according to the Software-in-the-Loop (SiL) paradigm.
The effectiveness of the proposed framework is demonstrated using two real-world use-cases of AI inference engines for autonomous vehicles.
arXiv Detail & Related papers (2020-09-23T09:23:29Z)
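For context on the early-exit idea in the Fluid Batching entry above (forward-referenced there), here is a minimal, self-contained PyTorch sketch of confidence-thresholded early exiting; the class name, layer sizes, and threshold are illustrative assumptions, not details from that paper.

```python
import torch
import torch.nn as nn

class EarlyExitNet(nn.Module):
    """Toy two-exit network: a cheap classifier after stage 1 and a
    full-depth classifier after stage 2. All names/sizes are made up."""

    def __init__(self, in_dim=64, hidden=128, classes=10, threshold=0.9):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.exit1 = nn.Linear(hidden, classes)   # early (cheap) exit head
        self.stage2 = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
        self.exit2 = nn.Linear(hidden, classes)   # final exit head
        self.threshold = threshold

    def forward(self, x):
        h = self.stage1(x)
        early = torch.softmax(self.exit1(h), dim=-1)
        # If the early head is confident enough, skip the rest of the
        # network; a run-time scheduler could also preempt at this point.
        if early.max() >= self.threshold:
            return early
        return torch.softmax(self.exit2(self.stage2(h)), dim=-1)

# Example: a confident sample exits early and saves the stage-2 compute.
net = EarlyExitNet()
out = net(torch.randn(1, 64))
```

The paper's contribution is the scheduling around such exits (preempting inferences as requests arrive and leave); this sketch only shows the exit mechanism itself.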
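Similarly, the AIOps entry above claims that high-frequency metric collection and anomaly detection can run directly on edge devices (forward-referenced from that entry). One lightweight detector that fits such constraints is a rolling z-score over a metric stream; the window size and threshold below are illustrative assumptions, not values from the paper.

```python
from collections import deque
import math

class RollingZScoreDetector:
    """Flag a metric sample as anomalous if it deviates from the rolling
    mean by more than k standard deviations. Window/k are illustrative."""

    def __init__(self, window=120, k=3.0):
        self.buf = deque(maxlen=window)  # recent samples only
        self.k = k

    def update(self, value):
        is_anomaly = False
        if len(self.buf) >= 10:  # need a few samples before judging
            mean = sum(self.buf) / len(self.buf)
            var = sum((v - mean) ** 2 for v in self.buf) / len(self.buf)
            std = math.sqrt(var)
            is_anomaly = std > 0 and abs(value - mean) > self.k * std
        self.buf.append(value)
        return is_anomaly
```

A detector like this needs only constant memory and a handful of arithmetic operations per sample, which is why per-device, high-frequency monitoring is plausible even on constrained hardware.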