Multi-DNN Accelerators for Next-Generation AI Systems
- URL: http://arxiv.org/abs/2205.09376v1
- Date: Thu, 19 May 2022 08:15:50 GMT
- Title: Multi-DNN Accelerators for Next-Generation AI Systems
- Authors: Stylianos I. Venieris and Christos-Savvas Bouganis and Nicholas D.
Lane
- Abstract summary: The primary drivers of AI technology are deep neural networks (DNNs).
The next generation of AI systems will have multi-DNN workloads at their core.
- Score: 19.990158911318247
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the use of AI-powered applications widens across multiple domains, so do
the computational demands. The primary drivers of AI technology are deep neural
networks (DNNs). Whether one focuses on cloud-based systems that serve multiple
AI queries from different users, each with their own DNN model, or on mobile
robots and smartphones that employ pipelines of models or parallel DNNs for the
concurrent processing of multi-modal data, the next generation of AI systems
will have multi-DNN workloads at their core. Large-scale deployment of AI
services and their integration across mobile and embedded systems require
additional breakthroughs on the computer architecture front, with processors
that can sustain high performance as the number of DNNs increases while meeting
quality-of-service requirements. This gives rise to the topic of multi-DNN
accelerator design.
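To make the workload class concrete, here is a minimal, purely illustrative Python sketch: several independent DNN requests, each with its own latency (QoS) budget, drained concurrently by a fixed pool of workers. The models, budgets, and scheduling policy are hypothetical stand-ins, not any of the accelerator designs surveyed in the paper.
```python
import time
import threading
import queue

# Hypothetical stand-ins for two users' DNN models; in a real system these
# would be compiled networks resident on the accelerator.
def model_a(x):
    time.sleep(0.005)  # pretend 5 ms of inference work
    return x * 2

def model_b(x):
    time.sleep(0.012)  # pretend 12 ms of inference work
    return x + 1

requests = queue.Queue()

def worker():
    while True:
        model, x, deadline = requests.get()
        y = model(x)
        met = "met" if time.monotonic() <= deadline else "missed"
        print(f"{model.__name__}: result={y}, deadline {met}")
        requests.task_done()

# Two hardware-like workers draining a shared queue of multi-DNN requests.
for _ in range(2):
    threading.Thread(target=worker, daemon=True).start()

now = time.monotonic()
requests.put((model_a, 3, now + 0.010))  # 10 ms QoS budget
requests.put((model_b, 3, now + 0.020))  # 20 ms QoS budget
requests.put((model_a, 7, now + 0.010))
requests.join()
```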
Related papers
- Twill: Scheduling Compound AI Systems on Heterogeneous Mobile Edge Platforms [1.7835990287552501]
Compound AI (cAI) systems chain multiple AI models to solve complex problems.
Existing mobile edge AI inference strategies manage multi-DNN or transformer-only workloads.
We present Twill, a run-time framework to handle concurrent inference requests of cAI workloads.
arXiv Detail & Related papers (2025-07-01T07:06:45Z) - Surrogate-Assisted Evolution for Efficient Multi-branch Connection Design in Deep Neural Networks [3.113634696452565]
State-of-the-art Deep Neural Networks (DNNs) often incorporate multi-branch connections.
We introduce a novel approach based on Linear Genetic Programming (LGP) to encode multi-branch (MB) connections within DNNs.
We scale their use from dozens or hundreds of sample points to thousands, aligning with the demands of complex DNNs.
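As a rough illustration of how a register-based LGP genome can express multi-branch connections (the concrete encoding in the paper may differ), consider the sketch below: a register written once and read by several later instructions becomes a skip/branch connection in the decoded network.
```python
import numpy as np

# A hypothetical LGP-style genome: each instruction writes a register that
# combines the outputs of one or two earlier registers. Registers read by
# several later instructions become multi-branch (skip) connections.
# Format: (opcode, dst, src1, src2)
genome = [
    ("conv", 1, 0, None),   # r1 = conv(r0)   -- r0 is the input
    ("conv", 2, 1, None),   # r2 = conv(r1)
    ("add",  3, 2, 0),      # r3 = r2 + r0    -- branch back to the input
    ("add",  4, 3, 1),      # r4 = r3 + r1    -- second branch from r1
]

def conv(x):
    return 0.9 * x + 0.1    # toy stand-in for a conv block

def execute(genome, x0):
    regs = {0: x0}
    for op, dst, s1, s2 in genome:
        if op == "conv":
            regs[dst] = conv(regs[s1])
        elif op == "add":
            regs[dst] = regs[s1] + regs[s2]
    return regs[max(regs)]  # output of the last-written register

print(execute(genome, np.ones(4)))
```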
arXiv Detail & Related papers (2025-06-25T14:18:17Z) - AI Flow: Perspectives, Scenarios, and Approaches [51.38621621775711]
We introduce AI Flow, a framework that integrates cutting-edge IT and CT advancements.
First, a device-edge-cloud framework serves as the foundation, integrating end devices, edge servers, and cloud clusters.
Second, we introduce the concept of familial models, a series of different-sized models with aligned hidden features.
Third, connectivity- and interaction-based intelligence emergence is a novel paradigm of AI Flow.
arXiv Detail & Related papers (2025-06-14T12:43:07Z) - Optimizing Multi-DNN Inference on Mobile Devices through Heterogeneous Processor Co-Execution [39.033040759452504]
Deep Neural Networks (DNNs) are increasingly deployed across diverse industries, driving demand for mobile device support.
Existing mobile inference frameworks often rely on a single processor per model, limiting hardware utilization and causing suboptimal performance and energy efficiency.
We propose an Advanced Multi-DNN Model Scheduling (ADMS) strategy for optimizing multi-DNN inference on mobile heterogeneous processors.
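The exact ADMS policy is described in the paper; as a hedged illustration of the general idea of mapping multiple DNNs onto heterogeneous processors, here is a toy greedy earliest-finish-time assignment with made-up model costs and processor speeds.
```python
# A toy greedy scheduler in the spirit of multi-DNN-to-heterogeneous-processor
# mapping: each model goes to whichever processor finishes it earliest.
# Processor speeds and model costs are invented numbers; the real ADMS
# strategy in the paper is more sophisticated than this sketch.
processors = {"cpu": 1.0, "gpu": 4.0, "npu": 6.0}   # relative throughput
models = {"detector": 120.0, "classifier": 40.0, "segmenter": 200.0}  # work units

busy_until = {p: 0.0 for p in processors}
for name, work in sorted(models.items(), key=lambda kv: -kv[1]):
    # earliest finish time = current backlog + work / speed
    best = min(processors, key=lambda p: busy_until[p] + work / processors[p])
    busy_until[best] += work / processors[best]
    print(f"{name} -> {best} (finishes at t={busy_until[best]:.1f})")
```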
arXiv Detail & Related papers (2025-03-27T03:03:09Z) - Neuromorphic Wireless Split Computing with Multi-Level Spikes [69.73249913506042]
Neuromorphic computing uses spiking neural networks (SNNs) to perform inference tasks.
Embedding a small payload within each spike exchanged between spiking neurons can enhance inference accuracy without increasing energy consumption.
Split computing, where an SNN is partitioned across two devices, is a promising solution.
This paper presents the first comprehensive study of a neuromorphic wireless split computing architecture that employs multi-level SNNs.
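A hedged sketch of the multi-level-spike idea: each emitted spike carries a small quantized payload instead of a single bit. The neuron model and quantizer below are illustrative choices, not the paper's exact formulation.
```python
import numpy as np

LEVELS = 4          # payload resolution: 2 bits per spike
THRESH = 1.0

def lif_step(v, x, decay=0.9):
    # Leaky integrate-and-fire update with a graded (multi-level) spike:
    # the membrane overshoot above threshold is quantized into the payload.
    v = decay * v + x
    if v >= THRESH:
        payload = min(LEVELS - 1, int((v - THRESH) / THRESH * LEVELS))
        return 0.0, payload + 1        # reset, emit spike level in {1..4}
    return v, 0                        # no spike

v = 0.0
rng = np.random.default_rng(0)
for t in range(10):
    v, spike = lif_step(v, rng.uniform(0.0, 0.8))
    print(f"t={t}: spike level {spike}")
```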
arXiv Detail & Related papers (2024-11-07T14:08:35Z) - Generative Diffusion-based Contract Design for Efficient AI Twins Migration in Vehicular Embodied AI Networks [55.15079732226397]
Embodied AI is a rapidly advancing field that bridges the gap between cyberspace and physical space.
In VEANET, embodied AI twins act as in-vehicle AI assistants to perform diverse tasks supporting autonomous driving.
arXiv Detail & Related papers (2024-10-02T02:20:42Z) - Spyx: A Library for Just-In-Time Compiled Optimization of Spiking Neural
Networks [0.08965418284317034]
Spiking Neural Networks (SNNs) promise improved energy efficiency through a reduced, low-power hardware footprint.
This paper introduces Spyx, a new and lightweight SNN simulation and optimization library designed in JAX.
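Spyx itself is built on JAX; the plain-numpy sketch below only illustrates the kind of discrete-time leaky integrate-and-fire dynamics such libraries simulate and JIT-compile, and none of its names correspond to Spyx's actual API.
```python
import numpy as np

def lif_layer(inputs, beta=0.9, threshold=1.0):
    """Simulate a layer of LIF neurons.

    inputs: (timesteps, neurons) array of injected current.
    Returns a (timesteps, neurons) array of binary spikes.
    """
    t_steps, n = inputs.shape
    v = np.zeros(n)
    spikes = np.zeros_like(inputs)
    for t in range(t_steps):
        v = beta * v + inputs[t]                # leaky integration
        spikes[t] = (v >= threshold).astype(float)
        v = np.where(spikes[t] > 0, 0.0, v)     # hard reset on spike
    return spikes

rng = np.random.default_rng(1)
out = lif_layer(rng.uniform(0, 0.5, size=(20, 8)))
print("spike counts per neuron:", out.sum(axis=0))
```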
arXiv Detail & Related papers (2024-02-29T09:46:44Z) - SpikingJelly: An open-source machine learning infrastructure platform
for spike-based intelligence [51.6943465041708]
Spiking neural networks (SNNs) aim to realize brain-inspired intelligence on neuromorphic chips with high energy efficiency.
We contribute a full-stack toolkit for pre-processing neuromorphic datasets, building deep SNNs, optimizing their parameters, and deploying SNNs on neuromorphic chips.
arXiv Detail & Related papers (2023-10-25T13:15:17Z) - Large Language Models Empowered Autonomous Edge AI for Connected
Intelligence [51.269276328087855]
Edge artificial intelligence (Edge AI) is a promising solution to achieve connected intelligence.
This article presents a vision of autonomous edge AI systems that automatically organize, adapt, and optimize themselves to meet users' diverse requirements.
arXiv Detail & Related papers (2023-07-06T05:16:55Z) - Optical multi-task learning using multi-wavelength diffractive deep
neural networks [8.543496127018567]
Photonic neural networks are a brain-inspired information processing technology that uses photons instead of electrons to perform AI tasks.
Existing architectures are designed for a single task but fail to multiplex different tasks in parallel within a single monolithic system.
This paper proposes a novel optical multi-task learning system by designing multi-wavelength diffractive deep neural networks (D2NNs) with the joint optimization method.
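A stripped-down sketch of why multi-wavelength operation enables task multiplexing: a single physical height map imposes a wavelength-dependent phase shift, so each wavelength effectively traverses its own network. All values below are illustrative, and real D2NNs additionally model free-space diffraction between layers.
```python
import numpy as np

n_pixels = 64
rng = np.random.default_rng(2)
heights = rng.uniform(0, 1e-6, n_pixels)   # diffractive layer heights (m)
refractive_delta = 0.5                      # n - 1 of the layer material

def layer_phase(wavelength):
    # Phase delay of one diffractive layer; it differs per wavelength,
    # which is what lets one layer encode several tasks.
    return 2 * np.pi * heights * refractive_delta / wavelength

field = np.ones(n_pixels, dtype=complex)    # uniform input illumination
for wavelength, task in [(532e-9, "task A"), (633e-9, "task B")]:
    out = field * np.exp(1j * layer_phase(wavelength))
    intensity = np.abs(out.sum()) ** 2      # toy single-detector readout
    print(f"{task} @ {wavelength*1e9:.0f} nm: detector intensity {intensity:.1f}")
```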
arXiv Detail & Related papers (2022-11-30T14:27:14Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - Enabling Deep Learning on Edge Devices [2.741266294612776]
Deep neural networks (DNNs) have succeeded in many different perception tasks, such as computer vision, natural language processing, and reinforcement learning.
High-performing DNNs rely on intensive resource consumption.
Recently, emerging intelligent applications such as AR/VR, mobile assistants, and the Internet of Things require deploying DNNs on resource-constrained edge devices.
This dissertation studies four edge intelligence scenarios: Inference on Edge Devices, Adaptation on Edge Devices, Learning on Edge Devices, and Edge-Server Systems.
arXiv Detail & Related papers (2022-10-06T20:52:57Z) - Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural
Networks on Edge NPUs [74.83613252825754]
"smart ecosystems" are being formed where sensing happens concurrently rather than standalone.
This is shifting the on-device inference paradigm towards deploying neural processing units (NPUs) at the edge.
We propose a novel early-exit scheduling approach that allows preemption at run time to account for the dynamicity introduced by the arrival and exiting processes.
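As a hedged illustration of the early-exit mechanism that such a scheduler preempts around (the scheduling algorithm itself is the paper's contribution and is not modeled here), this sketch exits at the first intermediate classifier whose confidence clears a threshold. The stages, heads, and threshold are all invented for illustration.
```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def stage(x):
    # Stand-in for a block of DNN layers.
    return np.tanh(rng.normal(size=8) + x.mean())

def early_exit_infer(x, threshold=0.6, n_stages=3):
    # Leave at the first exit head whose top-1 confidence clears the
    # threshold, freeing the NPU for queued requests.
    for i in range(n_stages):
        x = stage(x)
        probs = softmax(rng.normal(size=10) + x.sum())  # toy exit classifier
        if probs.max() >= threshold:
            return i, int(probs.argmax())
    return n_stages - 1, int(probs.argmax())            # fall through to last exit

exit_at, label = early_exit_infer(np.ones(8))
print(f"exited at stage {exit_at} with class {label}")
```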
arXiv Detail & Related papers (2022-09-27T15:04:01Z) - FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using around 40% of the available hardware resources in total.
It reduces classification time by three orders of magnitude, with a small 4.5% accuracy loss compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z) - How to Reach Real-Time AI on Consumer Devices? Solutions for
Programmable and Custom Architectures [7.085772863979686]
Deep neural networks (DNNs) have led to large strides in various Artificial Intelligence (AI) inference tasks, such as object and speech recognition.
However, deploying such AI models across commodity devices faces significant challenges.
We present techniques for achieving real-time performance following a cross-stack approach.
arXiv Detail & Related papers (2021-06-21T11:23:12Z) - Efficient Low-Latency Dynamic Licensing for Deep Neural Network
Deployment on Edge Devices [0.0]
We propose an architecture for deploying and processing deep neural networks on edge devices.
Adopting this architecture allows low-latency model updates on devices.
arXiv Detail & Related papers (2021-02-24T09:36:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.