Related papers: Efficient Deployment of CNN Models on Multiple In-Memory Computing Units

Efficient Deployment of CNN Models on Multiple In-Memory Computing Units

URL: http://arxiv.org/abs/2511.04682v1
Date: Thu, 09 Oct 2025 14:03:32 GMT
Title: Efficient Deployment of CNN Models on Multiple In-Memory Computing Units
Authors: Eleni Bougioukou, Theodore Antonakopoulos,
Abstract summary: In-Memory Computing (IMC) represents a paradigm shift in deep learning acceleration.<n>We introduce the Load-Balance-Longest-Path (LBLP) algorithm for maximizing the processing rate and minimizing latency due to efficient resources utilization.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In-Memory Computing (IMC) represents a paradigm shift in deep learning acceleration by mitigating data movement bottlenecks and leveraging the inherent parallelism of memory-based computations. The efficient deployment of Convolutional Neural Networks (CNNs) on IMC-based hardware necessitates the use of advanced task allocation strategies for achieving maximum computational efficiency. In this work, we exploit an IMC Emulator (IMCE) with multiple Processing Units (PUs) for investigating how the deployment of a CNN model in a multi-processing system affects its performance, in terms of processing rate and latency. For that purpose, we introduce the Load-Balance-Longest-Path (LBLP) algorithm, that dynamically assigns all CNN nodes to the available IMCE PUs, for maximizing the processing rate and minimizing latency due to efficient resources utilization. We are benchmarking LBLP against other alternative scheduling strategies for a number of CNN models and experimental results demonstrate the effectiveness of the proposed algorithm.

Related papers

CAMP-HiVe: Cyclic Pair Merging based Efficient DNN Pruning with Hessian-Vector Approximation for Resource-Constrained Systems [3.343542849202802]
We introduce CAMP-HiVe, a cyclic pair merging-based pruning with Hessian Vector approximation.<n>Our experimental results demonstrate that our proposed method achieves significant reductions in computational requirements.<n>It outperforms the existing state-of-the-art neural pruning methods.
arXiv Detail & Related papers (2025-11-09T07:58:36Z)
Intra-DP: A High Performance Collaborative Inference System for Mobile Edge Computing [67.98609858326951]
Intra-DP is a high-performance collaborative inference system optimized for deep neural networks (DNNs) on mobile devices.<n>It reduces per-inference latency by up to 50% and energy consumption by up to 75% compared to state-of-the-art baselines.<n>The evaluation demonstrates that Intra-DP reduces per-inference latency by up to 50% and energy consumption by up to 75% compared to state-of-the-art baselines.
arXiv Detail & Related papers (2025-07-08T09:50:57Z)
Memory-Free and Parallel Computation for Quantized Spiking Neural Networks [12.227968342252026]
Quantized Spiking Neural Networks (QSNNs) offer superior energy efficiency and are well-suited for deployment on resource-limited edge devices.<n> limited bit-width weight and membrane potential result in a notable performance decline.<n>We introduce a memory-free quantization method that captures all historical information without directly storing membrane potentials.
arXiv Detail & Related papers (2025-02-25T10:34:25Z)
USEFUSE: Uniform Stride for Enhanced Performance in Fused Layer Architecture of Deep Neural Networks [0.6435156676256051]
This study presents the Sum-of-Products (SOP) units for convolution, which utilize low-latency left-to-right bit-serial arithmetic.<n>An effective mechanism detects and skips inefficient convolutions after ReLU layers, minimizing power consumption.<n>Two designs cater to varied demands: one focuses on minimal response time for mission-critical applications, and another focuses on resource-constrained devices with comparable latency.
arXiv Detail & Related papers (2024-12-18T11:04:58Z)
Energy-Aware FPGA Implementation of Spiking Neural Network with LIF Neurons [0.5243460995467893]
Spiking Neural Networks (SNNs) stand out as a cutting-edge solution for TinyML. This paper presents a novel SNN architecture based on the 1st Order Leaky Integrate-and-Fire (LIF) neuron model. A hardware-friendly LIF design is also proposed, and implemented on a Xilinx Artix-7 FPGA.
arXiv Detail & Related papers (2024-11-03T16:42:10Z)
DNN Partitioning, Task Offloading, and Resource Allocation in Dynamic Vehicular Networks: A Lyapunov-Guided Diffusion-Based Reinforcement Learning Approach [49.56404236394601]
We formulate the problem of joint DNN partitioning, task offloading, and resource allocation in Vehicular Edge Computing. Our objective is to minimize the DNN-based task completion time while guaranteeing the system stability over time. We propose a Multi-Agent Diffusion-based Deep Reinforcement Learning (MAD2RL) algorithm, incorporating the innovative use of diffusion models.
arXiv Detail & Related papers (2024-06-11T06:31:03Z)
Switchable Decision: Dynamic Neural Generation Networks [98.61113699324429]
We propose a switchable decision to accelerate inference by dynamically assigning resources for each data instance. Our method benefits from less cost during inference while keeping the same accuracy.
arXiv Detail & Related papers (2024-05-07T17:44:54Z)
A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs) MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks. We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency. We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
A Heterogeneous In-Memory Computing Cluster For Flexible End-to-End Inference of Real-World Deep Neural Networks [12.361842554233558]
Deployment of modern TinyML tasks on small battery-constrained IoT devices requires high computational energy efficiency. Analog In-Memory Computing (IMC) using non-volatile memory (NVM) promises major efficiency improvements in deep neural network (DNN) inference. We present a heterogeneous tightly-coupled architecture integrating 8 RISC-V cores, an in-memory computing accelerator (IMA), and digital accelerators.
arXiv Detail & Related papers (2022-01-04T11:12:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.