ProactivePIM: Accelerating Weight-Sharing Embedding Layer with PIM for Scalable Recommendation System
- URL: http://arxiv.org/abs/2402.04032v5
- Date: Thu, 21 Nov 2024 05:55:43 GMT
- Title: ProactivePIM: Accelerating Weight-Sharing Embedding Layer with PIM for Scalable Recommendation System
- Authors: Youngsuk Kim, Junghwan Lim, Hyuk-Jae Lee, Chae Eun Rhee
- Abstract summary: Weight-sharing algorithms have been proposed for size reduction, but they increase memory access.
Recent advancements in processing-in-memory (PIM) enhanced the model throughput by exploiting memory parallelism.
We propose ProactivePIM, a PIM system for weight-sharing recommendation system acceleration.
- Score: 16.2798383044926
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The model size growth of personalized recommendation systems poses new challenges for inference. Weight-sharing algorithms have been proposed for size reduction, but they increase memory accesses. Recent advancements in processing-in-memory (PIM) have enhanced model throughput by exploiting memory parallelism, but weight-sharing algorithms introduce massive CPU-PIM communication into prior PIM systems. We propose ProactivePIM, a PIM system for accelerating weight-sharing recommendation systems. ProactivePIM integrates a cache within the PIM with a prefetching scheme to leverage a unique locality of the algorithm, and eliminates communication overhead through a subtable mapping strategy. ProactivePIM achieves a 4.8x speedup compared to prior works.
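To make the extra memory traffic concrete, here is a minimal Python sketch of one widely used weight-sharing embedding scheme, the quotient-remainder trick, where each category ID is served by combining rows from two small subtables instead of reading one row from a huge table. The table sizes and the element-wise combination below are illustrative assumptions rather than the exact algorithm ProactivePIM targets; the point is that every lookup now touches multiple subtables, which is the added memory access the abstract refers to.

```python
import numpy as np

# Quotient-remainder weight-sharing embedding (illustrative sizes).
# A full table would need num_categories x dim parameters; the two
# subtables together need only (num_categories / num_buckets +
# num_buckets) x dim, at the cost of two lookups per ID.
num_categories = 1_000_000
num_buckets = 1_000          # remainder subtable size
dim = 64

rng = np.random.default_rng(0)
quotient_table = rng.normal(size=(num_categories // num_buckets, dim)).astype(np.float32)
remainder_table = rng.normal(size=(num_buckets, dim)).astype(np.float32)

def embed(ids: np.ndarray) -> np.ndarray:
    """Each ID triggers two subtable reads instead of one table read."""
    q = ids // num_buckets
    r = ids % num_buckets
    return quotient_table[q] * remainder_table[r]   # element-wise combine

batch_ids = rng.integers(0, num_categories, size=8)
print(embed(batch_ids).shape)   # (8, 64)
```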
Related papers
- HH-PIM: Dynamic Optimization of Power and Performance with Heterogeneous-Hybrid PIM for Edge AI Devices [1.8749305679160366]
This study introduces a Heterogeneous-Hybrid PIM (HH-PIM) architecture, comprising high-performance MRAM-SRAM PIM modules and low-power MRAM-SRAM PIM modules.
We show that the proposed HH-PIM achieves up to 60.43% average energy savings over conventional PIMs while meeting application requirements.
arXiv Detail & Related papers (2025-04-02T08:22:32Z)
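As a rough illustration of the power-performance trade-off HH-PIM exploits, the sketch below greedily picks, per workload, the lowest-energy PIM module that still meets a latency budget. The module parameters and the greedy policy are hypothetical stand-ins; HH-PIM's actual MRAM-SRAM module organization and optimization algorithm are described in the paper.

```python
from dataclasses import dataclass

@dataclass
class PimModule:
    name: str
    latency_per_op: float   # ms per operation (hypothetical numbers)
    energy_per_op: float    # mJ per operation (hypothetical numbers)

MODULES = [
    PimModule("high_performance", latency_per_op=0.2, energy_per_op=1.5),
    PimModule("low_power",        latency_per_op=0.8, energy_per_op=0.4),
]

def dispatch(num_ops: int, deadline_ms: float) -> PimModule:
    """Greedy pick: cheapest-energy module whose latency fits the deadline."""
    feasible = [m for m in MODULES if m.latency_per_op * num_ops <= deadline_ms]
    if not feasible:                                          # nothing meets the deadline
        return min(MODULES, key=lambda m: m.latency_per_op)   # fall back to fastest
    return min(feasible, key=lambda m: m.energy_per_op)

print(dispatch(num_ops=100, deadline_ms=100).name)  # low_power fits -> saves energy
print(dispatch(num_ops=100, deadline_ms=30).name)   # only high_performance fits
```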
- Enabling Low-Cost Secure Computing on Untrusted In-Memory Architectures [5.565715369147691]
Processing-in-Memory (PIM) promises to substantially improve performance by moving processing closer to the data.
Integrating PIM modules within a secure computing system raises an interesting challenge: unencrypted data has to move off-chip to the PIM, exposing the data to attackers and breaking assumptions on Trusted Computing Bases (TCBs).
This paper leverages multi-party computation (MPC) techniques, specifically arithmetic secret sharing and Yao's garbled circuits, to securely outsource bandwidth-intensive computation to PIM.
arXiv Detail & Related papers (2025-01-28T20:48:14Z)
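The arithmetic secret sharing that this secure-PIM work builds on can be illustrated in a few lines: a value is split into random additive shares modulo 2^32, any single share reveals nothing, and additions can be carried out on the shares independently (for example, on an untrusted PIM unit) before recombining. This is a generic textbook sketch, not the paper's protocol.

```python
import secrets

MOD = 2**32  # share arithmetic over a 32-bit ring

def share(x: int, n_parties: int = 2) -> list[int]:
    """Split x into n additive shares; any n-1 of them look uniformly random."""
    shares = [secrets.randbelow(MOD) for _ in range(n_parties - 1)]
    shares.append((x - sum(shares)) % MOD)
    return shares

def reconstruct(shares: list[int]) -> int:
    return sum(shares) % MOD

# Addition works share-wise: each party (e.g., an untrusted PIM unit)
# adds its shares locally without ever seeing the plaintext values.
a_shares, b_shares = share(1234), share(5678)
sum_shares = [(a + b) % MOD for a, b in zip(a_shares, b_shares)]
assert reconstruct(sum_shares) == 1234 + 5678
print(reconstruct(sum_shares))  # 6912
```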
- Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design [59.00758127310582]
We propose a novel framework, Read-ME, that transforms pre-trained dense LLMs into smaller MoE models.
Our approach employs activation sparsity to extract experts.
Read-ME outperforms other popular open-source dense models of similar scales.
arXiv Detail & Related papers (2024-10-24T19:48:51Z)
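For intuition about the router-decoupled MoE design, below is a minimal top-k router in NumPy: a small gating layer scores the experts for each token and only the top-k experts execute. The shapes, softmax gating, and k=2 are generic MoE conventions assumed for illustration, not the specific Read-ME architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 4, 2

router_w = rng.normal(size=(d_model, n_experts))                 # gating layer
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x: np.ndarray) -> np.ndarray:
    """x: (tokens, d_model). Route each token to its top-k experts only."""
    logits = x @ router_w                                         # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]                 # chosen expert ids
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        gate = np.exp(logits[t, top[t]])
        gate /= gate.sum()                                        # softmax over chosen experts
        for g, e in zip(gate, top[t]):
            out[t] += g * (x[t] @ experts[e])                     # only k experts execute
    return out

tokens = rng.normal(size=(3, d_model))
print(moe_forward(tokens).shape)   # (3, 16)
```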
- BoA: Attention-aware Post-training Quantization without Backpropagation [11.096116957844014]
Post-training quantization (PTQ) is a promising solution for deploying large language models (LLMs) on resource-constrained devices.
We introduce a novel backpropagation-free PTQ algorithm that optimizes integer weights by considering inter-layer dependencies.
arXiv Detail & Related papers (2024-06-19T11:53:21Z)
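As a baseline reference for what PTQ does to weights, the sketch below performs plain round-to-nearest symmetric INT4 quantization of a weight matrix. BoA's contribution is to improve on this by optimizing the integer weights with attention-aware, backpropagation-free inter-layer information, which is not reproduced here.

```python
import numpy as np

def quantize_rtn(w: np.ndarray, n_bits: int = 4):
    """Symmetric per-tensor round-to-nearest quantization (the naive PTQ baseline)."""
    qmax = 2 ** (n_bits - 1) - 1                 # e.g. 7 for INT4
    scale = np.abs(w).max() / qmax               # one scale for the whole tensor
    w_int = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return w_int, scale

def dequantize(w_int: np.ndarray, scale: float) -> np.ndarray:
    return w_int.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(256, 256)).astype(np.float32)
w_int, scale = quantize_rtn(w)
err = np.abs(w - dequantize(w_int, scale)).mean()
print(f"mean absolute quantization error: {err:.5f}")
```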
- CLSA-CIM: A Cross-Layer Scheduling Approach for Computing-in-Memory Architectures [0.1747623282473278]
We present CLSA-CIM, a cross-layer scheduling algorithm for tiled CIM architectures.
We integrate CLSA-CIM with existing weight-mapping strategies and compare performance against state-of-the-art (SOTA) scheduling algorithms.
arXiv Detail & Related papers (2024-01-15T13:35:21Z)
- EPIM: Efficient Processing-In-Memory Accelerators based on Epitome [78.79382890789607]
We introduce the Epitome, a lightweight neural operator offering convolution-like functionality.
On the software side, we evaluate epitomes' latency and energy on PIM accelerators.
We introduce a PIM-aware layer-wise design method to enhance their hardware efficiency.
arXiv Detail & Related papers (2023-11-12T17:56:39Z)
- SimplePIM: A Software Framework for Productive and Efficient Processing-in-Memory [8.844860045305772]
The processing-in-memory (PIM) paradigm aims to alleviate the data movement bottleneck by performing computation inside memory chips.
This paper presents a new software framework, SimplePIM, to aid programming real PIM systems.
We implement SimplePIM for the UPMEM PIM system and evaluate it on six major applications.
arXiv Detail & Related papers (2023-10-03T08:59:39Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both inference accuracy and mean squared error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
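To show what a shared backbone with multiple prediction heads means mechanically, the NumPy sketch below runs one feature extractor and averages the outputs of several small heads. The layer sizes, ReLU activation, and averaging as the ensemble rule are illustrative assumptions rather than the MEMTL architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out, n_heads = 8, 32, 4, 3

# One shared backbone ...
W_backbone = rng.normal(size=(d_in, d_hidden))
# ... and several lightweight prediction heads.
heads = [rng.normal(size=(d_hidden, d_out)) for _ in range(n_heads)]

def forward(x: np.ndarray) -> np.ndarray:
    """x: (batch, d_in) -> ensembled prediction (batch, d_out)."""
    h = np.maximum(x @ W_backbone, 0.0)            # shared features (ReLU)
    preds = np.stack([h @ Wh for Wh in heads])     # (n_heads, batch, d_out)
    return preds.mean(axis=0)                      # simple ensemble: average the heads

x = rng.normal(size=(5, d_in))
print(forward(x).shape)   # (5, 4)
```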
- Minimizing Age of Information for Mobile Edge Computing Systems: A Nested Index Approach [11.998034941401814]
Mobile edge computing (MEC) provides an efficient approach to achieving real-time applications that are sensitive to information freshness.
In this paper, we consider multiple users offloading tasks to heterogeneous edge servers in a MEC system.
Our algorithm leads to an optimality gap reduction of up to 40%, compared to benchmarks.
arXiv Detail & Related papers (2023-07-03T21:47:21Z)
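The quantity being minimized here, Age of Information (AoI), grows by one each slot and resets when a fresh update is delivered. The toy simulation below uses a plain max-age scheduling rule to show how AoI evolves; the paper's nested index policy is more sophisticated and accounts for heterogeneous servers, which this sketch does not.

```python
import random

random.seed(0)
n_users, n_slots = 4, 20
age = [1] * n_users          # AoI of each user's information at the monitor
delivery_prob = 0.7          # chance a scheduled update actually arrives (assumed)

for slot in range(n_slots):
    scheduled = max(range(n_users), key=lambda i: age[i])   # max-age rule
    delivered = random.random() < delivery_prob
    for i in range(n_users):
        if i == scheduled and delivered:
            age[i] = 1                # fresh update received, age resets
        else:
            age[i] += 1               # information keeps aging

print("final ages:", age)
```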
- Collaborative Intelligent Reflecting Surface Networks with Multi-Agent Reinforcement Learning [63.83425382922157]
Intelligent reflecting surface (IRS) is envisioned to be widely applied in future wireless networks.
In this paper, we investigate a multi-user communication system assisted by cooperative IRS devices with the capability of energy harvesting.
arXiv Detail & Related papers (2022-03-26T20:37:14Z)
- Distributed Reinforcement Learning for Privacy-Preserving Dynamic Edge Caching [91.50631418179331]
A privacy-preserving distributed deep policy gradient (P2D3PG) method is proposed to maximize the cache hit rates of devices in MEC networks.
We convert the distributed optimizations into model-free Markov decision process problems and then introduce a privacy-preserving federated learning method for popularity prediction.
arXiv Detail & Related papers (2021-10-20T02:48:27Z)
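The privacy-preserving ingredient of P2D3PG, federated learning for popularity prediction, can be sketched generically: each device fits a local popularity model on its own request counts, and only the model parameters (never the raw requests) are averaged at the server. The frequency-based local model and plain federated averaging below are simplifying assumptions, not the paper's exact method.

```python
import numpy as np

rng = np.random.default_rng(0)
n_devices, n_contents = 3, 5

# Each device keeps its raw request counts private (hypothetical data).
local_requests = rng.integers(0, 50, size=(n_devices, n_contents))

def local_model(requests: np.ndarray) -> np.ndarray:
    """Local 'popularity model': normalized request frequencies."""
    return requests / max(requests.sum(), 1)

# Federated averaging: only the model parameters leave each device.
local_params = [local_model(r) for r in local_requests]
global_popularity = np.mean(local_params, axis=0)

cache_size = 2
cached = np.argsort(global_popularity)[-cache_size:]   # cache the predicted-popular items
print("predicted popularity:", np.round(global_popularity, 3))
print("contents to cache:", cached)
```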
- Continual Learning Approach for Improving the Data and Computation Mapping in Near-Memory Processing System [3.202860612193139]
We propose an artificially intelligent memory mapping scheme, AIMM, that optimizes data placement and resource utilization through page and computation remapping.
AIMM uses a neural network to achieve a near-optimal mapping during execution, trained using a reinforcement learning algorithm.
Our experimental evaluation shows that AIMM improves the baseline NMP performance in single- and multiple-program scenarios by up to 70% and 50%, respectively.
arXiv Detail & Related papers (2021-04-28T09:50:35Z)
- Covert Model Poisoning Against Federated Learning: Algorithm Design and Optimization [76.51980153902774]
Federated learning (FL) is vulnerable to external attacks on FL models during parameter transmission.
In this paper, we propose effective model poisoning (MP) algorithms to combat state-of-the-art defensive aggregation mechanisms.
Our experimental results demonstrate that the proposed CMP algorithms are effective and substantially outperform existing attack mechanisms.
arXiv Detail & Related papers (2021-01-28T03:28:18Z)
- Reinforcement Learning Based Cooperative Coded Caching under Dynamic Popularities in Ultra-Dense Networks [38.44125997148742]
The caching strategy at small base stations (SBSs) is critical to meet massive high data rate requests.
We exploit reinforcement learning (RL) to design a cooperative caching strategy with maximum-distance separable (MDS) coding.
arXiv Detail & Related papers (2020-03-08T10:45:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.