ProactivePIM: Accelerating Weight-Sharing Embedding Layer with PIM for Scalable Recommendation System
- URL: http://arxiv.org/abs/2402.04032v5
- Date: Thu, 21 Nov 2024 05:55:43 GMT
- Title: ProactivePIM: Accelerating Weight-Sharing Embedding Layer with PIM for Scalable Recommendation System
- Authors: Youngsuk Kim, Junghwan Lim, Hyuk-Jae Lee, Chae Eun Rhee
- Abstract summary: Weight-sharing algorithms have been proposed for size reduction, but they increase memory access.
Recent advancements in processing-in-memory (PIM) enhanced the model throughput by exploiting memory parallelism.
We propose ProactivePIM, a PIM system for weight-sharing recommendation system acceleration.
- Score: 16.2798383044926
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The model size growth of personalized recommendation systems poses new challenges for inference. Weight-sharing algorithms have been proposed for size reduction, but they increase memory accesses. Recent advancements in processing-in-memory (PIM) have enhanced model throughput by exploiting memory parallelism, but weight-sharing algorithms introduce massive CPU-PIM communication into prior PIM systems. We propose ProactivePIM, a PIM system for accelerating weight-sharing recommendation systems. ProactivePIM integrates a cache within the PIM with a prefetching scheme to leverage a unique locality of the algorithm, and eliminates communication overhead through a subtable mapping strategy. ProactivePIM achieves a 4.8x speedup compared to prior works.
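To make the extra memory traffic concrete, here is a minimal Python sketch of one widely used weight-sharing embedding scheme, the quotient-remainder trick, where each category ID is served by combining rows from two small subtables instead of reading one row from a huge table. The table sizes and the element-wise combination below are illustrative assumptions rather than the exact algorithm ProactivePIM targets; the point is that every lookup now touches multiple subtables, which is the added memory access the abstract refers to.

```python
import numpy as np

# Quotient-remainder weight-sharing embedding (illustrative sizes).
# A full table would need num_categories x dim parameters; the two
# subtables together need only (num_categories / num_buckets +
# num_buckets) x dim, at the cost of two lookups per ID.
num_categories = 1_000_000
num_buckets = 1_000          # remainder subtable size
dim = 64

rng = np.random.default_rng(0)
quotient_table = rng.normal(size=(num_categories // num_buckets, dim)).astype(np.float32)
remainder_table = rng.normal(size=(num_buckets, dim)).astype(np.float32)

def embed(ids: np.ndarray) -> np.ndarray:
    """Each ID triggers two subtable reads instead of one table read."""
    q = ids // num_buckets
    r = ids % num_buckets
    return quotient_table[q] * remainder_table[r]   # element-wise combine

batch_ids = rng.integers(0, num_categories, size=8)
print(embed(batch_ids).shape)   # (8, 64)
```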
Related papers
- HH-PIM: Dynamic Optimization of Power and Performance with Heterogeneous-Hybrid PIM for Edge AI Devices [1.8749305679160366]
This study introduces a Heterogeneous-Hybrid PIM (HH-PIM) architecture, comprising high-performance MRAM-SRAM PIM modules and low-power MRAM-SRAM PIM modules.
We show that the proposed HH-PIM achieves up to 60.43% average energy savings over conventional PIMs while meeting application requirements.
arXiv Detail & Related papers (2025-04-02T08:22:32Z)
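As a rough illustration of the power-performance trade-off HH-PIM exploits, the sketch below greedily picks, per workload, the lowest-energy PIM module that still meets a latency budget. The module parameters and the greedy policy are hypothetical stand-ins; HH-PIM's actual MRAM-SRAM module organization and optimization algorithm are described in the paper.

```python
from dataclasses import dataclass

@dataclass
class PimModule:
    name: str
    latency_per_op: float   # ms per operation (hypothetical numbers)
    energy_per_op: float    # mJ per operation (hypothetical numbers)

MODULES = [
    PimModule("high_performance", latency_per_op=0.2, energy_per_op=1.5),
    PimModule("low_power",        latency_per_op=0.8, energy_per_op=0.4),
]

def dispatch(num_ops: int, deadline_ms: float) -> PimModule:
    """Greedy pick: cheapest-energy module whose latency fits the deadline."""
    feasible = [m for m in MODULES if m.latency_per_op * num_ops <= deadline_ms]
    if not feasible:                                          # nothing meets the deadline
        return min(MODULES, key=lambda m: m.latency_per_op)   # fall back to fastest
    return min(feasible, key=lambda m: m.energy_per_op)

print(dispatch(num_ops=100, deadline_ms=100).name)  # low_power fits -> saves energy
print(dispatch(num_ops=100, deadline_ms=30).name)   # only high_performance fits
```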
- Enabling Low-Cost Secure Computing on Untrusted In-Memory Architectures [5.565715369147691]
Processing-in-Memory (PIM) promises to substantially improve performance by moving processing closer to the data.
Integrating PIM modules within a secure computing system raises an interesting challenge: unencrypted data has to move off-chip to the PIM, exposing the data to attackers and breaking assumptions on Trusted Computing Bases (TCBs).
This paper leverages multi-party computation (MPC) techniques, specifically arithmetic secret sharing and Yao's garbled circuits, to securely outsource bandwidth-intensive computation to PIM.
arXiv Detail & Related papers (2025-01-28T20:48:14Z)
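The arithmetic secret sharing that this secure-PIM work builds on can be illustrated in a few lines: a value is split into random additive shares modulo 2^32, any single share reveals nothing, and additions can be carried out on the shares independently (for example, on an untrusted PIM unit) before recombining. This is a generic textbook sketch, not the paper's protocol.

```python
import secrets

MOD = 2**32  # share arithmetic over a 32-bit ring

def share(x: int, n_parties: int = 2) -> list[int]:
    """Split x into n additive shares; any n-1 of them look uniformly random."""
    shares = [secrets.randbelow(MOD) for _ in range(n_parties - 1)]
    shares.append((x - sum(shares)) % MOD)
    return shares

def reconstruct(shares: list[int]) -> int:
    return sum(shares) % MOD

# Addition works share-wise: each party (e.g., an untrusted PIM unit)
# adds its shares locally without ever seeing the plaintext values.
a_shares, b_shares = share(1234), share(5678)
sum_shares = [(a + b) % MOD for a, b in zip(a_shares, b_shares)]
assert reconstruct(sum_shares) == 1234 + 5678
print(reconstruct(sum_shares))  # 6912
```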
- Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design [59.00758127310582]
We propose a novel framework, Read-ME, that transforms pre-trained dense LLMs into smaller MoE models.
Our approach employs activation sparsity to extract experts.
Read-ME outperforms other popular open-source dense models of similar scales.
arXiv Detail & Related papers (2024-10-24T19:48:51Z)
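For intuition about the router-decoupled MoE design, below is a minimal top-k router in NumPy: a small gating layer scores the experts for each token and only the top-k experts execute. The shapes, softmax gating, and k=2 are generic MoE conventions assumed for illustration, not the specific Read-ME architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 4, 2

router_w = rng.normal(size=(d_model, n_experts))                 # gating layer
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x: np.ndarray) -> np.ndarray:
    """x: (tokens, d_model). Route each token to its top-k experts only."""
    logits = x @ router_w                                         # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]                 # chosen expert ids
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        gate = np.exp(logits[t, top[t]])
        gate /= gate.sum()                                        # softmax over chosen experts
        for g, e in zip(gate, top[t]):
            out[t] += g * (x[t] @ experts[e])                     # only k experts execute
    return out

tokens = rng.normal(size=(3, d_model))
print(moe_forward(tokens).shape)   # (3, 16)
```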
- BoA: Attention-aware Post-training Quantization without Backpropagation [11.096116957844014]
Post-training quantization (PTQ) is a promising solution for deploying large language models (LLMs) on resource-constrained devices.
We introduce a novel backpropagation-free PTQ algorithm that optimizes integer weights by considering inter-layer dependencies.
arXiv Detail & Related papers (2024-06-19T11:53:21Z)
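As a baseline reference for what PTQ does to weights, the sketch below performs plain round-to-nearest symmetric INT4 quantization of a weight matrix. BoA's contribution is to improve on this by optimizing the integer weights with attention-aware, backpropagation-free inter-layer information, which is not reproduced here.

```python
import numpy as np

def quantize_rtn(w: np.ndarray, n_bits: int = 4):
    """Symmetric per-tensor round-to-nearest quantization (the naive PTQ baseline)."""
    qmax = 2 ** (n_bits - 1) - 1                 # e.g. 7 for INT4
    scale = np.abs(w).max() / qmax               # one scale for the whole tensor
    w_int = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return w_int, scale

def dequantize(w_int: np.ndarray, scale: float) -> np.ndarray:
    return w_int.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(256, 256)).astype(np.float32)
w_int, scale = quantize_rtn(w)
err = np.abs(w - dequantize(w_int, scale)).mean()
print(f"mean absolute quantization error: {err:.5f}")
```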
- CLSA-CIM: A Cross-Layer Scheduling Approach for Computing-in-Memory Architectures [0.1747623282473278]
We present CLSA-CIM, a cross-layer scheduling algorithm for tiled CIM architectures.
We integrate CLSA-CIM with existing weight-mapping strategies and compare performance against state-of-the-art (SOTA) scheduling algorithms.
arXiv Detail & Related papers (2024-01-15T13:35:21Z)
- EPIM: Efficient Processing-In-Memory Accelerators based on Epitome [78.79382890789607]
We introduce the Epitome, a lightweight neural operator offering convolution-like functionality.
On the software side, we evaluate epitomes' latency and energy on PIM accelerators.
We introduce a PIM-aware layer-wise design method to enhance their hardware efficiency.
arXiv Detail & Related papers (2023-11-12T17:56:39Z)
- SimplePIM: A Software Framework for Productive and Efficient Processing-in-Memory [8.844860045305772]
The processing-in-memory (PIM) paradigm aims to alleviate the data movement bottleneck by performing computation inside memory chips.
This paper presents a new software framework, SimplePIM, to aid programming real PIM systems.
We implement SimplePIM for the UPMEM PIM system and evaluate it on six major applications.
arXiv Detail & Related papers (2023-10-03T08:59:39Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both inference accuracy and mean squared error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
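To show what a shared backbone with multiple prediction heads means mechanically, the NumPy sketch below runs one feature extractor and averages the outputs of several small heads. The layer sizes, ReLU activation, and averaging as the ensemble rule are illustrative assumptions rather than the MEMTL architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out, n_heads = 8, 32, 4, 3

# One shared backbone ...
W_backbone = rng.normal(size=(d_in, d_hidden))
# ... and several lightweight prediction heads.
heads = [rng.normal(size=(d_hidden, d_out)) for _ in range(n_heads)]

def forward(x: np.ndarray) -> np.ndarray:
    """x: (batch, d_in) -> ensembled prediction (batch, d_out)."""
    h = np.maximum(x @ W_backbone, 0.0)            # shared features (ReLU)
    preds = np.stack([h @ Wh for Wh in heads])     # (n_heads, batch, d_out)
    return preds.mean(axis=0)                      # simple ensemble: average the heads

x = rng.normal(size=(5, d_in))
print(forward(x).shape)   # (5, 4)
```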
- Minimizing Age of Information for Mobile Edge Computing Systems: A Nested Index Approach [11.998034941401814]
Mobile edge computing (MEC) provides an efficient approach to achieving real-time applications that are sensitive to information freshness.
In this paper, we consider multiple users offloading tasks to heterogeneous edge servers in a MEC system.
Our algorithm leads to an optimality gap reduction of up to 40%, compared to benchmarks.
arXiv Detail & Related papers (2023-07-03T21:47:21Z)
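The quantity being minimized here, Age of Information (AoI), grows by one each slot and resets when a fresh update is delivered. The toy simulation below uses a plain max-age scheduling rule to show how AoI evolves; the paper's nested index policy is more sophisticated and accounts for heterogeneous servers, which this sketch does not.

```python
import random

random.seed(0)
n_users, n_slots = 4, 20
age = [1] * n_users          # AoI of each user's information at the monitor
delivery_prob = 0.7          # chance a scheduled update actually arrives (assumed)

for slot in range(n_slots):
    scheduled = max(range(n_users), key=lambda i: age[i])   # max-age rule
    delivered = random.random() < delivery_prob
    for i in range(n_users):
        if i == scheduled and delivered:
            age[i] = 1                # fresh update received, age resets
        else:
            age[i] += 1               # information keeps aging

print("final ages:", age)
```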
- Collaborative Intelligent Reflecting Surface Networks with Multi-Agent Reinforcement Learning [63.83425382922157]
Intelligent reflecting surface (IRS) is envisioned to be widely applied in future wireless networks.
In this paper, we investigate a multi-user communication system assisted by cooperative IRS devices with the capability of energy harvesting.
arXiv Detail & Related papers (2022-03-26T20:37:14Z)
- Distributed Reinforcement Learning for Privacy-Preserving Dynamic Edge Caching [91.50631418179331]
A privacy-preserving distributed deep policy gradient (P2D3PG) method is proposed to maximize the cache hit rates of devices in MEC networks.
We convert the distributed optimizations into model-free Markov decision process problems and then introduce a privacy-preserving federated learning method for popularity prediction.
arXiv Detail & Related papers (2021-10-20T02:48:27Z)
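The privacy-preserving ingredient of P2D3PG, federated learning for popularity prediction, can be sketched generically: each device fits a local popularity model on its own request counts, and only the model parameters (never the raw requests) are averaged at the server. The frequency-based local model and plain federated averaging below are simplifying assumptions, not the paper's exact method.

```python
import numpy as np

rng = np.random.default_rng(0)
n_devices, n_contents = 3, 5

# Each device keeps its raw request counts private (hypothetical data).
local_requests = rng.integers(0, 50, size=(n_devices, n_contents))

def local_model(requests: np.ndarray) -> np.ndarray:
    """Local 'popularity model': normalized request frequencies."""
    return requests / max(requests.sum(), 1)

# Federated averaging: only the model parameters leave each device.
local_params = [local_model(r) for r in local_requests]
global_popularity = np.mean(local_params, axis=0)

cache_size = 2
cached = np.argsort(global_popularity)[-cache_size:]   # cache the predicted-popular items
print("predicted popularity:", np.round(global_popularity, 3))
print("contents to cache:", cached)
```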
- Continual Learning Approach for Improving the Data and Computation Mapping in Near-Memory Processing System [3.202860612193139]
We propose an artificially intelligent memory mapping scheme, AIMM, that optimizes data placement and resource utilization through page and computation remapping.
AIMM uses a neural network to achieve a near-optimal mapping during execution, trained using a reinforcement learning algorithm.
Our experimental evaluation shows that AIMM improves the baseline NMP performance in single- and multiple-program scenarios by up to 70% and 50%, respectively.
arXiv Detail & Related papers (2021-04-28T09:50:35Z)
- Covert Model Poisoning Against Federated Learning: Algorithm Design and Optimization [76.51980153902774]
Federated learning (FL) is vulnerable to external attacks on FL models during parameter transmission.
In this paper, we propose effective model poisoning (MP) algorithms to combat state-of-the-art defensive aggregation mechanisms.
Our experimental results demonstrate that the proposed CMP algorithms are effective and substantially outperform existing attack mechanisms.
arXiv Detail & Related papers (2021-01-28T03:28:18Z)
- Reinforcement Learning Based Cooperative Coded Caching under Dynamic Popularities in Ultra-Dense Networks [38.44125997148742]
The caching strategy at small base stations (SBSs) is critical to meet massive high data rate requests.
We exploit reinforcement learning (RL) to design a cooperative caching strategy with maximum-distance separable (MDS) coding.
arXiv Detail & Related papers (2020-03-08T10:45:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.