DeepNVM++: Cross-Layer Modeling and Optimization Framework of
Non-Volatile Memories for Deep Learning
- URL: http://arxiv.org/abs/2012.04559v1
- Date: Tue, 8 Dec 2020 16:53:25 GMT
- Title: DeepNVM++: Cross-Layer Modeling and Optimization Framework of
Non-Volatile Memories for Deep Learning
- Authors: Ahmet Inci, Mehmet Meric Isgenc, Diana Marculescu
- Abstract summary: Non-volatile memory (NVM) technologies such as spin-transfer torque magnetic random access memory (STT-MRAM) and spin-orbit torque magnetic random access memory (SOT-MRAM) have significant advantages compared to conventional technologies.
In this work we present DeepNVM++, a framework to characterize, model, and analyze NVM-based caches in deep learning (DL) applications.
- Score: 11.228806840123084
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-volatile memory (NVM) technologies such as spin-transfer torque magnetic
random access memory (STT-MRAM) and spin-orbit torque magnetic random access
memory (SOT-MRAM) have significant advantages compared to conventional SRAM due
to their non-volatility, higher cell density, and scalability features. While
previous work has investigated several architectural implications of NVM for
generic applications, in this work we present DeepNVM++, a framework to
characterize, model, and analyze NVM-based caches in GPU architectures for deep
learning (DL) applications by combining technology-specific circuit-level
models and the actual memory behavior of various DL workloads. We present both
iso-capacity and iso-area performance and energy analysis for systems whose
last-level caches rely on conventional SRAM and emerging STT-MRAM and SOT-MRAM
technologies. In the iso-capacity case, STT-MRAM and SOT-MRAM provide up to
3.8x and 4.7x energy-delay product (EDP) reduction and 2.4x and 2.8x area
reduction compared to conventional SRAM, respectively. Under iso-area
assumptions, STT-MRAM and SOT-MRAM provide up to 2x and 2.3x EDP reduction and
accommodate 2.3x and 3.3x cache capacity when compared to SRAM, respectively.
We also perform a scalability analysis and show that STT-MRAM and SOT-MRAM
achieve orders of magnitude EDP reduction when compared to SRAM for large cache
capacities. Our comprehensive cross-layer framework is demonstrated on
STT-/SOT-MRAM technologies and can be used for the characterization, modeling,
and analysis of any NVM technology for last-level caches in GPUs for DL
applications.
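The headline metric above, energy-delay product (EDP), is simply per-access energy multiplied by access delay, so a technology wins on EDP when it improves the energy-latency trade-off as a whole. A minimal sketch of the iso-capacity comparison, using hypothetical per-access numbers chosen only for illustration (they are not values from the DeepNVM++ paper):

```python
# Illustrative energy-delay product (EDP) comparison in the spirit of the
# paper's iso-capacity analysis. All numbers are hypothetical placeholders.

def edp(energy_nj: float, delay_ns: float) -> float:
    """Energy-delay product: lower is better."""
    return energy_nj * delay_ns

# Hypothetical per-access figures for three cache technologies
# at the same (iso) capacity, normalized to an SRAM baseline.
sram = edp(energy_nj=1.0, delay_ns=1.0)
stt_mram = edp(energy_nj=0.65, delay_ns=0.40)  # placeholder values
sot_mram = edp(energy_nj=0.50, delay_ns=0.42)  # placeholder values

# EDP reduction factor relative to the SRAM baseline (higher is better).
print(f"STT-MRAM EDP reduction: {sram / stt_mram:.2f}x")
print(f"SOT-MRAM EDP reduction: {sram / sot_mram:.2f}x")
```

In the paper's actual framework, these per-access numbers come from technology-specific circuit-level models combined with measured memory behavior of DL workloads, rather than fixed constants as above.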
Related papers
- LiVOS: Light Video Object Segmentation with Gated Linear Matching [116.58237547253935]
LiVOS is a lightweight memory network that employs linear matching via linear attention.
For longer and higher-resolution videos, it matches STM-based methods with 53% less GPU memory and supports 4096p inference on a 32G consumer-grade GPU.
arXiv Detail & Related papers (2024-11-05T05:36:17Z)
- vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving [53.972175896814505]
Large Language Models (LLMs) are widely used across various domains, processing millions of daily requests.
arXiv Detail & Related papers (2024-07-22T14:37:58Z)
- B'MOJO: Hybrid State Space Realizations of Foundation Models with Eidetic and Fading Memory [91.81390121042192]
We develop a class of models called B'MOJO to seamlessly combine eidetic and fading memory within a composable module.
B'MOJO's ability to modulate eidetic and fading memory results in better inference on longer sequences tested up to 32K tokens.
arXiv Detail & Related papers (2024-07-08T18:41:01Z)
- Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution [49.902047563260496]
We present the first attempt to integrate the Vision State Space Model (Mamba) for remote sensing image (RSI) super-resolution.
To achieve better SR reconstruction, building upon Mamba, we devise a Frequency-assisted Mamba framework, dubbed FMSR.
Our FMSR features a multi-level fusion architecture equipped with the Frequency Selection Module (FSM), Vision State Space Module (VSSM), and Hybrid Gate Module (HGM).
arXiv Detail & Related papers (2024-05-08T11:09:24Z)
- DDC-PIM: Efficient Algorithm/Architecture Co-design for Doubling Data Capacity of SRAM-based Processing-In-Memory [6.367916611208411]
We propose DDC-PIM, an efficient algorithm/architecture co-design methodology that effectively doubles the equivalent data capacity.
DDC-PIM yields about $2.84\times$ speedup on MobileNetV2 and $2.69\times$ on EfficientNet-B0 with negligible accuracy loss.
Compared with state-of-the-art macros, DDC-PIM achieves up to $8.41\times$ and $2.75\times$ improvement in weight density and area efficiency, respectively.
arXiv Detail & Related papers (2023-10-31T12:49:54Z)
- Evaluation of STT-MRAM as a Scratchpad for Training in ML Accelerators [9.877596714655096]
Training deep neural networks (DNNs) is an extremely memory-intensive process.
Spin-Transfer-Torque MRAM (STT-MRAM) offers several desirable properties for training accelerators.
We show that MRAM provides up to a 15-22x improvement in system-level energy.
arXiv Detail & Related papers (2023-08-03T20:36:48Z)
- TL-nvSRAM-CIM: Ultra-High-Density Three-Level ReRAM-Assisted Computing-in-nvSRAM with DC-Power Free Restore and Ternary MAC Operations [8.669532093397065]
This work proposes an ultra-high-density three-level ReRAMs-assisted computing scheme for large NN models.
The proposed TL-nvSRAM-CIM achieves 7.8x higher storage density compared with state-of-the-art works.
arXiv Detail & Related papers (2023-07-06T01:46:06Z)
- NumS: Scalable Array Programming for the Cloud [82.827921577004]
We present NumS, an array programming library that optimizes NumPy-like expressions on task-based distributed systems.
This is achieved through a novel scheduler called Load Simulated Hierarchical Scheduling (LSHS).
We show that LSHS enhances performance on Ray by decreasing network load by a factor of 2x, requiring 4x less memory, and reducing execution time by 10x on the logistic regression problem.
arXiv Detail & Related papers (2022-06-28T20:13:40Z)
- Efficient Deep Learning Using Non-Volatile Memory Technology [12.866655564742889]
We present DeepNVM++, a comprehensive framework to characterize, model, and analyze NVM-based caches in architectures for deep learning (DL) applications.
In the iso-capacity case, STT-MRAM and SOT-MRAM provide up to 3.8x and 4.7x energy-delay product (EDP) reduction and 2.4x and 2.8x area reduction, respectively, compared to conventional SRAM caches.
DeepNVM++ is demonstrated on STT-/SOT-MRAM technologies and can be used for the characterization, modeling, and analysis of any NVM technology for last-level caches in GPUs for DL applications.
arXiv Detail & Related papers (2022-06-27T19:27:57Z)
- Memory-Oriented Design-Space Exploration of Edge-AI Hardware for XR Applications [5.529817156718514]
Low-Power Edge-AI capabilities are essential for on-device extended reality (XR) applications to support the vision of Metaverse.
In this work, we investigate two representative XR workloads: (i) Hand detection and (ii) Eye segmentation, for hardware design space exploration.
For both applications, we train deep neural networks and analyze the impact of quantization and hardware specific bottlenecks.
The impact of integrating state-of-the-art emerging non-volatile memory technology (STT/SOT/VGSOT MRAM) into the XR-AI inference pipeline is evaluated.
arXiv Detail & Related papers (2022-06-08T11:18:02Z)
- SmartDeal: Re-Modeling Deep Network Weights for Efficient Inference and Training [82.35376405568975]
Deep neural networks (DNNs) come with heavy parameterization, requiring external dynamic random-access memory (DRAM) for storage.
We present SmartDeal (SD), an algorithm framework to trade higher-cost memory storage/access for lower-cost computation.
We show that SD leads to 10.56x and 4.48x reduction in the storage and training energy, with negligible accuracy loss compared to state-of-the-art training baselines.
arXiv Detail & Related papers (2021-01-04T18:54:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.