Paper summary: TIMELY: Pushing Data Movements and Interfaces in PIM Accelerators
Towards Local and in Time Domain
- arxiv url: http://arxiv.org/abs/2005.01206v1
- Date: Sun, 3 May 2020 23:27:51 GMT
- Status: processing complete
- Last updated in system: 2022-12-07 07:04:30.650272
- Title: TIMELY: Pushing Data Movements and Interfaces in PIM Accelerators
Towards Local and in Time Domain
- Title (reference translation): TIMELY: Pushing Data Movement and Interfaces in PIM Accelerators Toward the Local and Time Domains
- Authors: Weitao Li, Pengfei Xu, Yang Zhao, Haitong Li, Yuan Xie, Yingyan Lin
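- Abstract summary: Resistive random-access memory (ReRAM) based processing-in-memory (R$^2$PIM) accelerators show promise in bridging the gap between the constrained resources of Internet of Things devices and the prohibitive energy cost of convolutional/deep neural networks (CNNs/DNNs).
- Reference score (independently computed attention score): 27.66305184703716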
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Resistive-random-access-memory (ReRAM) based processing-in-memory (R$^2$PIM)
accelerators show promise in bridging the gap between Internet of Things
devices' constrained resources and Convolutional/Deep Neural Networks'
(CNNs/DNNs') prohibitive energy cost. Specifically, R$^2$PIM accelerators
enhance energy efficiency by eliminating the cost of weight movements and
improving the computational density through ReRAM's high density. However, the
energy efficiency is still limited by the dominant energy cost of input and
partial sum (Psum) movements and the cost of digital-to-analog (D/A) and
analog-to-digital (A/D) interfaces. In this work, we identify three
energy-saving opportunities in R$^2$PIM accelerators: analog data locality,
time-domain interfacing, and input access reduction, and propose an innovative
R$^2$PIM accelerator called TIMELY, with three key contributions: (1) TIMELY
adopts analog local buffers (ALBs) within ReRAM crossbars to greatly enhance
the data locality, minimizing the energy overheads of both input and Psum
movements; (2) TIMELY largely reduces the energy of each single D/A (and A/D)
conversion and the total number of conversions by using time-domain interfaces
(TDIs) and the employed ALBs, respectively; (3) we develop an only-once input
read (O$^2$IR) mapping method to further decrease the energy of input accesses
and the number of D/A conversions. The evaluation with more than 10 CNN/DNN
models and various chip configurations shows that TIMELY outperforms the
baseline R$^2$PIM accelerator, PRIME, by one order of magnitude in energy
efficiency while maintaining better computational density (up to 31.2$\times$)
and throughput (up to 736.6$\times$). Furthermore, comprehensive studies are
performed to evaluate the effectiveness of the proposed ALB, TDI, and O$^2$IR
innovations in terms of energy savings and area reduction.
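- Abstract (reference translation): Resistive random-access memory (ReRAM) based processing-in-memory (R$^2$PIM) accelerators show promise in bridging the gap between the constrained resources of Internet of Things devices and the prohibitive energy cost of convolutional/deep neural networks (CNNs/DNNs).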
In this work, we identify three energy-saving opportunities in R$^2$PIM accelerators: analog data locality, time-domain interfacing, and input access reduction, and propose an innovative R$^2$PIM accelerator called TIMELY, with three key contributions: (1) TIMELY adopts analog local buffers (ALBs) within ReRAM crossbars to greatly enhance the data locality, minimizing the energy overheads of both input and Psum movements; (2) TIMELY largely reduces the energy of each single D/A (and A/D) conversion and the total number of conversions by using time-domain interfaces (TDIs) and the employed ALBs, respectively; (3) we develop an only-once input read (O$^2$IR) mapping method to further decrease the energy of input accesses and the number of D/A conversions.
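Furthermore, comprehensive studies are performed to evaluate the effectiveness of the proposed ALB, TDI, and O$^2$IR techniques in terms of energy savings and area reduction.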
- SpiDR: A Reconfigurable Digital Compute-in-Memory Spiking Neural Network Accelerator for Event-based Perception [8.968583287058959]
This paper proposes a scalable and reconfigurable digital compute-in-memory (CIM) SNN accelerator chip, named SpiDR.
Paper, reference translation (metadata) (2024-11-05T06:59:02Z)
- EPIM: Efficient Processing-In-Memory Accelerators based on Epitome [78.79382890789607]
Paper, reference translation (metadata) (2023-11-12T17:56:39Z)
- Precision-aware Latency and Energy Balancing on Multi-Accelerator Platforms for DNN Inference [22.9834921448069]
Paper, reference translation (metadata) (2023-06-08T09:23:46Z)
- RAMP: A Flat Nanosecond Optical Network and MPI Operations for Distributed Deep Learning Systems [68.8204255655161]
Paper, reference translation (metadata) (2022-11-28T11:24:51Z)
- Federated Learning for Energy-limited Wireless Networks: A Partial Model Aggregation Approach [79.59560136273917]
Paper, reference translation (metadata) (2022-04-20T19:09:52Z)
- Neural-PIM: Efficient Processing-In-Memory with Neural Approximation of Peripherals [11.31429464715989]
Evaluations on different benchmarks show that Neural-PIM improves energy efficiency by 5.36x (1.73x) and throughput by 3.43x (1.59x).
Paper, reference translation (metadata) (2022-01-30T16:14:49Z)
- SmartDeal: Re-Modeling Deep Network Weights for Efficient Inference and Training [82.35376405568975]
We present SmartDeal (SD), an algorithm framework to trade high-cost memory storage/access for lower-cost compute.
Paper, reference translation (metadata) (2021-01-04T18:54:07Z)
- EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference [82.1584439276834]
We present EdgeBERT, an in-depth algorithm-hardware co-design for latency-aware energy optimization for multi-task NLP.
Paper, reference translation (metadata) (2020-11-28T19:21:47Z)
- E-BATCH: Energy-Efficient and High-Throughput RNN Batching [0.0]
Recurrent neural network (RNN) inference has low hardware utilization because of strict data dependencies.
Paper, reference translation (metadata) (2020-09-22T16:22:23Z)
- SmartExchange: Trading Higher-cost Memory Storage/Access for Lower-cost Computation [97.78417228445883]
We present SmartExchange, an algorithm-hardware co-design framework for energy-efficient inference of deep neural networks (DNNs).
Paper, reference translation (metadata) (2020-05-07T12:12:49Z)
- A New MRAM-based Process In-Memory Accelerator for Efficient Neural Network Training with Floating Point Precision [28.458719513745812]
Experimental results show that the proposed SOT-MRAM PIM based DNN training accelerator achieves 3.3$\times$, 1.8$\times$, and 2.5$\times$ improvements in energy, latency, and area, respectively.
Paper, reference translation (metadata) (2020-03-02T04:58:54Z)