SCATTER: Algorithm-Circuit Co-Sparse Photonic Accelerator with Thermal-Tolerant, Power-Efficient In-situ Light Redistribution
- URL: http://arxiv.org/abs/2407.05510v1
- Date: Sun, 7 Jul 2024 22:57:44 GMT
- Title: SCATTER: Algorithm-Circuit Co-Sparse Photonic Accelerator with Thermal-Tolerant, Power-Efficient In-situ Light Redistribution
- Authors: Ziang Yin, Nicholas Gangi, Meng Zhang, Jeff Zhang, Rena Huang, Jiaqi Gu,
- Abstract summary: Photonic computing has emerged as a promising solution for accelerating computation-intensive artificial intelligence (AI) workloads.
However, limited reconfigurability, high electrical-optical conversion cost, and thermal sensitivity limit the deployment of current optical analog computing engines to support power-restricted, performance-sensitive AI workloads at scale.
We propose SCATTER, a novel algorithm-circuit co-sparse photonic accelerator featuring dynamically reconfigurable signal path via thermal-tolerant, power-efficient in-situ light redistribution and power gating.
- Score: 7.378742476019604
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Photonic computing has emerged as a promising solution for accelerating computation-intensive artificial intelligence (AI) workloads. However, limited reconfigurability, high electrical-optical conversion cost, and thermal sensitivity limit the deployment of current optical analog computing engines to support power-restricted, performance-sensitive AI workloads at scale. Sparsity provides a great opportunity for hardware-efficient AI accelerators. However, current dense photonic accelerators fail to fully exploit the power-saving potential of algorithmic sparsity. It requires sparsity-aware hardware specialization with a fundamental re-design of photonic tensor core topology and cross-layer device-circuit-architecture-algorithm co-optimization aware of hardware non-ideality and power bottleneck. To trim down the redundant power consumption while maximizing robustness to thermal variations, we propose SCATTER, a novel algorithm-circuit co-sparse photonic accelerator featuring dynamically reconfigurable signal path via thermal-tolerant, power-efficient in-situ light redistribution and power gating. A power-optimized, crosstalk-aware dynamic sparse training framework is introduced to explore row-column structured sparsity and ensure marginal accuracy loss and maximum power efficiency. The extensive evaluation shows that our cross-stacked optimized accelerator SCATTER achieves a 511X area reduction and 12.4X power saving with superior crosstalk tolerance that enables unprecedented circuit layout compactness and on-chip power efficiency.
Related papers
- Accelerating Error Correction Code Transformers [56.75773430667148]
We introduce a novel acceleration method for transformer-based decoders.
We achieve a 90% compression ratio and reduce arithmetic operation energy consumption by at least 224 times on modern hardware.
arXiv Detail & Related papers (2024-10-08T11:07:55Z) - Dynamic Range Reduction via Branch-and-Bound [1.533133219129073]
Key strategy to enhance hardware accelerators is the reduction of precision in arithmetic operations.
This paper introduces a fully principled Branch-and-Bound algorithm for reducing precision needs in QUBO problems.
Experiments validate our algorithm's effectiveness on an actual quantum annealer.
arXiv Detail & Related papers (2024-09-17T03:07:56Z) - TeMPO: Efficient Time-Multiplexed Dynamic Photonic Tensor Core for Edge
AI with Compact Slow-Light Electro-Optic Modulator [44.74560543672329]
We present a time-multiplexed dynamic photonic tensor accelerator, dubbed TeMPO, with cross-layer device/circuit/architecture customization.
We achieve a 368.6 TOPS peak performance, 22.3 TOPS/W energy efficiency, and 1.2 TOPS/mm$2$ compute density.
This work signifies the power of cross-layer co-design and domain-specific customization, paving the way for future electronic-photonic accelerators.
arXiv Detail & Related papers (2024-02-12T03:40:32Z) - HEAT: Hardware-Efficient Automatic Tensor Decomposition for Transformer
Compression [69.36555801766762]
We propose a hardware-aware tensor decomposition framework, dubbed HEAT, that enables efficient exploration of the exponential space of possible decompositions.
We experimentally show that our hardware-aware factorized BERT variants reduce the energy-delay product by 5.7x with less than 1.1% accuracy loss.
arXiv Detail & Related papers (2022-11-30T05:31:45Z) - ELight: Enabling Efficient Photonic In-Memory Neurocomputing with Life
Enhancement [10.05922296357776]
In this work, we propose a synergistic optimization framework, ELight, to minimize the overall write efforts for efficient and reliable optical in-memory neurocomputing.
We first propose write-aware training to encourage the similarity among weight blocks, and combine it with a post-training optimization method to reduce programming efforts by eliminating redundant writes.
Experiments show that ELight can achieve over 20X reduction in the total number of writes and dynamic power with comparable accuracy.
arXiv Detail & Related papers (2021-12-15T22:37:24Z) - Deep Reinforcement Learning Based Multidimensional Resource Management
for Energy Harvesting Cognitive NOMA Communications [64.1076645382049]
Combination of energy harvesting (EH), cognitive radio (CR), and non-orthogonal multiple access (NOMA) is a promising solution to improve energy efficiency.
In this paper, we study the spectrum, energy, and time resource management for deterministic-CR-NOMA IoT systems.
arXiv Detail & Related papers (2021-09-17T08:55:48Z) - CrossLight: A Cross-Layer Optimized Silicon Photonic Neural Network
Accelerator [3.49112071745966]
Domain-specific neural network accelerators have seen growing interest in recent years.
We propose a novel cross-layer optimized neural network accelerator called CrossLight.
On average, CrossLight offers 9.5x lower energy-per-bit and 15.9x higher performance-per-watt at 16-bit resolution than state-of-the-art photonic deep learning accelerators.
arXiv Detail & Related papers (2021-02-13T17:08:06Z) - Deep Delay Loop Reservoir Computing for Specific Emitter Identification [0.5906031288935515]
Current AI systems at the tactical edge lack the computational resources to support in-situ training and inference for situational awareness.
We propose a solution through Deep delay Loop Reservoir Computing (DLR), a processing architecture supporting general machine learning algorithms on compact mobile devices.
arXiv Detail & Related papers (2020-10-13T19:32:38Z) - Adaptive pruning-based optimization of parameterized quantum circuits [62.997667081978825]
Variisy hybrid quantum-classical algorithms are powerful tools to maximize the use of Noisy Intermediate Scale Quantum devices.
We propose a strategy for such ansatze used in variational quantum algorithms, which we call "Efficient Circuit Training" (PECT)
Instead of optimizing all of the ansatz parameters at once, PECT launches a sequence of variational algorithms.
arXiv Detail & Related papers (2020-10-01T18:14:11Z) - Rapid characterisation of linear-optical networks via PhaseLift [51.03305009278831]
Integrated photonics offers great phase-stability and can rely on the large scale manufacturability provided by the semiconductor industry.
New devices, based on such optical circuits, hold the promise of faster and energy-efficient computations in machine learning applications.
We present a novel technique to reconstruct the transfer matrix of linear optical networks.
arXiv Detail & Related papers (2020-10-01T16:04:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.