TeMPO: Efficient Time-Multiplexed Dynamic Photonic Tensor Core for Edge
AI with Compact Slow-Light Electro-Optic Modulator
- URL: http://arxiv.org/abs/2402.07393v1
- Date: Mon, 12 Feb 2024 03:40:32 GMT
- Title: TeMPO: Efficient Time-Multiplexed Dynamic Photonic Tensor Core for Edge
AI with Compact Slow-Light Electro-Optic Modulator
- Authors: Meng Zhang, Dennis Yin, Nicholas Gangi, Amir Begovi\'c, Alexander
Chen, Zhaoran Rena Huang, Jiaqi Gu
- Abstract summary: We present a time-multiplexed dynamic photonic tensor accelerator, dubbed TeMPO, with cross-layer device/circuit/architecture customization.
We achieve a 368.6 TOPS peak performance, 22.3 TOPS/W energy efficiency, and 1.2 TOPS/mm$2$ compute density.
This work signifies the power of cross-layer co-design and domain-specific customization, paving the way for future electronic-photonic accelerators.
- Score: 44.74560543672329
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Electronic-photonic computing systems offer immense potential in
energy-efficient artificial intelligence (AI) acceleration tasks due to the
superior computing speed and efficiency of optics, especially for real-time,
low-energy deep neural network (DNN) inference tasks on resource-restricted
edge platforms. However, current optical neural accelerators based on
foundry-available devices and conventional system architecture still encounter
a performance gap compared to highly customized electronic counterparts. To
bridge the performance gap due to lack of domain specialization, we present a
time-multiplexed dynamic photonic tensor accelerator, dubbed TeMPO, with
cross-layer device/circuit/architecture customization. At the device level, we
present foundry-compatible, customized photonic devices, including a slow-light
electro-optic modulator with experimental demonstration, optical splitters, and
phase shifters that significantly reduce the footprint and power in input
encoding and dot-product calculation. At the circuit level, partial products
are hierarchically accumulated via parallel photocurrent aggregation,
lightweight capacitive temporal integration, and sequential digital summation,
considerably relieving the analog-to-digital conversion bottleneck. We also
employ a multi-tile, multi-core architecture to maximize hardware sharing for
higher efficiency. Across diverse edge AI workloads, TeMPO delivers
digital-comparable task accuracy with superior quantization/noise tolerance. We
achieve a 368.6 TOPS peak performance, 22.3 TOPS/W energy efficiency, and 1.2
TOPS/mm$^2$ compute density, pushing the Pareto frontier in edge AI hardware.
This work signifies the power of cross-layer co-design and domain-specific
customization, paving the way for future electronic-photonic accelerators with
even greater performance and efficiency.
Related papers
- Optical training of large-scale Transformers and deep neural networks with direct feedback alignment [48.90869997343841]
We experimentally implement a versatile and scalable training algorithm, called direct feedback alignment, on a hybrid electronic-photonic platform.
An optical processing unit performs large-scale random matrix multiplications, which is the central operation of this algorithm, at speeds up to 1500 TeraOps.
We study the compute scaling of our hybrid optical approach, and demonstrate a potential advantage for ultra-deep and wide neural networks.
arXiv Detail & Related papers (2024-09-01T12:48:47Z) - SCATTER: Algorithm-Circuit Co-Sparse Photonic Accelerator with Thermal-Tolerant, Power-Efficient In-situ Light Redistribution [7.378742476019604]
Photonic computing has emerged as a promising solution for accelerating computation-intensive artificial intelligence (AI) workloads.
However, limited reconfigurability, high electrical-optical conversion cost, and thermal sensitivity limit the deployment of current optical analog computing engines to support power-restricted, performance-sensitive AI workloads at scale.
We propose SCATTER, a novel algorithm-circuit co-sparse photonic accelerator featuring dynamically reconfigurable signal path via thermal-tolerant, power-efficient in-situ light redistribution and power gating.
arXiv Detail & Related papers (2024-07-07T22:57:44Z) - Random resistive memory-based deep extreme point learning machine for
unified visual processing [67.51600474104171]
We propose a novel hardware-software co-design, random resistive memory-based deep extreme point learning machine (DEPLM)
Our co-design system achieves huge energy efficiency improvements and training cost reduction when compared to conventional systems.
arXiv Detail & Related papers (2023-12-14T09:46:16Z) - Integrated multi-operand optical neurons for scalable and
hardware-efficient deep learning [10.157562103034]
This work proposes a scalable and efficient optical dot-product engine based on customized multi-operand photonic devices.
We experimentally demonstrate the utility of a MOON using a multi-operand-Mach-Zehnder-interferometer (MOMZI) in image recognition tasks.
arXiv Detail & Related papers (2023-05-31T06:25:39Z) - High-Speed and Energy-Efficient Non-Binary Computing with Polymorphic
Electro-Optic Circuits and Architectures [0.0]
Polymorphic E-O circuits can be dynamically programmed to implement different logic and arithmetic functions.
circuits can support energy-efficient processing of data in non-binary formats.
circuits enable E-O computing accelerator architectures for processing binarized and integer quantized convolutional neural networks.
arXiv Detail & Related papers (2023-04-15T18:20:56Z) - Data-Model-Circuit Tri-Design for Ultra-Light Video Intelligence on Edge
Devices [90.30316433184414]
We propose a data-model-hardware tri-design framework for high- throughput, low-cost, and high-accuracy MOT on HD video stream.
Compared to the state-of-the-art MOT baseline, our tri-design approach can achieve 12.5x latency reduction, 20.9x effective frame rate improvement, 5.83x lower power, and 9.78x better energy efficiency, without much accuracy drop.
arXiv Detail & Related papers (2022-10-16T16:21:40Z) - Single-Shot Optical Neural Network [55.41644538483948]
'Weight-stationary' analog optical and electronic hardware has been proposed to reduce the compute resources required by deep neural networks.
We present a scalable, single-shot-per-layer weight-stationary optical processor.
arXiv Detail & Related papers (2022-05-18T17:49:49Z) - EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware
Multi-Task NLP Inference [82.1584439276834]
Transformer-based language models such as BERT provide significant accuracy improvement for a multitude of natural language processing (NLP) tasks.
We present EdgeBERT, an in-depth algorithm- hardware co-design for latency-aware energy optimization for multi-task NLP.
arXiv Detail & Related papers (2020-11-28T19:21:47Z) - Photonic tensor cores for machine learning [0.0]
We introduce an integrated photonics-based TPU to perform matrix vector multiplication and summation.
We show that the performance of this 8-bit photonic TPU can be 2-3 orders higher compared to an electrical TPU whilst featuring similar chip areas.
This work shows that photonic specialized processors have the potential to augment electronic systems and may perform exceptionally well in network-edge devices in the looming 5G networks and beyond.
arXiv Detail & Related papers (2020-02-01T14:11:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.