Related papers: TeMPO: Efficient Time-Multiplexed Dynamic Photonic Tensor Core for Edge AI with Compact Slow-Light Electro-Optic Modulator

TeMPO: Efficient Time-Multiplexed Dynamic Photonic Tensor Core for Edge AI with Compact Slow-Light Electro-Optic Modulator

URL: http://arxiv.org/abs/2402.07393v1
Date: Mon, 12 Feb 2024 03:40:32 GMT
Title: TeMPO: Efficient Time-Multiplexed Dynamic Photonic Tensor Core for Edge AI with Compact Slow-Light Electro-Optic Modulator
Authors: Meng Zhang, Dennis Yin, Nicholas Gangi, Amir Begovi\'c, Alexander Chen, Zhaoran Rena Huang, Jiaqi Gu
Abstract summary: We present a time-multiplexed dynamic photonic tensor accelerator, dubbed TeMPO, with cross-layer device/circuit/architecture customization. We achieve a 368.6 TOPS peak performance, 22.3 TOPS/W energy efficiency, and 1.2 TOPS/mm$2$ compute density. This work signifies the power of cross-layer co-design and domain-specific customization, paving the way for future electronic-photonic accelerators.
Score: 44.74560543672329
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Electronic-photonic computing systems offer immense potential in energy-efficient artificial intelligence (AI) acceleration tasks due to the superior computing speed and efficiency of optics, especially for real-time, low-energy deep neural network (DNN) inference tasks on resource-restricted edge platforms. However, current optical neural accelerators based on foundry-available devices and conventional system architecture still encounter a performance gap compared to highly customized electronic counterparts. To bridge the performance gap due to lack of domain specialization, we present a time-multiplexed dynamic photonic tensor accelerator, dubbed TeMPO, with cross-layer device/circuit/architecture customization. At the device level, we present foundry-compatible, customized photonic devices, including a slow-light electro-optic modulator with experimental demonstration, optical splitters, and phase shifters that significantly reduce the footprint and power in input encoding and dot-product calculation. At the circuit level, partial products are hierarchically accumulated via parallel photocurrent aggregation, lightweight capacitive temporal integration, and sequential digital summation, considerably relieving the analog-to-digital conversion bottleneck. We also employ a multi-tile, multi-core architecture to maximize hardware sharing for higher efficiency. Across diverse edge AI workloads, TeMPO delivers digital-comparable task accuracy with superior quantization/noise tolerance. We achieve a 368.6 TOPS peak performance, 22.3 TOPS/W energy efficiency, and 1.2 TOPS/mm$^2$ compute density, pushing the Pareto frontier in edge AI hardware. This work signifies the power of cross-layer co-design and domain-specific customization, paving the way for future electronic-photonic accelerators with even greater performance and efficiency.

Related papers

Beyond Terabit/s Integrated Neuromorphic Photonic Processor for DSP-Free Optical Interconnects [1.9685853627153866]
Multi-scale AI training and inference demand uniform, ultra-low latency, and energy-efficient links. We present an integrated neuromorphic optical signal processor (OSP) that achieves DSP-free, all-optical, real-time processing. This research provides a highly scalable, energy-efficient, and high-speed solution, paving the way for next-generation AI infrastructure.
arXiv Detail & Related papers (2025-04-21T11:56:36Z)
A Hardware-Efficient Photonic Tensor Core: Accelerating Deep Neural Networks with Structured Compression [15.665630650382226]
AI and deep neural networks (DNNs) have revolutionized numerous fields, enabling complex tasks by extracting intricate features from large datasets. Optical computing offers a promising alternative due to its inherent advantages of parallelism, high computational speed, and low power consumption. We introduce a block-circulant photonic tensor core (CirPTC) for a structure-compressed optical neural network (StrC-ONN) architecture.
arXiv Detail & Related papers (2025-02-01T17:03:45Z)
Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints [51.83081671798784]
Diffusion Transformers (DiT) have emerged as a powerful architecture for image and video generation, offering superior quality and scalability.<n>DiT's practical application suffers from inherent dynamic feature instability, leading to error amplification during cached inference.<n>We propose Skip-DiT, an image and video generative DiT variant enhanced with Long-Skip-Connections (LSCs) - the key efficiency component in U-Nets.
arXiv Detail & Related papers (2024-11-26T17:28:10Z)
Multi-Dimensional Reconfigurable, Physically Composable Hybrid Diffractive Optical Neural Network [15.804251049405584]
We introduce a physically composable hybrid diffractive ONN system (MDR-HDONN) By leveraging full-system learnability, MDR-HDONN repurposes fixed fabricated optical hardware, achieving exponentially expanded functionality and superior task adaptability. MDR-HDONN has digital-comparable accuracy on various task adaptations with 74x faster speed and 194x lower energy.
arXiv Detail & Related papers (2024-11-08T18:08:49Z)
Optical training of large-scale Transformers and deep neural networks with direct feedback alignment [48.90869997343841]
We experimentally implement a versatile and scalable training algorithm, called direct feedback alignment, on a hybrid electronic-photonic platform. An optical processing unit performs large-scale random matrix multiplications, which is the central operation of this algorithm, at speeds up to 1500 TeraOps. We study the compute scaling of our hybrid optical approach, and demonstrate a potential advantage for ultra-deep and wide neural networks.
arXiv Detail & Related papers (2024-09-01T12:48:47Z)
SCATTER: Algorithm-Circuit Co-Sparse Photonic Accelerator with Thermal-Tolerant, Power-Efficient In-situ Light Redistribution [7.378742476019604]
Photonic computing has emerged as a promising solution for accelerating computation-intensive artificial intelligence (AI) workloads. However, limited reconfigurability, high electrical-optical conversion cost, and thermal sensitivity limit the deployment of current optical analog computing engines to support power-restricted, performance-sensitive AI workloads at scale. We propose SCATTER, a novel algorithm-circuit co-sparse photonic accelerator featuring dynamically reconfigurable signal path via thermal-tolerant, power-efficient in-situ light redistribution and power gating.
arXiv Detail & Related papers (2024-07-07T22:57:44Z)
Random resistive memory-based deep extreme point learning machine for unified visual processing [67.51600474104171]
We propose a novel hardware-software co-design, random resistive memory-based deep extreme point learning machine (DEPLM) Our co-design system achieves huge energy efficiency improvements and training cost reduction when compared to conventional systems.
arXiv Detail & Related papers (2023-12-14T09:46:16Z)
Integrated multi-operand optical neurons for scalable and hardware-efficient deep learning [10.157562103034]
This work proposes a scalable and efficient optical dot-product engine based on customized multi-operand photonic devices. We experimentally demonstrate the utility of a MOON using a multi-operand-Mach-Zehnder-interferometer (MOMZI) in image recognition tasks.
arXiv Detail & Related papers (2023-05-31T06:25:39Z)
High-Speed and Energy-Efficient Non-Binary Computing with Polymorphic Electro-Optic Circuits and Architectures [0.0]
Polymorphic E-O circuits can be dynamically programmed to implement different logic and arithmetic functions. circuits can support energy-efficient processing of data in non-binary formats. circuits enable E-O computing accelerator architectures for processing binarized and integer quantized convolutional neural networks.
arXiv Detail & Related papers (2023-04-15T18:20:56Z)
Data-Model-Circuit Tri-Design for Ultra-Light Video Intelligence on Edge Devices [90.30316433184414]
We propose a data-model-hardware tri-design framework for high- throughput, low-cost, and high-accuracy MOT on HD video stream. Compared to the state-of-the-art MOT baseline, our tri-design approach can achieve 12.5x latency reduction, 20.9x effective frame rate improvement, 5.83x lower power, and 9.78x better energy efficiency, without much accuracy drop.
arXiv Detail & Related papers (2022-10-16T16:21:40Z)
Single-Shot Optical Neural Network [55.41644538483948]
'Weight-stationary' analog optical and electronic hardware has been proposed to reduce the compute resources required by deep neural networks. We present a scalable, single-shot-per-layer weight-stationary optical processor.
arXiv Detail & Related papers (2022-05-18T17:49:49Z)
EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference [82.1584439276834]
Transformer-based language models such as BERT provide significant accuracy improvement for a multitude of natural language processing (NLP) tasks. We present EdgeBERT, an in-depth algorithm- hardware co-design for latency-aware energy optimization for multi-task NLP.
arXiv Detail & Related papers (2020-11-28T19:21:47Z)
Photonic tensor cores for machine learning [0.0]
We introduce an integrated photonics-based TPU to perform matrix vector multiplication and summation. We show that the performance of this 8-bit photonic TPU can be 2-3 orders higher compared to an electrical TPU whilst featuring similar chip areas. This work shows that photonic specialized processors have the potential to augment electronic systems and may perform exceptionally well in network-edge devices in the looming 5G networks and beyond.
arXiv Detail & Related papers (2020-02-01T14:11:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.