A Hardware-Efficient Photonic Tensor Core: Accelerating Deep Neural Networks with Structured Compression
- URL: http://arxiv.org/abs/2502.01670v1
- Date: Sat, 01 Feb 2025 17:03:45 GMT
- Title: A Hardware-Efficient Photonic Tensor Core: Accelerating Deep Neural Networks with Structured Compression
- Authors: Shupeng Ning, Hanqing Zhu, Chenghao Feng, Jiaqi Gu, David Z. Pan, Ray T. Chen
- Abstract summary: AI and deep neural networks (DNNs) have revolutionized numerous fields, enabling complex tasks by extracting intricate features from large datasets.
Optical computing offers a promising alternative due to its inherent advantages of parallelism, high computational speed, and low power consumption.
We introduce a block-circulant photonic tensor core (CirPTC) for a structure-compressed optical neural network (StrC-ONN) architecture.
- Score: 15.665630650382226
- Abstract: Recent advancements in artificial intelligence (AI) and deep neural networks (DNNs) have revolutionized numerous fields, enabling complex tasks by extracting intricate features from large datasets. However, the exponential growth in computational demands has outstripped the capabilities of traditional electrical hardware accelerators. Optical computing offers a promising alternative due to its inherent advantages of parallelism, high computational speed, and low power consumption. Yet, current photonic integrated circuits (PICs) designed for general matrix multiplication (GEMM) are constrained by large footprints, high costs of electro-optical (E-O) interfaces, and high control complexity, limiting their scalability. To overcome these challenges, we introduce a block-circulant photonic tensor core (CirPTC) for a structure-compressed optical neural network (StrC-ONN) architecture. By applying a structured compression strategy to weight matrices, StrC-ONN significantly reduces model parameters and hardware requirements while preserving the universal representability of networks and maintaining comparable expressivity. Additionally, we propose a hardware-aware training framework to compensate for on-chip nonidealities and improve model robustness and accuracy. We experimentally demonstrate image processing and classification tasks, achieving up to a 74.91% reduction in trainable parameters while maintaining competitive accuracies. Performance analysis projects a computational density of 5.84 tera operations per second (TOPS) per mm^2 and a power efficiency of 47.94 TOPS/W, marking a 6.87-times improvement achieved through the hardware-software co-design approach. By reducing both hardware requirements and control complexity across multiple dimensions, this work explores a new pathway to push the limits of optical computing in the pursuit of high efficiency and scalability.
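To make the structured compression concrete, below is a minimal NumPy sketch of the block-circulant matrix-vector product at the heart of CirPTC: every b-by-b block of the weight matrix is defined by a single length-b vector, so a (p·b)-by-(q·b) layer stores p·q·b parameters instead of p·q·b², and each block multiply reduces to an FFT-domain product. The shapes, block size, and FFT-based evaluation here are illustrative; the paper evaluates this product optically.

```python
import numpy as np
from scipy.linalg import circulant

def circulant_matvec(c, x):
    """circ(c) @ x in O(b log b) via the FFT: with circ(c)[i, j] = c[(i - j) % b],
    the product is a circular convolution of c with x."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def block_circulant_matvec(C, x, b):
    """y = W @ x where W is (p*b, q*b) and every b-by-b block is circulant.
    C has shape (p, q, b): one length-b vector per block instead of b*b weights."""
    p, q, _ = C.shape
    x_blocks = x.reshape(q, b)
    y = np.zeros((p, b))
    for i in range(p):
        for j in range(q):
            y[i] += circulant_matvec(C[i, j], x_blocks[j])
    return y.reshape(p * b)

# With block size b = 4, an (8, 16) layer needs p*q*b = 2*4*4 = 32 parameters
# instead of 128: a 75% reduction, in line with the up-to-74.91% the paper reports.
rng = np.random.default_rng(0)
C = rng.standard_normal((2, 4, 4))
x = rng.standard_normal(16)
y = block_circulant_matvec(C, x, b=4)

# Cross-check against the explicitly assembled dense weight matrix.
W = np.block([[circulant(C[i, j]) for j in range(4)] for i in range(2)])
assert np.allclose(W @ x, y)
```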
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the demands of real-time visual inference by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - Optical training of large-scale Transformers and deep neural networks with direct feedback alignment [48.90869997343841]
We experimentally implement a versatile and scalable training algorithm, called direct feedback alignment, on a hybrid electronic-photonic platform.
An optical processing unit performs large-scale random matrix multiplications, which is the central operation of this algorithm, at speeds up to 1500 TeraOps.
We study the compute scaling of our hybrid optical approach, and demonstrate a potential advantage for ultra-deep and wide neural networks.
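As a digital reference for what the optical processing unit computes, here is a minimal sketch of a direct feedback alignment update on a toy two-layer network: the output error is projected to the hidden layer through a fixed random matrix B1 rather than through the transposed forward weights, and that random projection is exactly the kind of large-scale random matrix multiplication offloaded to optics. Network sizes, the tanh activation, and the learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid, n_out = 32, 64, 10

# Two-layer net; B1 is a *fixed random* feedback matrix, never trained.
W1 = rng.standard_normal((n_hid, n_in)) * 0.1
W2 = rng.standard_normal((n_out, n_hid)) * 0.1
B1 = rng.standard_normal((n_hid, n_out))  # the random projection an OPU performs optically

def tanh_grad(a):
    return 1.0 - np.tanh(a) ** 2

def dfa_step(x, y_target, lr=1e-2):
    global W1, W2
    a1 = W1 @ x
    h1 = np.tanh(a1)
    y = W2 @ h1
    e = y - y_target                      # output error
    # DFA: project the *output* error through fixed random B1 instead of W2.T
    delta1 = (B1 @ e) * tanh_grad(a1)
    W2 -= lr * np.outer(e, h1)
    W1 -= lr * np.outer(delta1, x)
    return 0.5 * np.sum(e ** 2)

x = rng.standard_normal(n_in)
t = np.eye(n_out)[3]
for _ in range(200):
    loss = dfa_step(x, t)
print(f"loss after training: {loss:.4f}")
```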
arXiv Detail & Related papers (2024-09-01T12:48:47Z) - SCATTER: Algorithm-Circuit Co-Sparse Photonic Accelerator with Thermal-Tolerant, Power-Efficient In-situ Light Redistribution [7.378742476019604]
Photonic computing has emerged as a promising solution for accelerating computation-intensive artificial intelligence (AI) workloads.
However, limited reconfigurability, high electrical-optical conversion cost, and thermal sensitivity limit the deployment of current optical analog computing engines to support power-restricted, performance-sensitive AI workloads at scale.
We propose SCATTER, a novel algorithm-circuit co-sparse photonic accelerator featuring dynamically reconfigurable signal path via thermal-tolerant, power-efficient in-situ light redistribution and power gating.
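SCATTER's in-situ light redistribution and power gating are analog circuit techniques, but the algorithm-side idea of structured sparsity can be sketched digitally: rank input channels, keep only the strongest fraction, and skip the pruned signal paths entirely. The column-norm criterion and keep ratio below are illustrative assumptions, not the paper's actual co-sparsity scheme.

```python
import numpy as np

def prune_and_gate(W, x, keep_ratio=0.5):
    """Structured (column-wise) sparsity sketch: rank columns of W by L2 norm,
    keep the top fraction, and 'power-gate' the rest -- in hardware, the
    modulators driving pruned columns would simply be switched off."""
    norms = np.linalg.norm(W, axis=0)
    k = max(1, int(keep_ratio * W.shape[1]))
    active = np.argsort(norms)[-k:]               # columns worth powering on
    y = W[:, active] @ x[active]                  # only active paths compute
    saved = 1.0 - k / W.shape[1]
    return y, saved

rng = np.random.default_rng(2)
W, x = rng.standard_normal((8, 16)), rng.standard_normal(16)
y, saved = prune_and_gate(W, x)
print(f"gated {saved:.0%} of input channels")
```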
arXiv Detail & Related papers (2024-07-07T22:57:44Z) - Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges.
We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs.
This work advances the AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z) - TeMPO: Efficient Time-Multiplexed Dynamic Photonic Tensor Core for Edge AI with Compact Slow-Light Electro-Optic Modulator [44.74560543672329]
We present a time-multiplexed dynamic photonic tensor accelerator, dubbed TeMPO, with cross-layer device/circuit/architecture customization.
We achieve a 368.6 TOPS peak performance, 22.3 TOPS/W energy efficiency, and 1.2 TOPS/mm$^2$ compute density.
This work signifies the power of cross-layer co-design and domain-specific customization, paving the way for future electronic-photonic accelerators.
arXiv Detail & Related papers (2024-02-12T03:40:32Z) - Sophisticated deep learning with on-chip optical diffractive tensor processing [5.081061839052458]
Photonic integrated circuits provide an efficient approach to mitigating the bandwidth limitations and power wall of their electronic counterparts.
We propose an optical computing architecture enabled by on-chip diffraction to implement convolutional acceleration, termed the optical convolution unit (OCU).
With OCU as the fundamental unit, we build an optical convolutional neural network (oCNN) to implement two popular deep learning tasks: classification and regression.
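The OCU itself performs convolution by on-chip diffraction, but the standard way any matrix engine accelerates convolution is to lower it to a single matrix multiplication (im2col). A minimal sketch of that lowering, with illustrative shapes:

```python
import numpy as np
from scipy.signal import correlate2d

def conv2d_as_matmul(img, kernel):
    """Lower a 2D convolution to one matrix multiplication (im2col),
    the form a photonic matrix engine such as an OCU-style unit accelerates."""
    H, W = img.shape
    k = kernel.shape[0]
    oh, ow = H - k + 1, W - k + 1
    # Gather every k-by-k patch into one row of the patch matrix.
    patches = np.stack([img[i:i+k, j:j+k].ravel()
                        for i in range(oh) for j in range(ow)])
    return (patches @ kernel.ravel()).reshape(oh, ow)

rng = np.random.default_rng(3)
img = rng.standard_normal((6, 6))
ker = rng.standard_normal((3, 3))
out = conv2d_as_matmul(img, ker)

# Cross-check against scipy's direct 'valid' correlation.
assert np.allclose(out, correlate2d(img, ker, mode="valid"))
```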
arXiv Detail & Related papers (2022-12-20T03:33:26Z) - Energy Efficient Hardware Acceleration of Neural Networks with Power-of-Two Quantisation [0.0]
We show that a hardware neural network accelerator with PoT weights implemented on the Zynq UltraScale+ MPSoC ZCU104 FPGA can be at least $1.4\times$ more energy efficient than the uniform quantisation version.
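A minimal sketch of power-of-two weight quantisation: each weight is rounded to the nearest signed power of two, so every multiply becomes a bit shift in hardware, which is the source of the energy savings. The exponent range and rounding rule below are illustrative assumptions, not necessarily the paper's exact scheme.

```python
import numpy as np

def quantize_pot(w, n_bits=4):
    """Round each weight to the nearest signed power of two.
    A PoT multiply reduces to a bit shift in hardware."""
    sign = np.sign(w)
    mag = np.abs(w)
    exp = np.clip(np.round(np.log2(np.maximum(mag, 1e-12))),
                  -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1)
    return sign * 2.0 ** exp

w = np.array([0.37, -0.9, 0.06, 1.7])
print(quantize_pot(w))   # -> [ 0.5  -1.      0.0625  2.    ]
```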
arXiv Detail & Related papers (2022-09-30T06:33:40Z) - FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around 40% of the available hardware resources.
It reduces classification time by three orders of magnitude compared to its full-precision software counterpart, with a small 4.5% impact on accuracy.
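The summary does not give the SNN's neuron model, but a leaky integrate-and-fire layer, the typical building block such accelerators target, can be sketched in a few lines; note the arithmetic is accumulate-only (no multiplies on the spike path), which is what maps well to modest FPGA resources. The threshold and leak factor are illustrative assumptions.

```python
import numpy as np

def lif_forward(inputs, weights, v_th=1.0, leak=0.9):
    """Leaky integrate-and-fire layer over T timesteps.
    inputs: (T, n_in) binary spikes; returns (T, n_out) output spikes."""
    T, _ = inputs.shape
    n_out = weights.shape[0]
    v = np.zeros(n_out)                      # membrane potentials
    spikes = np.zeros((T, n_out))
    for t in range(T):
        v = leak * v + weights @ inputs[t]   # leak, then integrate input current
        fired = v >= v_th
        spikes[t] = fired
        v[fired] = 0.0                       # reset neurons that spiked
    return spikes

rng = np.random.default_rng(4)
x = (rng.random((20, 8)) < 0.3).astype(float)   # Poisson-like input spike trains
W = rng.standard_normal((4, 8)) * 0.5
print(lif_forward(x, W).sum(axis=0))            # spike count per output neuron
```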
arXiv Detail & Related papers (2022-01-18T13:59:22Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) Soft Actor-Critic for discrete actions (SAC-d), which generates the *exit point* and *compressing bits* by soft policy iterations.
Based on a latency- and accuracy-aware reward design, such a computation scheme can adapt well to complex environments such as dynamic wireless channels and arbitrary processing loads, and is capable of supporting 5G URLLC.
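SAC-d learns the exit point with a soft actor-critic policy; as a simpler stand-in, the sketch below shows early-exit inference with a fixed confidence threshold, which conveys what choosing an exit point buys: samples an early classifier head already handles confidently skip the remaining stages. All shapes and the threshold are illustrative assumptions.

```python
import numpy as np

def early_exit_infer(x, stages, exit_heads, confidence=0.8):
    """Run a multi-stage network, stopping at the first exit whose softmax
    confidence clears the threshold -- the 'exit point' that SAC-d would
    instead pick adaptively from channel and load conditions."""
    h = x
    for stage, head in zip(stages, exit_heads):
        h = np.tanh(stage @ h)
        logits = head @ h
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        if probs.max() >= confidence:
            return probs.argmax(), probs.max()  # exit early, save compute
    return probs.argmax(), probs.max()          # fell through to the final exit

rng = np.random.default_rng(5)
stages = [rng.standard_normal((16, 16)) for _ in range(3)]
heads = [rng.standard_normal((10, 16)) for _ in range(3)]
print(early_exit_infer(rng.standard_normal(16), stages, heads))
```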
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - Silicon photonic subspace neural chip for hardware-efficient deep learning [11.374005508708995]
The optical neural network (ONN) is a promising candidate for next-generation neurocomputing.
We devise a hardware-efficient photonic subspace neural network (PSNN) architecture.
We experimentally demonstrate our PSNN on a butterfly-style programmable silicon photonic integrated circuit.
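A butterfly-style circuit applies log2(n) stages of 2x2 units with FFT-like connectivity, needing only (n/2)·log2(n) units instead of the O(n^2) of a fully programmable mesh. Below is a minimal digital sketch with planar 2x2 rotations standing in for the on-chip units; the actual PSNN circuit and its parameterization differ.

```python
import numpy as np

def butterfly_layer(x, thetas):
    """FFT-style butterfly network: log2(n) stages of trainable 2x2 rotations,
    pairing elements at stride 1, 2, 4, ... (each 2x2 unit would be an MZI
    on chip). thetas has shape (log2(n), n // 2), one angle per unit."""
    n = x.size
    y = x.astype(float).copy()
    for s in range(int(np.log2(n))):
        stride = 1 << s
        out = y.copy()
        k = 0
        for base in range(0, n, 2 * stride):
            for off in range(stride):
                i, j = base + off, base + off + stride
                c, sn = np.cos(thetas[s, k]), np.sin(thetas[s, k])
                out[i] = c * y[i] - sn * y[j]
                out[j] = sn * y[i] + c * y[j]
                k += 1
        y = out
    return y

rng = np.random.default_rng(6)
n = 8
thetas = rng.uniform(0, 2 * np.pi, (3, n // 2))
print(butterfly_layer(rng.standard_normal(n), thetas))
```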
arXiv Detail & Related papers (2021-11-11T06:34:05Z) - Learning Frequency-aware Dynamic Network for Efficient Super-Resolution [56.98668484450857]
This paper explores a novel frequency-aware dynamic network for dividing the input into multiple parts according to its coefficients in the discrete cosine transform (DCT) domain.
In practice, the high-frequency part is processed with expensive operations while the lower-frequency part is assigned cheap operations to relieve the computational burden.
Experiments on benchmark single image super-resolution (SISR) models and datasets show that the frequency-aware dynamic network can be employed with various SISR neural architectures.
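A minimal sketch of the frequency-aware routing idea: measure how much of each patch's DCT energy lies outside the lowest-frequency coefficients and send detailed patches to the expensive branch. The 2x2 low-frequency block and the 30% energy threshold are illustrative assumptions, not the paper's exact partitioning rule.

```python
import numpy as np
from scipy.fft import dctn

def route_by_frequency(patches, hf_threshold=0.3):
    """Split patches by high-frequency energy in the DCT domain:
    'expensive' branch for detailed patches, 'cheap' branch for smooth ones."""
    expensive, cheap = [], []
    for p in patches:
        coef = dctn(p, norm="ortho")
        total = np.sum(coef ** 2) + 1e-12
        low = np.sum(coef[:2, :2] ** 2)      # DC plus the lowest AC terms
        hf_ratio = 1.0 - low / total
        (expensive if hf_ratio > hf_threshold else cheap).append(p)
    return expensive, cheap

rng = np.random.default_rng(7)
smooth = np.outer(np.linspace(0, 1, 8), np.ones(8))  # gradient: low-frequency
noisy = rng.standard_normal((8, 8))                  # texture: high-frequency
hard, easy = route_by_frequency([smooth, noisy])
print(len(hard), "patch(es) to the expensive branch,", len(easy), "to the cheap one")
```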
arXiv Detail & Related papers (2021-03-15T12:54:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.