A Hardware-Efficient Photonic Tensor Core: Accelerating Deep Neural Networks with Structured Compression
- URL: http://arxiv.org/abs/2502.01670v1
- Date: Sat, 01 Feb 2025 17:03:45 GMT
- Title: A Hardware-Efficient Photonic Tensor Core: Accelerating Deep Neural Networks with Structured Compression
- Authors: Shupeng Ning, Hanqing Zhu, Chenghao Feng, Jiaqi Gu, David Z. Pan, Ray T. Chen
- Abstract summary: AI and deep neural networks (DNNs) have revolutionized numerous fields, enabling complex tasks by extracting intricate features from large datasets.
Optical computing offers a promising alternative due to its inherent advantages of parallelism, high computational speed, and low power consumption.
We introduce a block-circulant photonic tensor core (CirPTC) for a structure-compressed optical neural network (StrC-ONN) architecture.
- Score: 15.665630650382226
- Abstract: Recent advancements in artificial intelligence (AI) and deep neural networks (DNNs) have revolutionized numerous fields, enabling complex tasks by extracting intricate features from large datasets. However, the exponential growth in computational demands has outstripped the capabilities of traditional electrical hardware accelerators. Optical computing offers a promising alternative due to its inherent advantages of parallelism, high computational speed, and low power consumption. Yet, current photonic integrated circuits (PICs) designed for general matrix multiplication (GEMM) are constrained by large footprints, high costs of electro-optical (E-O) interfaces, and high control complexity, limiting their scalability. To overcome these challenges, we introduce a block-circulant photonic tensor core (CirPTC) for a structure-compressed optical neural network (StrC-ONN) architecture. By applying a structured compression strategy to weight matrices, StrC-ONN significantly reduces model parameters and hardware requirements while preserving the universal representability of networks and maintaining comparable expressivity. Additionally, we propose a hardware-aware training framework to compensate for on-chip nonidealities and improve model robustness and accuracy. We experimentally demonstrate image processing and classification tasks, achieving up to a 74.91% reduction in trainable parameters while maintaining competitive accuracies. Performance analysis projects a computational density of 5.84 tera operations per second (TOPS) per mm^2 and a power efficiency of 47.94 TOPS/W, marking a 6.87-times improvement achieved through the hardware-software co-design approach. By reducing both hardware requirements and control complexity across multiple dimensions, this work explores a new pathway to push the limits of optical computing in the pursuit of high efficiency and scalability.
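To make the structured compression concrete, below is a minimal NumPy sketch of the block-circulant matrix-vector product at the heart of CirPTC: every b-by-b block of the weight matrix is defined by a single length-b vector, so a (p·b)-by-(q·b) layer stores p·q·b parameters instead of p·q·b², and each block multiply reduces to an FFT-domain product. The shapes, block size, and FFT-based evaluation here are illustrative; the paper evaluates this product optically.

```python
import numpy as np
from scipy.linalg import circulant

def circulant_matvec(c, x):
    """circ(c) @ x in O(b log b) via the FFT: with circ(c)[i, j] = c[(i - j) % b],
    the product is a circular convolution of c with x."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def block_circulant_matvec(C, x, b):
    """y = W @ x where W is (p*b, q*b) and every b-by-b block is circulant.
    C has shape (p, q, b): one length-b vector per block instead of b*b weights."""
    p, q, _ = C.shape
    x_blocks = x.reshape(q, b)
    y = np.zeros((p, b))
    for i in range(p):
        for j in range(q):
            y[i] += circulant_matvec(C[i, j], x_blocks[j])
    return y.reshape(p * b)

# With block size b = 4, an (8, 16) layer needs p*q*b = 2*4*4 = 32 parameters
# instead of 128: a 75% reduction, in line with the up-to-74.91% the paper reports.
rng = np.random.default_rng(0)
C = rng.standard_normal((2, 4, 4))
x = rng.standard_normal(16)
y = block_circulant_matvec(C, x, b=4)

# Cross-check against the explicitly assembled dense weight matrix.
W = np.block([[circulant(C[i, j]) for j in range(4)] for i in range(2)])
assert np.allclose(W @ x, y)
```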
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the demands of real-time visual inference by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - Optical training of large-scale Transformers and deep neural networks with direct feedback alignment [48.90869997343841]
We experimentally implement a versatile and scalable training algorithm, called direct feedback alignment, on a hybrid electronic-photonic platform.
An optical processing unit performs large-scale random matrix multiplications, which is the central operation of this algorithm, at speeds up to 1500 TeraOps.
We study the compute scaling of our hybrid optical approach, and demonstrate a potential advantage for ultra-deep and wide neural networks.
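As a digital reference for what the optical processing unit computes, here is a minimal sketch of a direct feedback alignment update on a toy two-layer network: the output error is projected to the hidden layer through a fixed random matrix B1 rather than through the transposed forward weights, and that random projection is exactly the kind of large-scale random matrix multiplication offloaded to optics. Network sizes, the tanh activation, and the learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid, n_out = 32, 64, 10

# Two-layer net; B1 is a *fixed random* feedback matrix, never trained.
W1 = rng.standard_normal((n_hid, n_in)) * 0.1
W2 = rng.standard_normal((n_out, n_hid)) * 0.1
B1 = rng.standard_normal((n_hid, n_out))  # the random projection an OPU performs optically

def tanh_grad(a):
    return 1.0 - np.tanh(a) ** 2

def dfa_step(x, y_target, lr=1e-2):
    global W1, W2
    a1 = W1 @ x
    h1 = np.tanh(a1)
    y = W2 @ h1
    e = y - y_target                      # output error
    # DFA: project the *output* error through fixed random B1 instead of W2.T
    delta1 = (B1 @ e) * tanh_grad(a1)
    W2 -= lr * np.outer(e, h1)
    W1 -= lr * np.outer(delta1, x)
    return 0.5 * np.sum(e ** 2)

x = rng.standard_normal(n_in)
t = np.eye(n_out)[3]
for _ in range(200):
    loss = dfa_step(x, t)
print(f"loss after training: {loss:.4f}")
```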
arXiv Detail & Related papers (2024-09-01T12:48:47Z) - SCATTER: Algorithm-Circuit Co-Sparse Photonic Accelerator with Thermal-Tolerant, Power-Efficient In-situ Light Redistribution [7.378742476019604]
Photonic computing has emerged as a promising solution for accelerating computation-intensive artificial intelligence (AI) workloads.
However, limited reconfigurability, high electrical-optical conversion cost, and thermal sensitivity limit the deployment of current optical analog computing engines to support power-restricted, performance-sensitive AI workloads at scale.
We propose SCATTER, a novel algorithm-circuit co-sparse photonic accelerator featuring dynamically reconfigurable signal path via thermal-tolerant, power-efficient in-situ light redistribution and power gating.
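SCATTER's in-situ light redistribution and power gating are analog circuit techniques, but the algorithm-side idea of structured sparsity can be sketched digitally: rank input channels, keep only the strongest fraction, and skip the pruned signal paths entirely. The column-norm criterion and keep ratio below are illustrative assumptions, not the paper's actual co-sparsity scheme.

```python
import numpy as np

def prune_and_gate(W, x, keep_ratio=0.5):
    """Structured (column-wise) sparsity sketch: rank columns of W by L2 norm,
    keep the top fraction, and 'power-gate' the rest -- in hardware, the
    modulators driving pruned columns would simply be switched off."""
    norms = np.linalg.norm(W, axis=0)
    k = max(1, int(keep_ratio * W.shape[1]))
    active = np.argsort(norms)[-k:]               # columns worth powering on
    y = W[:, active] @ x[active]                  # only active paths compute
    saved = 1.0 - k / W.shape[1]
    return y, saved

rng = np.random.default_rng(2)
W, x = rng.standard_normal((8, 16)), rng.standard_normal(16)
y, saved = prune_and_gate(W, x)
print(f"gated {saved:.0%} of input channels")
```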
arXiv Detail & Related papers (2024-07-07T22:57:44Z) - Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges.
We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs.
This work advances the AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z) - TeMPO: Efficient Time-Multiplexed Dynamic Photonic Tensor Core for Edge AI with Compact Slow-Light Electro-Optic Modulator [44.74560543672329]
We present a time-multiplexed dynamic photonic tensor accelerator, dubbed TeMPO, with cross-layer device/circuit/architecture customization.
We achieve a 368.6 TOPS peak performance, 22.3 TOPS/W energy efficiency, and 1.2 TOPS/mm$^2$ compute density.
This work signifies the power of cross-layer co-design and domain-specific customization, paving the way for future electronic-photonic accelerators.
arXiv Detail & Related papers (2024-02-12T03:40:32Z) - Sophisticated deep learning with on-chip optical diffractive tensor processing [5.081061839052458]
Photonic integrated circuits provide an efficient approach to mitigating the bandwidth limitations and power wall of their electronic counterparts.
We propose an optical computing architecture enabled by on-chip diffraction to implement convolutional acceleration, termed the optical convolution unit (OCU).
With OCU as the fundamental unit, we build an optical convolutional neural network (oCNN) to implement two popular deep learning tasks: classification and regression.
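The OCU itself performs convolution by on-chip diffraction, but the standard way any matrix engine accelerates convolution is to lower it to a single matrix multiplication (im2col). A minimal sketch of that lowering, with illustrative shapes:

```python
import numpy as np
from scipy.signal import correlate2d

def conv2d_as_matmul(img, kernel):
    """Lower a 2D convolution to one matrix multiplication (im2col),
    the form a photonic matrix engine such as an OCU-style unit accelerates."""
    H, W = img.shape
    k = kernel.shape[0]
    oh, ow = H - k + 1, W - k + 1
    # Gather every k-by-k patch into one row of the patch matrix.
    patches = np.stack([img[i:i+k, j:j+k].ravel()
                        for i in range(oh) for j in range(ow)])
    return (patches @ kernel.ravel()).reshape(oh, ow)

rng = np.random.default_rng(3)
img = rng.standard_normal((6, 6))
ker = rng.standard_normal((3, 3))
out = conv2d_as_matmul(img, ker)

# Cross-check against scipy's direct 'valid' correlation.
assert np.allclose(out, correlate2d(img, ker, mode="valid"))
```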
arXiv Detail & Related papers (2022-12-20T03:33:26Z) - Energy Efficient Hardware Acceleration of Neural Networks with Power-of-Two Quantisation [0.0]
We show that a hardware neural network accelerator with PoT weights implemented on the Zynq UltraScale+ MPSoC ZCU104 FPGA can be at least $1.4\times$ more energy efficient than the uniform quantisation version.
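A minimal sketch of power-of-two weight quantisation: each weight is rounded to the nearest signed power of two, so every multiply becomes a bit shift in hardware, which is the source of the energy savings. The exponent range and rounding rule below are illustrative assumptions, not necessarily the paper's exact scheme.

```python
import numpy as np

def quantize_pot(w, n_bits=4):
    """Round each weight to the nearest signed power of two.
    A PoT multiply reduces to a bit shift in hardware."""
    sign = np.sign(w)
    mag = np.abs(w)
    exp = np.clip(np.round(np.log2(np.maximum(mag, 1e-12))),
                  -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1)
    return sign * 2.0 ** exp

w = np.array([0.37, -0.9, 0.06, 1.7])
print(quantize_pot(w))   # -> [ 0.5  -1.      0.0625  2.    ]
```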
arXiv Detail & Related papers (2022-09-30T06:33:40Z) - FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around 40% of the available hardware resources.
It reduces classification time by three orders of magnitude compared to its full-precision software counterpart, with a small 4.5% impact on accuracy.
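The summary does not give the SNN's neuron model, but a leaky integrate-and-fire layer, the typical building block such accelerators target, can be sketched in a few lines; note the arithmetic is accumulate-only (no multiplies on the spike path), which is what maps well to modest FPGA resources. The threshold and leak factor are illustrative assumptions.

```python
import numpy as np

def lif_forward(inputs, weights, v_th=1.0, leak=0.9):
    """Leaky integrate-and-fire layer over T timesteps.
    inputs: (T, n_in) binary spikes; returns (T, n_out) output spikes."""
    T, _ = inputs.shape
    n_out = weights.shape[0]
    v = np.zeros(n_out)                      # membrane potentials
    spikes = np.zeros((T, n_out))
    for t in range(T):
        v = leak * v + weights @ inputs[t]   # leak, then integrate input current
        fired = v >= v_th
        spikes[t] = fired
        v[fired] = 0.0                       # reset neurons that spiked
    return spikes

rng = np.random.default_rng(4)
x = (rng.random((20, 8)) < 0.3).astype(float)   # Poisson-like input spike trains
W = rng.standard_normal((4, 8)) * 0.5
print(lif_forward(x, W).sum(axis=0))            # spike count per output neuron
```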
arXiv Detail & Related papers (2022-01-18T13:59:22Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) Soft Actor-Critic for discrete actions (SAC-d), which generates the *exit point* and *compressing bits* by soft policy iterations.
Based on a latency- and accuracy-aware reward design, such a computation scheme can adapt well to complex environments such as dynamic wireless channels and arbitrary processing loads, and is capable of supporting 5G URLLC.
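SAC-d learns the exit point with a soft actor-critic policy; as a simpler stand-in, the sketch below shows early-exit inference with a fixed confidence threshold, which conveys what choosing an exit point buys: samples an early classifier head already handles confidently skip the remaining stages. All shapes and the threshold are illustrative assumptions.

```python
import numpy as np

def early_exit_infer(x, stages, exit_heads, confidence=0.8):
    """Run a multi-stage network, stopping at the first exit whose softmax
    confidence clears the threshold -- the 'exit point' that SAC-d would
    instead pick adaptively from channel and load conditions."""
    h = x
    for stage, head in zip(stages, exit_heads):
        h = np.tanh(stage @ h)
        logits = head @ h
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        if probs.max() >= confidence:
            return probs.argmax(), probs.max()  # exit early, save compute
    return probs.argmax(), probs.max()          # fell through to the final exit

rng = np.random.default_rng(5)
stages = [rng.standard_normal((16, 16)) for _ in range(3)]
heads = [rng.standard_normal((10, 16)) for _ in range(3)]
print(early_exit_infer(rng.standard_normal(16), stages, heads))
```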
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - Silicon photonic subspace neural chip for hardware-efficient deep learning [11.374005508708995]
The optical neural network (ONN) is a promising candidate for next-generation neurocomputing.
We devise a hardware-efficient photonic subspace neural network (PSNN) architecture.
We experimentally demonstrate our PSNN on a butterfly-style programmable silicon photonic integrated circuit.
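A butterfly-style circuit applies log2(n) stages of 2x2 units with FFT-like connectivity, needing only (n/2)·log2(n) units instead of the O(n^2) of a fully programmable mesh. Below is a minimal digital sketch with planar 2x2 rotations standing in for the on-chip units; the actual PSNN circuit and its parameterization differ.

```python
import numpy as np

def butterfly_layer(x, thetas):
    """FFT-style butterfly network: log2(n) stages of trainable 2x2 rotations,
    pairing elements at stride 1, 2, 4, ... (each 2x2 unit would be an MZI
    on chip). thetas has shape (log2(n), n // 2), one angle per unit."""
    n = x.size
    y = x.astype(float).copy()
    for s in range(int(np.log2(n))):
        stride = 1 << s
        out = y.copy()
        k = 0
        for base in range(0, n, 2 * stride):
            for off in range(stride):
                i, j = base + off, base + off + stride
                c, sn = np.cos(thetas[s, k]), np.sin(thetas[s, k])
                out[i] = c * y[i] - sn * y[j]
                out[j] = sn * y[i] + c * y[j]
                k += 1
        y = out
    return y

rng = np.random.default_rng(6)
n = 8
thetas = rng.uniform(0, 2 * np.pi, (3, n // 2))
print(butterfly_layer(rng.standard_normal(n), thetas))
```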
arXiv Detail & Related papers (2021-11-11T06:34:05Z) - Learning Frequency-aware Dynamic Network for Efficient Super-Resolution [56.98668484450857]
This paper explores a novel frequency-aware dynamic network for dividing the input into multiple parts according to its coefficients in the discrete cosine transform (DCT) domain.
In practice, the high-frequency part is processed with expensive operations while the lower-frequency part is assigned cheap operations to relieve the computational burden.
Experiments on benchmark single image super-resolution (SISR) models and datasets show that the frequency-aware dynamic network can be employed with various SISR neural architectures.
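A minimal sketch of the frequency-aware routing idea: measure how much of each patch's DCT energy lies outside the lowest-frequency coefficients and send detailed patches to the expensive branch. The 2x2 low-frequency block and the 30% energy threshold are illustrative assumptions, not the paper's exact partitioning rule.

```python
import numpy as np
from scipy.fft import dctn

def route_by_frequency(patches, hf_threshold=0.3):
    """Split patches by high-frequency energy in the DCT domain:
    'expensive' branch for detailed patches, 'cheap' branch for smooth ones."""
    expensive, cheap = [], []
    for p in patches:
        coef = dctn(p, norm="ortho")
        total = np.sum(coef ** 2) + 1e-12
        low = np.sum(coef[:2, :2] ** 2)      # DC plus the lowest AC terms
        hf_ratio = 1.0 - low / total
        (expensive if hf_ratio > hf_threshold else cheap).append(p)
    return expensive, cheap

rng = np.random.default_rng(7)
smooth = np.outer(np.linspace(0, 1, 8), np.ones(8))  # gradient: low-frequency
noisy = rng.standard_normal((8, 8))                  # texture: high-frequency
hard, easy = route_by_frequency([smooth, noisy])
print(len(hard), "patch(es) to the expensive branch,", len(easy), "to the cheap one")
```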
arXiv Detail & Related papers (2021-03-15T12:54:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.