An Efficient General-Purpose Optical Accelerator for Neural Networks
- URL: http://arxiv.org/abs/2409.12966v1
- Date: Mon, 2 Sep 2024 13:04:08 GMT
- Title: An Efficient General-Purpose Optical Accelerator for Neural Networks
- Authors: Sijie Fei, Amro Eldebiky, Grace Li Zhang, Bing Li, Ulf Schlichtmann
- Abstract summary: General-purpose optical accelerators (GOAs) have emerged as a promising platform to accelerate deep neural networks (DNNs).
In this work, a hybrid GOA architecture is proposed to enhance the mapping efficiency of neural networks onto the GOA.
The energy consumption and computation latency can also be reduced by over 67% and 21%, respectively.
- Score: 4.236129222287313
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: General-purpose optical accelerators (GOAs) have emerged as a promising platform to accelerate deep neural networks (DNNs) due to their low latency and energy consumption. Such an accelerator is usually composed of a given number of interleaving Mach-Zehnder interferometers (MZIs). This interleaving architecture, however, has low efficiency when accelerating neural networks of various sizes due to the mismatch between weight matrices and the GOA architecture. In this work, a hybrid GOA architecture is proposed to enhance the mapping efficiency of neural networks onto the GOA. In this architecture, independent MZI modules are connected with microring resonators (MRRs) so that they can be combined to process large neural networks efficiently. Each of these modules implements a unitary matrix with inputs adjusted by tunable coefficients. The parameters of the proposed architecture are searched using a genetic algorithm. To enhance the accuracy of neural networks, selected weight matrices are expanded into multiple unitary matrices by applying singular value decomposition (SVD). The kernels in neural networks are also adjusted to fully utilize the on-chip computational resources. Experimental results show that with a given number of MZIs, the mapping efficiency of neural networks on the proposed architecture can be enhanced by 21.87%, 21.20%, 24.69%, and 25.52% for VGG16 and ResNet18 on the CIFAR-10 and CIFAR-100 datasets, respectively. The energy consumption and computation latency can also be reduced by over 67% and 21%, respectively.
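To make the SVD expansion and the mapping-efficiency issue concrete, below is a minimal NumPy sketch. It is not the authors' implementation: the 8x8 module size, the zero-padded block tiling, and the function names (`svd_expand`, `tile_blocks`) are illustrative assumptions; the paper's actual architecture chains MZI modules through MRR-adjusted inputs and searches the module configuration with a genetic algorithm.

```python
# A minimal sketch of the SVD expansion mentioned in the abstract: a weight
# matrix W is factored as W = U @ diag(s) @ Vh, where U and Vh are
# (semi-)unitary and could each be realized by MZI meshes, while the singular
# values s are absorbed into tunable per-channel coefficients (e.g., MRRs).
# The 8x8 module size and the block tiling are illustrative assumptions, not
# the parameters searched by the paper's genetic algorithm.
import numpy as np

def svd_expand(W):
    """Factor W into two (semi-)unitary factors and a vector of tunable gains."""
    U, s, Vh = np.linalg.svd(W, full_matrices=False)
    return U, s, Vh          # y = U @ (s * (Vh @ x)) reproduces W @ x

def tile_blocks(M, k):
    """Zero-pad M and cut it into k x k blocks, one per fixed-size optical
    module; the padding models the idle ports that lower mapping efficiency
    when the layer size does not match the module size."""
    rows, cols = M.shape
    rp, cp = -(-rows // k) * k, -(-cols // k) * k   # round up to multiples of k
    P = np.zeros((rp, cp), dtype=M.dtype)
    P[:rows, :cols] = M
    blocks = [P[i:i + k, j:j + k]
              for i in range(0, rp, k) for j in range(0, cp, k)]
    utilization = rows * cols / (rp * cp)
    return blocks, utilization

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((100, 64))              # e.g., a fully connected layer
    U, s, Vh = svd_expand(W)
    x = rng.standard_normal(64)
    assert np.allclose(U @ (s * (Vh @ x)), W @ x)   # matches the electronic result
    blocks, util = tile_blocks(U, k=8)              # assume 8x8 unitary modules
    print(f"{len(blocks)} blocks of size 8x8, port utilization {util:.2%}")
```

The sketch only shows why a fixed module size leaves ports idle for mismatched layer sizes; in the paper, the blocks are not generic submatrices but complete unitary modules whose combination through MRR-weighted inputs is part of the proposed architecture.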
Related papers
- MCU-MixQ: A HW/SW Co-optimized Mixed-precision Neural Network Design Framework for MCUs [9.719789698194154]
Mixed-precision neural networks (MPNNs), which use just enough data width for each part of the network, are an effective approach to meeting stringent resource constraints.
However, there is still a lack of sub-byte and mixed-precision SIMD operations in MCU-class ISA.
In this work, we propose to pack multiple low-bitwidth arithmetic operations into single instruction multiple data (SIMD) instructions available in typical MCUs.
arXiv Detail & Related papers (2024-07-17T14:51:15Z)
- EPIM: Efficient Processing-In-Memory Accelerators based on Epitome [78.79382890789607]
We introduce the Epitome, a lightweight neural operator offering convolution-like functionality.
On the software side, we evaluate epitomes' latency and energy on PIM accelerators.
We introduce a PIM-aware layer-wise design method to enhance their hardware efficiency.
arXiv Detail & Related papers (2023-11-12T17:56:39Z)
- Heterogenous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks.
By introducing learnable memory tokens with an attention mechanism, we can effectively boost performance without huge computational overhead.
We show our approach on various image and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
arXiv Detail & Related papers (2023-10-17T01:05:28Z)
- Exploring the Approximation Capabilities of Multiplicative Neural Networks for Smooth Functions [9.936974568429173]
We consider two classes of target functions: generalized bandlimited functions and Sobolev-type balls.
Our results demonstrate that multiplicative neural networks can approximate these functions with significantly fewer layers and neurons.
These findings suggest that multiplicative gates can outperform standard feed-forward layers and have potential for improving neural network design.
arXiv Detail & Related papers (2023-01-11T17:57:33Z)
- A Resource-efficient Spiking Neural Network Accelerator Supporting Emerging Neural Encoding [6.047137174639418]
Spiking neural networks (SNNs) recently gained momentum due to their low-power multiplication-free computing.
SNNs require very long spike trains (up to 1000) to reach an accuracy similar to their artificial neural network (ANN) counterparts for large models.
We present a novel hardware architecture that can efficiently support SNN with emerging neural encoding.
arXiv Detail & Related papers (2022-06-06T10:56:25Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- Efficient power allocation using graph neural networks and deep algorithm unfolding [40.78748956518785]
We study the problem of optimal power allocation in a single-hop ad hoc wireless network.
We propose a hybrid neural architecture inspired by the algorithmic unfolding of the weighted minimum mean squared error (WMMSE) method.
We show that UWMMSE achieves robustness comparable to that of WMMSE while significantly reducing the computational complexity.
arXiv Detail & Related papers (2020-11-18T05:28:24Z)
- Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks [61.76338096980383]
A range of neural architecture search (NAS) techniques are used to automatically learn two types of hyper-parameters of state-of-the-art factored time delay neural networks (TDNNs).
These include the DARTS method integrating architecture selection with lattice-free MMI (LF-MMI) TDNN training.
Experiments conducted on a 300-hour Switchboard corpus suggest the auto-configured systems consistently outperform the baseline LF-MMI TDNN systems.
arXiv Detail & Related papers (2020-07-17T08:32:11Z)
- Surrogate-assisted Particle Swarm Optimisation for Evolving Variable-length Transferable Blocks for Image Classification [8.40112153818812]
A new effective surrogate-assisted particle swarm optimisation algorithm is proposed to automatically evolve convolutional neural networks.
The proposed method shows its effectiveness by achieving competitive error rates of 3.49% on the CIFAR-10 dataset, 18.49% on the CIFAR-100 dataset, and 1.82% on the SVHN dataset.
arXiv Detail & Related papers (2020-07-03T08:48:21Z)
- Binarized Graph Neural Network [65.20589262811677]
We develop a binarized graph neural network to learn the binary representations of the nodes with binary network parameters.
Our proposed method can be seamlessly integrated into the existing GNN-based embedding approaches.
Experiments indicate that the proposed binarized graph neural network, namely BGN, is orders of magnitude more efficient in terms of both time and space.
arXiv Detail & Related papers (2020-04-19T09:43:14Z)
- When Residual Learning Meets Dense Aggregation: Rethinking the Aggregation of Deep Neural Networks [57.0502745301132]
We propose Micro-Dense Nets, a novel architecture with global residual learning and local micro-dense aggregations.
Our micro-dense block can be integrated with neural architecture search based models to boost their performance.
arXiv Detail & Related papers (2020-04-19T08:34:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences arising from its use.