Silicon Photonic 2.5D Interposer Networks for Overcoming Communication
Bottlenecks in Scale-out Machine Learning Hardware Accelerators
- URL: http://arxiv.org/abs/2403.04189v1
- Date: Thu, 7 Mar 2024 03:38:35 GMT
- Title: Silicon Photonic 2.5D Interposer Networks for Overcoming Communication
Bottlenecks in Scale-out Machine Learning Hardware Accelerators
- Authors: Febin Sunny, Ebadollah Taheri, Mahdi Nikdast, Sudeep Pasricha
- Abstract summary: Modern machine learning (ML) applications are becoming increasingly complex, and monolithic (single-chip) accelerator architectures cannot keep up with their energy-efficiency and throughput demands.
This paper outlines how optical communication and computation can be leveraged in 2.5D platforms to realize energy-efficient and high-throughput 2.5D ML accelerator architectures.
- Score: 5.482420806459269
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Modern machine learning (ML) applications are becoming increasingly complex,
and monolithic (single-chip) accelerator architectures cannot keep up with
their energy-efficiency and throughput demands. Even though modern digital
electronic accelerators are gradually adopting 2.5D architectures with multiple
smaller chiplets to improve scalability, they face fundamental limitations due
to a reliance on slow metallic interconnects. This paper outlines how optical
communication and computation can be leveraged in 2.5D platforms to realize
energy-efficient and high-throughput 2.5D ML accelerator architectures.
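
As a rough illustration of the interconnect bottleneck the paper targets, the sketch below compares the aggregate link power needed to move an assumed volume of chiplet-to-chiplet traffic over electrical versus silicon photonic links. The bandwidth and energy-per-bit figures are illustrative assumptions for this sketch, not values reported in the paper.

```python
# Back-of-envelope comparison of 2.5D chiplet-to-chiplet link power.
# All parameter values below are illustrative assumptions, not figures
# reported in the paper.

def link_power_watts(bandwidth_gbps: float, energy_pj_per_bit: float) -> float:
    """Aggregate link power = bandwidth (bits/s) * energy per bit (J)."""
    bits_per_second = bandwidth_gbps * 1e9
    joules_per_bit = energy_pj_per_bit * 1e-12
    return bits_per_second * joules_per_bit

aggregate_bw_gbps = 8 * 1024          # assumed 8 Tb/s of inter-chiplet traffic
electrical_pj_per_bit = 2.0           # assumed on-interposer electrical SerDes
optical_pj_per_bit = 0.3              # assumed silicon photonic link budget

p_elec = link_power_watts(aggregate_bw_gbps, electrical_pj_per_bit)
p_opt = link_power_watts(aggregate_bw_gbps, optical_pj_per_bit)
print(f"electrical: {p_elec:.1f} W, optical: {p_opt:.1f} W")
# With these assumptions: ~16.4 W vs ~2.5 W for the same traffic.
```

The point of the exercise is that at multi-Tb/s inter-chiplet bandwidths, even a few picojoules per bit translates into a large share of the accelerator's power budget, which is the gap the photonic interposer network aims to close.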
Related papers
- Dynamic Spectrum Access for Ambient Backscatter Communication-assisted D2D Systems with Quantum Reinforcement Learning [68.63990729719369]
The wireless spectrum is becoming scarce, resulting in low spectral efficiency for D2D communications.
This paper aims to integrate the ambient backscatter communication technology into D2D devices to allow them to backscatter ambient RF signals.
We develop a novel quantum reinforcement learning (RL) algorithm that can achieve a faster convergence rate with fewer training parameters.
arXiv Detail & Related papers (2024-10-23T15:36:43Z) - Accelerating Error Correction Code Transformers [56.75773430667148]
We introduce a novel acceleration method for transformer-based decoders.
We achieve a 90% compression ratio and reduce arithmetic operation energy consumption by at least 224 times on modern hardware.
arXiv Detail & Related papers (2024-10-08T11:07:55Z) - Optical training of large-scale Transformers and deep neural networks with direct feedback alignment [48.90869997343841]
We experimentally implement a versatile and scalable training algorithm, called direct feedback alignment, on a hybrid electronic-photonic platform.
An optical processing unit performs the large-scale random matrix multiplications that are the central operation of this algorithm, at speeds of up to 1500 TeraOps.
We study the compute scaling of our hybrid optical approach, and demonstrate a potential advantage for ultra-deep and wide neural networks.
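
A compact way to see why random matrix multiplication dominates is a minimal NumPy sketch of direct feedback alignment for a small multilayer perceptron: the fixed random feedback matrices `B[l]` play the role of the projections that the optical processing unit would compute. Layer sizes, learning rate, and initialization below are arbitrary illustrative choices, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny MLP: 64 -> 128 -> 128 -> 10 (sizes are arbitrary for illustration).
sizes = [64, 128, 128, 10]
W = [rng.normal(0, 0.1, (sizes[i + 1], sizes[i])) for i in range(3)]
# Fixed random feedback matrices: project the output error straight back
# to each hidden layer. These large random projections are the operation
# a photonic processor could perform in the analog domain.
B = [rng.normal(0, 0.1, (sizes[i + 1], sizes[-1])) for i in range(2)]

def tanh_prime(a):
    return 1.0 - np.tanh(a) ** 2

def dfa_step(x, y, lr=0.01):
    # Forward pass.
    a1 = W[0] @ x;  h1 = np.tanh(a1)
    a2 = W[1] @ h1; h2 = np.tanh(a2)
    y_hat = W[2] @ h2                      # linear output layer
    e = y_hat - y                          # output error (squared loss)
    # DFA: each hidden layer receives the error through its own fixed random
    # matrix, instead of the transposed forward weights used by backprop.
    d1 = (B[0] @ e) * tanh_prime(a1)
    d2 = (B[1] @ e) * tanh_prime(a2)
    W[0] -= lr * np.outer(d1, x)
    W[1] -= lr * np.outer(d2, h1)
    W[2] -= lr * np.outer(e, h2)
    return float(0.5 * np.sum(e ** 2))

x = rng.normal(size=64)
y = np.eye(10)[3]
print([round(dfa_step(x, y), 4) for _ in range(5)])  # loss should decrease
```

Because the feedback matrices never change, they can be fixed in hardware, which is what makes the algorithm a natural fit for an analog optical matrix multiplier.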
arXiv Detail & Related papers (2024-09-01T12:48:47Z) - ARTEMIS: A Mixed Analog-Stochastic In-DRAM Accelerator for Transformer Neural Networks [2.9699290794642366]
ARTEMIS is a mixed analog-stochastic in-DRAM accelerator for transformer models.
Our analysis indicates that ARTEMIS exhibits at least 3.0x speedup, 1.8x lower energy, and 1.9x better energy efficiency compared to GPU, TPU, CPU, and state-of-the-art PIM transformer hardware accelerators.
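
The abstract does not spell out the arithmetic, but the "stochastic" part of mixed analog-stochastic designs usually refers to stochastic computing, where a value in [0, 1] is encoded as the density of ones in a random bitstream and multiplication reduces to a bitwise AND of independent streams. A minimal sketch of that idea follows; the bitstream length and operand values are arbitrary, and this is not ARTEMIS's actual datapath.

```python
import numpy as np

rng = np.random.default_rng(1)

def to_bitstream(p: float, n_bits: int = 4096) -> np.ndarray:
    """Unipolar stochastic encoding: P(bit = 1) = p, for p in [0, 1]."""
    return (rng.random(n_bits) < p).astype(np.uint8)

def sc_multiply(a: float, b: float, n_bits: int = 4096) -> float:
    """Multiplying independent streams is just a bitwise AND, then decode."""
    stream = to_bitstream(a, n_bits) & to_bitstream(b, n_bits)
    return float(stream.mean())          # fraction of ones ~ a * b

print(sc_multiply(0.6, 0.5), 0.6 * 0.5)  # approximate vs exact product
```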
arXiv Detail & Related papers (2024-07-17T15:08:14Z) - Accelerating Neural Networks for Large Language Models and Graph
Processing with Silicon Photonics [4.471962177124311]
Large language models (LLMs) and graph processing have emerged as transformative technologies for natural language processing (NLP), computer vision, and graph-structured data applications.
However, the complex structures of these models pose challenges for acceleration on conventional electronic platforms.
We describe novel hardware accelerators based on silicon photonics to accelerate transformer neural networks that are used in LLMs and graph neural networks for graph data processing.
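
The abstract does not detail the photonic datapath, but the core primitive in such accelerators is an analog matrix-vector multiply. As a purely behavioral illustration, the sketch below models the non-idealities of an analog multiply as operand quantization (limited DAC/ADC resolution) plus additive Gaussian noise; the bit widths and noise level are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)

def quantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Uniform quantization to 2**bits levels over the array's own range."""
    lo, hi = x.min(), x.max()
    levels = 2 ** bits - 1
    return lo + np.round((x - lo) / (hi - lo) * levels) / levels * (hi - lo)

def analog_mvm(W, x, weight_bits=6, input_bits=8, noise_std=0.01):
    """Ideal y = W @ x, degraded by quantized operands and analog noise."""
    y = quantize(W, weight_bits) @ quantize(x, input_bits)
    return y + rng.normal(0.0, noise_std * np.abs(y).max(), size=y.shape)

W = rng.normal(size=(16, 64))
x = rng.normal(size=64)
err = np.linalg.norm(analog_mvm(W, x) - W @ x) / np.linalg.norm(W @ x)
print(f"relative error of the analog model: {err:.3%}")
```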
arXiv Detail & Related papers (2024-01-12T20:32:38Z) - Random resistive memory-based deep extreme point learning machine for
unified visual processing [67.51600474104171]
We propose a novel hardware-software co-design, a random resistive memory-based deep extreme point learning machine (DEPLM).
Our co-design system achieves huge energy efficiency improvements and training cost reduction when compared to conventional systems.
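
The summary does not unpack the model, but the extreme learning machine at the core of DEPLM-style designs keeps the hidden projection fixed and random (the role played here by the random conductances of resistive memory) and trains only the readout with a single least-squares solve. A minimal classic sketch under those assumptions, with toy data and sizes chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data: 512 samples, 32 features, 5 classes (shapes are illustrative).
X = rng.normal(size=(512, 32))
y = rng.integers(0, 5, size=512)
Y = np.eye(5)[y]                          # one-hot targets

# Fixed random hidden projection: never trained. In a resistive-memory
# implementation this projection is realized by random device conductances.
W_hidden = rng.normal(size=(32, 256))
H = np.tanh(X @ W_hidden)                 # random nonlinear features

# Only the readout is learned, with one regularized least-squares solve.
lam = 1e-2
W_out = np.linalg.solve(H.T @ H + lam * np.eye(256), H.T @ Y)

pred = np.argmax(H @ W_out, axis=1)
print(f"training accuracy on random toy data: {(pred == y).mean():.2f}")
```

Skipping gradient-based training of the hidden layer is what drives the training-cost reduction such co-designs claim.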
arXiv Detail & Related papers (2023-12-14T09:46:16Z) - REED: Chiplet-Based Accelerator for Fully Homomorphic Encryption [4.713756093611972]
We present REED, a first-of-its-kind multi-chiplet-based FHE accelerator that overcomes the limitations of prior monolithic designs.
Results demonstrate that the REED 2.5D microprocessor consumes 96.7 mm$^2$ of chip area and 49.4 W average power in 7nm technology.
arXiv Detail & Related papers (2023-08-05T14:04:39Z) - Machine Learning Accelerators in 2.5D Chiplet Platforms with Silicon
Photonics [5.190207094732673]
Domain-specific machine learning (ML) accelerators such as Google's TPU and Apple's Neural Engine now outperform CPUs and GPUs for energy-efficient ML processing.
We present a vision of how optical computation and communication can be integrated into 2.5D chiplet platforms to drive an entirely new class of sustainable and scalable ML hardware accelerators.
arXiv Detail & Related papers (2023-01-28T17:06:53Z) - FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using around 40% of the available hardware resources in total.
It reduces classification time by three orders of magnitude compared to its full-precision software counterpart, with a small 4.5% impact on accuracy.
arXiv Detail & Related papers (2022-01-18T13:59:22Z) - Resistive Neural Hardware Accelerators [0.46198289193451136]
The shift towards ReRAM-based in-memory computing has great potential for the implementation of area- and power-efficient inference.
In this survey, we review the state-of-the-art ReRAM-based Deep Neural Networks (DNNs) many-core accelerators.
arXiv Detail & Related papers (2021-09-08T21:11:48Z) - Interleaving: Modular architectures for fault-tolerant photonic quantum
computing [50.591267188664666]
Photonic fusion-based quantum computing (FBQC) uses low-loss photonic delays.
We present a modular architecture for FBQC in which these components are combined to form "interleaving modules".
Exploiting the multiplicative power of delays, each module can add thousands of physical qubits to the computational Hilbert space.
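A quick back-of-envelope shows where the "thousands of physical qubits" come from: a fiber delay holds every photon emitted during its transit time, so the number of time-bin qubits stored in flight is the delay length divided by the speed of light in fiber, times the source repetition rate. The specific numbers below are illustrative assumptions, not values from the paper.

```python
# Qubits held "in flight" inside a photonic delay line.
# All numbers are illustrative assumptions, not figures from the paper.

fiber_length_m = 1_000            # assumed 1 km low-loss fiber delay
speed_in_fiber_m_s = 2.0e8        # roughly c / 1.5 for silica fiber
clock_rate_hz = 1.0e9             # assumed 1 GHz photon repetition rate

transit_time_s = fiber_length_m / speed_in_fiber_m_s
qubits_in_flight = transit_time_s * clock_rate_hz
print(f"{transit_time_s * 1e6:.1f} us delay -> ~{qubits_in_flight:,.0f} qubits in flight")
# -> 5.0 us delay holds ~5,000 time-bin qubits per delay line.
```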
arXiv Detail & Related papers (2021-03-15T18:00:06Z)