Silicon Photonic 2.5D Interposer Networks for Overcoming Communication Bottlenecks in Scale-out Machine Learning Hardware Accelerators
- URL: http://arxiv.org/abs/2403.04189v1
- Date: Thu, 7 Mar 2024 03:38:35 GMT
- Title: Silicon Photonic 2.5D Interposer Networks for Overcoming Communication Bottlenecks in Scale-out Machine Learning Hardware Accelerators
- Authors: Febin Sunny, Ebadollah Taheri, Mahdi Nikdast, Sudeep Pasricha
- Abstract summary: Modern machine learning (ML) applications are becoming increasingly complex, and monolithic (single-chip) accelerator architectures cannot keep up with their energy efficiency and throughput demands.
This paper outlines how optical communication and computation can be leveraged in 2.5D platforms to realize energy-efficient and high throughput 2.5D ML accelerator architectures.
- Score: 5.482420806459269
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Modern machine learning (ML) applications are becoming increasingly
complex, and monolithic (single-chip) accelerator architectures cannot keep up with
their energy efficiency and throughput demands. Even though modern digital
electronic accelerators are gradually adopting 2.5D architectures with multiple
smaller chiplets to improve scalability, they face fundamental limitations due
to a reliance on slow metallic interconnects. This paper outlines how optical
communication and computation can be leveraged in 2.5D platforms to realize
energy-efficient and high throughput 2.5D ML accelerator architectures.
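The distance argument behind this claim can be made concrete with a first-order model: an electrical interposer wire pays an energy cost per bit that grows roughly linearly with distance, while a photonic link pays a mostly fixed electro-optic/opto-electronic conversion cost plus a small propagation term. A minimal back-of-envelope sketch in Python; all constants are assumed, order-of-magnitude placeholders, not figures from the paper:

```python
# First-order energy model for a chiplet-to-chiplet link on a 2.5D
# interposer (illustrative only; all constants are assumed placeholders,
# not figures from the paper). Electrical wires pay a per-mm cost; a
# photonic link pays a mostly fixed E/O + O/E conversion cost.

ELECTRICAL_PJ_PER_BIT_PER_MM = 0.2   # assumed metallic interposer wire cost
PHOTONIC_FIXED_PJ_PER_BIT = 0.5      # assumed modulator/laser/receiver cost
PHOTONIC_PJ_PER_BIT_PER_MM = 0.01    # assumed waveguide propagation cost

def energy_per_bit_pj(distance_mm: float, photonic: bool) -> float:
    """Energy in pJ to move one bit over distance_mm under the toy model."""
    if photonic:
        return PHOTONIC_FIXED_PJ_PER_BIT + PHOTONIC_PJ_PER_BIT_PER_MM * distance_mm
    return ELECTRICAL_PJ_PER_BIT_PER_MM * distance_mm

if __name__ == "__main__":
    for d_mm in (1, 5, 20):
        e = energy_per_bit_pj(d_mm, photonic=False)
        p = energy_per_bit_pj(d_mm, photonic=True)
        print(f"{d_mm:>3} mm: electrical {e:.2f} pJ/bit vs photonic {p:.2f} pJ/bit")
```

Under these toy numbers the photonic link wins beyond a few millimeters, which is exactly the regime of chiplet-to-chiplet traffic on a 2.5D interposer.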
Related papers
- Joint Transmit and Pinching Beamforming for PASS: Optimization-Based or Learning-Based? [89.05848771674773]
A novel pinching-antenna system (PASS)-enabled downlink multi-user multiple-input single-output (MISO) framework is proposed.
It consists of multiple waveguides, on which numerous low-cost antennas, named pinching antennas (PAs), are deployed.
The positions of the PAs can be reconfigured to exploit both large-scale path loss and the spatial degrees of freedom.
arXiv Detail & Related papers (2025-02-12T18:54:10Z)
- Exploring the Potential of Wireless-enabled Multi-Chip AI Accelerators [2.2305608711864555]
We show that wireless interconnects can lead to speedups of 10% on average and 20% maximum.
We highlight the importance of load balancing between the wired and wireless interconnects.
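The load-balancing observation lends itself to a simple worked model: if a transfer can be split across the wired and wireless interconnects running in parallel, completion time is minimized when traffic is divided in proportion to link bandwidth. A hedged illustration (link bandwidths are assumed, not taken from the paper):

```python
# Toy model of wired/wireless load balancing (illustrative, not from
# the paper): a transfer of `volume_bits` is split so a fraction `x`
# goes over the wireless link and the rest over the wired link; both
# links run in parallel, so transfer time is the max of the two.

def transfer_time(volume_bits: float, x: float, bw_wired: float, bw_wireless: float) -> float:
    """Completion time when fraction x of the traffic uses the wireless link."""
    return max(x * volume_bits / bw_wireless, (1 - x) * volume_bits / bw_wired)

def optimal_split(bw_wired: float, bw_wireless: float) -> float:
    """Both links finish together when x / bw_wireless == (1 - x) / bw_wired."""
    return bw_wireless / (bw_wired + bw_wireless)

if __name__ == "__main__":
    bw_wired, bw_wireless = 100e9, 20e9  # assumed link bandwidths, bits/s
    x = optimal_split(bw_wired, bw_wireless)
    t_wired_only = transfer_time(1e9, 0.0, bw_wired, bw_wireless)
    t_balanced = transfer_time(1e9, x, bw_wired, bw_wireless)
    print(f"optimal wireless fraction: {x:.2%}")
    print(f"speedup over wired-only: {t_wired_only / t_balanced:.2f}x")
```

With these toy numbers, a wireless link one-fifth as fast as the wired one yields a 1.2x speedup over the wired-only baseline, provided the traffic is split in proportion to bandwidth.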
arXiv Detail & Related papers (2025-01-29T11:00:09Z)
- Dynamic Spectrum Access for Ambient Backscatter Communication-assisted D2D Systems with Quantum Reinforcement Learning [68.63990729719369]
The wireless spectrum is becoming scarce, resulting in low spectral efficiency for D2D communications.
This paper aims to integrate ambient backscatter communication technology into D2D devices to allow them to backscatter ambient RF signals.
We develop a novel quantum reinforcement learning (RL) algorithm that can achieve a faster convergence rate with fewer training parameters.
arXiv Detail & Related papers (2024-10-23T15:36:43Z)
- Accelerating Error Correction Code Transformers [56.75773430667148]
We introduce a novel acceleration method for transformer-based decoders.
We achieve a 90% compression ratio and reduce arithmetic operation energy consumption by at least 224 times on modern hardware.
arXiv Detail & Related papers (2024-10-08T11:07:55Z)
- Optical training of large-scale Transformers and deep neural networks with direct feedback alignment [48.90869997343841]
We experimentally implement a versatile and scalable training algorithm, called direct feedback alignment, on a hybrid electronic-photonic platform.
An optical processing unit performs large-scale random matrix multiplications, the central operation of this algorithm, at speeds of up to 1500 TeraOps.
We study the compute scaling of our hybrid optical approach, and demonstrate a potential advantage for ultra-deep and wide neural networks.
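Direct feedback alignment replaces the backpropagated error signal at each hidden layer with a fixed random projection of the output error; those random projections are exactly the large matrix multiplications the optical processor performs. A minimal NumPy sketch of the algorithm on a toy regression task (network sizes and data are assumed for illustration, not taken from the paper):

```python
import numpy as np

# Direct feedback alignment (DFA) on a toy two-hidden-layer MLP.
# In the paper's hybrid platform the random projections B1/B2 are the
# operations offloaded to the optical processor; here they are plain
# NumPy matrices. Task and shapes are illustrative assumptions.

rng = np.random.default_rng(0)
n_in, n_h, n_out, n_samples = 8, 64, 2, 256

X = rng.standard_normal((n_samples, n_in))          # toy inputs
Y = np.stack([X[:, 0] * X[:, 1], np.sin(X[:, 2])], axis=1)  # toy targets

# Trainable forward weights.
W1 = rng.standard_normal((n_in, n_h)) * 0.1
W2 = rng.standard_normal((n_h, n_h)) * 0.1
W3 = rng.standard_normal((n_h, n_out)) * 0.1

# Fixed random feedback matrices: DFA projects the *output* error
# directly to each hidden layer instead of backpropagating through W2/W3.
B1 = rng.standard_normal((n_out, n_h)) * 0.1
B2 = rng.standard_normal((n_out, n_h)) * 0.1

lr = 0.05
for step in range(500):
    # Forward pass.
    h1 = np.tanh(X @ W1)
    h2 = np.tanh(h1 @ W2)
    out = h2 @ W3
    e = out - Y                      # output error (MSE gradient)

    # DFA: random projections of e replace the backpropagated signal.
    d2 = (e @ B2) * (1 - h2**2)      # tanh' = 1 - tanh^2
    d1 = (e @ B1) * (1 - h1**2)

    W3 -= lr * h2.T @ e / n_samples  # output layer uses the true error
    W2 -= lr * h1.T @ d2 / n_samples
    W1 -= lr * X.T @ d1 / n_samples

print("final MSE:", float(np.mean((np.tanh(np.tanh(X @ W1) @ W2) @ W3 - Y) ** 2)))
```

Because B1 and B2 are fixed and layer-independent, the error delivery step is embarrassingly parallel, which is what makes it a natural fit for a free-space optical random projector.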
arXiv Detail & Related papers (2024-09-01T12:48:47Z)
- ARTEMIS: A Mixed Analog-Stochastic In-DRAM Accelerator for Transformer Neural Networks [2.9699290794642366]
ARTEMIS is a mixed analog-stochastic in-DRAM accelerator for transformer models.
Our analysis indicates that ARTEMIS exhibits at least 3.0x speedup, 1.8x lower energy, and 1.9x better energy efficiency compared to GPU, TPU, CPU, and state-of-the-art PIM transformer hardware accelerators.
arXiv Detail & Related papers (2024-07-17T15:08:14Z)
- Accelerating Neural Networks for Large Language Models and Graph Processing with Silicon Photonics [4.471962177124311]
Large language models (LLMs) and graph processing have emerged as transformative technologies for natural language processing (NLP), computer vision, and graph-structured data applications.
However, the complex structures of these models pose challenges for acceleration on conventional electronic platforms.
We describe novel hardware accelerators based on silicon photonics to accelerate transformer neural networks that are used in LLMs and graph neural networks for graph data processing.
arXiv Detail & Related papers (2024-01-12T20:32:38Z)
- Random resistive memory-based deep extreme point learning machine for unified visual processing [67.51600474104171]
We propose a novel hardware-software co-design, the random resistive memory-based deep extreme point learning machine (DEPLM).
Our co-design system achieves significant energy-efficiency improvements and training-cost reductions compared to conventional systems.
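The "extreme point learning" idea can be illustrated with a conventional extreme-learning-machine sketch: the hidden projection stays fixed and random (in DEPLM, supplied by the intrinsic randomness of resistive memory), so training collapses to a single linear solve for the readout. A hedged sketch, with the task, sizes, and ridge solver all assumed rather than taken from the paper:

```python
import numpy as np

# Extreme-learning-machine sketch of the idea behind DEPLM (an
# illustration, not the paper's implementation): the hidden layer is a
# fixed random projection -- in DEPLM it comes from the intrinsic
# randomness of resistive memory -- so training reduces to fitting a
# single linear readout, here by ridge-regularized least squares.

rng = np.random.default_rng(42)
n_in, n_hidden, n_out, n_samples = 16, 512, 3, 1000

X = rng.standard_normal((n_samples, n_in))
labels = X[:, :n_out].argmax(axis=1)      # toy task: index of the largest
Y = np.eye(n_out)[labels]                 # of the first 3 features, one-hot

W_rand = rng.standard_normal((n_in, n_hidden))   # fixed, never trained
H = np.tanh(X @ W_rand)                          # random hidden features

lam = 1e-2                                       # ridge regularization
W_out = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ Y)

acc = (H @ W_out).argmax(axis=1) == labels
print(f"readout-only training accuracy: {acc.mean():.2f}")
```

The training-cost reduction follows from the structure: there is no gradient descent through the random layer, only one closed-form least-squares solve for the readout.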
arXiv Detail & Related papers (2023-12-14T09:46:16Z)
- REED: Chiplet-Based Accelerator for Fully Homomorphic Encryption [4.713756093611972]
We present REED, a first-of-its-kind multi-chiplet-based FHE accelerator that overcomes the limitations of prior monolithic designs.
Results demonstrate that the REED 2.5D microprocessor occupies 96.7 mm$^2$ of chip area and consumes 49.4 W average power in a 7 nm technology.
arXiv Detail & Related papers (2023-08-05T14:04:39Z)
- Machine Learning Accelerators in 2.5D Chiplet Platforms with Silicon Photonics [5.190207094732673]
Domain-specific machine learning (ML) accelerators such as Google's TPU and Apple's Neural Engine now dominate CPUs and GPUs for energy-efficient ML processing.
We present a vision of how optical computation and communication can be integrated into 2.5D chiplet platforms to drive an entirely new class of sustainable and scalable ML hardware accelerators.
arXiv Detail & Related papers (2023-01-28T17:06:53Z)
- Interleaving: Modular architectures for fault-tolerant photonic quantum computing [50.591267188664666]
Photonic fusion-based quantum computing (FBQC) uses low-loss photonic delays.
We present a modular architecture for FBQC in which these components are combined to form "interleaving modules".
Exploiting the multiplicative power of delays, each module can add thousands of physical qubits to the computational Hilbert space.
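The "multiplicative power of delays" is simple arithmetic: a photonic delay of duration tau, fed at clock rate f, holds f x tau qubits in flight at once, so microsecond-scale low-loss delays at GHz rates give thousands of physical qubits per module. A back-of-envelope sketch (clock rate and delay values are assumed for illustration):

```python
# Back-of-envelope arithmetic for interleaving (illustrative numbers):
# a delay line of duration `delay_s`, fed at clock rate `clock_hz`,
# stores clock_hz * delay_s photonic qubits in flight simultaneously.

def qubits_in_delay(clock_hz: float, delay_s: float) -> int:
    return int(clock_hz * delay_s)

if __name__ == "__main__":
    clock_hz = 1e9          # assumed 1 GHz resource-state generation rate
    for delay_ns in (100, 1_000, 10_000):
        n = qubits_in_delay(clock_hz, delay_ns * 1e-9)
        print(f"{delay_ns:>6} ns delay -> {n:>5} qubits in flight")
```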
arXiv Detail & Related papers (2021-03-15T18:00:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.