Machine Learning Accelerators in 2.5D Chiplet Platforms with Silicon
Photonics
- URL: http://arxiv.org/abs/2301.12252v1
- Date: Sat, 28 Jan 2023 17:06:53 GMT
- Title: Machine Learning Accelerators in 2.5D Chiplet Platforms with Silicon
Photonics
- Authors: Febin Sunny, Ebadollah Taheri, Mahdi Nikdast, Sudeep Pasricha
- Abstract summary: Domain-specific machine learning (ML) accelerators such as Google's TPU and Apple's Neural Engine now dominate CPUs and GPUs for energy-efficient ML processing.
We present a vision of how optical computation and communication can be integrated into 2.5D chiplet platforms to drive an entirely new class of sustainable and scalable ML hardware accelerators.
- Score: 5.190207094732673
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Domain-specific machine learning (ML) accelerators such as Google's TPU and
Apple's Neural Engine now dominate CPUs and GPUs for energy-efficient ML
processing. However, the evolution of electronic accelerators is facing
fundamental limits due to the limited computation density of monolithic
processing chips and the reliance on slow metallic interconnects. In this
paper, we present a vision of how optical computation and communication can be
integrated into 2.5D chiplet platforms to drive an entirely new class of
sustainable and scalable ML hardware accelerators. We describe how cross-layer
design and fabrication of optical devices, circuits, and architectures, and
hardware/software codesign can help design efficient photonics-based 2.5D
chiplet platforms to accelerate emerging ML workloads.
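To make the idea of optical computation concrete, here is a minimal numerical sketch (an illustration of the general technique, not the paper's specific design): in a wavelength-multiplexed photonic multiply-accumulate lane, each input activation sets the optical power on one wavelength, each weight is realized as a device transmission in [0, 1] (e.g., a tuned microring), and a photodetector sums the total power, yielding a dot product in the analog domain. All names and the noise parameter below are hypothetical.

```python
# Toy model of an analog photonic dot product (illustrative sketch only).
import numpy as np

def photonic_dot(activations, weights, noise_std=0.0, rng=None):
    """Model one photonic multiply-accumulate lane.

    activations : non-negative optical input powers (one per wavelength)
    weights     : transmissions in [0, 1], e.g., set by microring tuning
    noise_std   : optional photodetector noise std dev (hypothetical knob)
    """
    activations = np.asarray(activations, dtype=float)
    # Transmissions are physically bounded to [0, 1].
    weights = np.clip(np.asarray(weights, dtype=float), 0.0, 1.0)
    # Each wavelength carries activation * transmission; the detector
    # integrates total optical power, i.e., it sums the products.
    signal = float(np.sum(activations * weights))
    if noise_std > 0.0:
        rng = rng or np.random.default_rng()
        signal += rng.normal(0.0, noise_std)
    return signal

# With no noise, the lane reduces to an ordinary dot product.
print(photonic_dot([1.0, 2.0, 3.0], [0.5, 0.5, 1.0]))  # -> 4.5
```

The summation happens "for free" at the photodetector, which is one reason photonic accelerators promise high compute density per unit energy.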
Related papers
- Performance and Power: Systematic Evaluation of AI Workloads on Accelerators with CARAML [0.0]
The CARAML benchmark suite is employed to assess performance and energy consumption during the training of large language models and computer vision models.
CARAML provides a compact, automated, and reproducible framework for assessing the performance and energy consumption of ML workloads.
arXiv Detail & Related papers (2024-09-19T12:43:18Z) - OPIMA: Optical Processing-In-Memory for Convolutional Neural Network Acceleration [5.0389804644646174]
Processing-in-memory (PIM) struggles to achieve high throughput and energy efficiency due to internal data movement bottlenecks.
We introduce OPIMA, an optical PIM-based machine learning accelerator.
We show that OPIMA can achieve 2.98x higher throughput and 137x better energy efficiency than the best-known prior work.
arXiv Detail & Related papers (2024-07-11T06:12:04Z) - Silicon Photonic 2.5D Interposer Networks for Overcoming Communication
Bottlenecks in Scale-out Machine Learning Hardware Accelerators [5.482420806459269]
Modern machine learning (ML) applications are becoming increasingly complex, and monolithic (single-chip) accelerator architectures cannot keep up with their energy-efficiency and throughput demands.
This paper outlines how optical communication and computation can be leveraged in 2.5D platforms to realize energy-efficient and high throughput 2.5D ML accelerator architectures.
arXiv Detail & Related papers (2024-03-07T03:38:35Z) - Accelerating Neural Networks for Large Language Models and Graph
Processing with Silicon Photonics [4.471962177124311]
Large language models (LLMs) and graph processing have emerged as transformative technologies for natural language processing (NLP), computer vision, and graph-structured data applications.
However, the complex structures of these models pose challenges for acceleration on conventional electronic platforms.
We describe novel hardware accelerators based on silicon photonics to accelerate transformer neural networks that are used in LLMs and graph neural networks for graph data processing.
arXiv Detail & Related papers (2024-01-12T20:32:38Z) - Random resistive memory-based deep extreme point learning machine for
unified visual processing [67.51600474104171]
We propose a novel hardware-software co-design: a random resistive memory-based deep extreme point learning machine (DEPLM).
Our co-design achieves significant energy-efficiency improvements and training-cost reductions compared to conventional systems.
arXiv Detail & Related papers (2023-12-14T09:46:16Z) - FusionAI: Decentralized Training and Deploying LLMs with Massive
Consumer-Level GPUs [57.12856172329322]
We envision a decentralized system that unlocks the potential of vast, untapped consumer-level GPUs.
This system faces critical challenges, including limited CPU and GPU memory, low network bandwidth, and high variability across heterogeneous peers and devices.
arXiv Detail & Related papers (2023-09-03T13:27:56Z) - Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels.
We decompose kernel development into two steps: 1) expressing the computational core using Tensor Processing Primitives (TPPs), and 2) expressing the logical loops around TPPs in a high-level, declarative fashion.
We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z) - Fast GraspNeXt: A Fast Self-Attention Neural Network Architecture for
Multi-task Learning in Computer Vision Tasks for Robotic Grasping on the Edge [80.88063189896718]
High architectural and computational complexity can result in poor suitability for deployment on embedded devices.
Fast GraspNeXt is a fast self-attention neural network architecture tailored for embedded multi-task learning in computer vision tasks for robotic grasping.
arXiv Detail & Related papers (2023-04-21T18:07:14Z) - SeLoC-ML: Semantic Low-Code Engineering for Machine Learning
Applications in Industrial IoT [9.477629856092218]
This paper presents a framework called Semantic Low-Code Engineering for ML Applications (SeLoC-ML)
SeLoC-ML enables non-experts to model, discover, reuse, and matchmake ML models and devices at scale.
Developers can benefit from semantic application templates, called recipes, to fast prototype end-user applications.
arXiv Detail & Related papers (2022-07-18T13:06:21Z) - FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using around 40% of the available hardware resources in total.
Compared to its full-precision software counterpart, it reduces classification time by three orders of magnitude at a small 4.5% cost in accuracy.
arXiv Detail & Related papers (2022-01-18T13:59:22Z) - Interleaving: Modular architectures for fault-tolerant photonic quantum
computing [50.591267188664666]
Photonic fusion-based quantum computing (FBQC) uses low-loss photonic delays.
We present a modular architecture for FBQC in which these components are combined to form "interleaving modules."
Exploiting the multiplicative power of delays, each module can add thousands of physical qubits to the computational Hilbert space.
arXiv Detail & Related papers (2021-03-15T18:00:06Z)
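The "multiplicative power of delays" in the interleaving paper above admits a simple back-of-envelope estimate (my sketch, not taken from the paper): a delay line acts as a queue of time-bin qubits, so the number of physical qubits it holds is roughly the delay time multiplied by the photon source repetition rate. The function name and default group velocity below are assumptions for illustration.

```python
# Back-of-envelope: qubits stored in a photonic delay line (sketch only).
def qubits_in_delay(fiber_length_m, rep_rate_hz, group_velocity_m_s=2.0e8):
    """Estimate the number of time-bin qubits held in a fiber delay.

    fiber_length_m     : fiber length in meters
    rep_rate_hz        : photon source repetition rate in Hz
    group_velocity_m_s : light speed in fiber (~2/3 of c, assumed)
    """
    delay_s = fiber_length_m / group_velocity_m_s  # time light spends in fiber
    return int(delay_s * rep_rate_hz)              # one qubit per clock tick

# A 1 km fiber (~5 us of delay) clocked at 1 GHz holds ~5000 qubits,
# consistent in scale with "thousands of physical qubits" per module.
print(qubits_in_delay(1_000, 1e9))  # -> 5000
```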
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.