Towards 3D Acceleration for low-power Mixture-of-Experts and Multi-Head Attention Spiking Transformers
- URL: http://arxiv.org/abs/2412.05540v1
- Date: Sat, 07 Dec 2024 05:15:05 GMT
- Title: Towards 3D Acceleration for low-power Mixture-of-Experts and Multi-Head Attention Spiking Transformers
- Authors: Boxun Xu, Junyoung Hwang, Pruek Vanna-iampikul, Yuxuan Yin, Sung Kyu Lim, Peng Li
- Abstract summary: Spiking Neural Networks (SNNs) provide a brain-inspired and event-driven mechanism that is believed to be critical to unlocking energy-efficient deep learning.
This paper introduces the first 3D hardware architecture and design methodology for Mixture-of-Experts and Multi-Head Attention spiking transformers.
- Score: 5.1210823165448
- Abstract: Spiking Neural Networks (SNNs) provide a brain-inspired and event-driven mechanism that is believed to be critical to unlocking energy-efficient deep learning. The mixture-of-experts approach mirrors the parallel distributed processing of nervous systems, introducing conditional computation policies and expanding model capacity without scaling up the number of computational operations. Additionally, spiking mixture-of-experts self-attention mechanisms enhance representation capacity, effectively capturing diverse patterns of entities and dependencies between visual or linguistic tokens. However, there is currently a lack of hardware support for the highly parallel distributed processing needed by spiking transformers, which embody brain-inspired computation. This paper introduces the first 3D hardware architecture and design methodology for Mixture-of-Experts and Multi-Head Attention spiking transformers. By leveraging 3D integration with memory-on-logic and logic-on-logic stacking, we explore such brain-inspired accelerators with spatially stackable circuitry, demonstrating significant optimization of energy efficiency and latency compared to conventional 2D CMOS integration.
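Neither mechanism is specified at code level in the abstract, so the following PyTorch sketch is only an illustration of the two computations the accelerator targets: spiking multi-head attention, where binarized Q/K/V reduce both matrix products to integer accumulation, and a top-k mixture-of-experts layer that grows capacity without growing per-token work. All module names, the firing threshold, and the router are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class SpikingSelfAttention(nn.Module):
    """Sketch of spiking multi-head self-attention: Q, K, V are binarized
    to spikes, so Q@K^T and attn@V reduce to integer accumulation."""
    def __init__(self, dim, heads=8, threshold=1.0):
        super().__init__()
        self.heads, self.dh = heads, dim // heads
        self.wq = nn.Linear(dim, dim, bias=False)
        self.wk = nn.Linear(dim, dim, bias=False)
        self.wv = nn.Linear(dim, dim, bias=False)
        self.threshold = threshold

    def spike(self, x):
        # Heaviside firing; training would use surrogate gradients.
        return (x >= self.threshold).float()

    def forward(self, x):                          # x: (B, T, dim) spike input
        B, T, _ = x.shape
        def split(t):
            return t.view(B, T, self.heads, self.dh).transpose(1, 2)
        q, k, v = (split(self.spike(w(x))) for w in (self.wq, self.wk, self.wv))
        attn = q @ k.transpose(-2, -1)             # binary x binary -> counts
        out = self.spike(attn @ v / self.dh)       # re-binarize mixed values
        return out.transpose(1, 2).reshape(B, T, -1)

class MoELayer(nn.Module):
    """Sketch of a top-k expert router: capacity scales with the number of
    experts while each token only pays for k expert evaluations."""
    def __init__(self, dim, n_experts=4, k=1):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.gate = nn.Linear(dim, n_experts)
        self.k = k

    def forward(self, x):                          # x: (B, T, dim)
        w, idx = self.gate(x).topk(self.k, dim=-1)
        w = w.softmax(dim=-1)
        out = torch.zeros_like(x)
        # Dense reference loop; a real accelerator dispatches only the
        # tokens routed to each expert (conditional computation).
        for j in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., j] == e).unsqueeze(-1)
                out = out + mask * w[..., j:j + 1] * expert(x)
        return out
```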
Related papers
- DSV: Exploiting Dynamic Sparsity to Accelerate Large-Scale Video DiT Training [85.04885553561164]
Diffusion Transformers (DiTs) have shown remarkable performance in modeling and generating high-quality videos.
This paper introduces DSV, a novel framework designed to accelerate and scale the training of video DiTs.
arXiv Detail & Related papers (2025-02-11T14:39:59Z)
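The DSV entry names the mechanism (dynamic sparsity in attention) but not its details, so the NumPy sketch below shows only the generic pattern: score key/value blocks cheaply with pooled summaries, then compute exact attention over just the top-scoring blocks. The block size, mean pooling, and keep_ratio are all hypothetical choices.

```python
import numpy as np

def block_sparse_attention(q, k, v, block=64, keep_ratio=0.25):
    """Dynamic block-sparse attention sketch: estimate per-block attention
    mass from mean-pooled queries/keys, then attend only to the strongest
    key blocks for each query block (assumes T divisible by `block`)."""
    T, d = q.shape
    assert T % block == 0
    nb = T // block
    qb = q.reshape(nb, block, d).mean(axis=1)    # pooled block summaries
    kb = k.reshape(nb, block, d).mean(axis=1)
    mass = qb @ kb.T                             # (nb, nb) coarse scores
    keep = max(1, int(nb * keep_ratio))
    out = np.zeros_like(q)
    for i in range(nb):
        cols = np.argsort(mass[i])[-keep:]       # key blocks worth computing
        ks = np.concatenate([k[c * block:(c + 1) * block] for c in cols])
        vs = np.concatenate([v[c * block:(c + 1) * block] for c in cols])
        s = q[i * block:(i + 1) * block] @ ks.T / np.sqrt(d)
        p = np.exp(s - s.max(axis=1, keepdims=True))
        out[i * block:(i + 1) * block] = (p / p.sum(axis=1, keepdims=True)) @ vs
    return out
```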
- MAS-Attention: Memory-Aware Stream Processing for Attention Acceleration on Resource-Constrained Edge Devices [24.1144641404561]
We propose a scheme for exact attention inference acceleration on memory-constrained edge accelerators.
We show up to 2.75x speedup and 54% reduction in energy consumption as compared to the state-of-the-art attention fusion method (FLAT) in the edge computing scenario.
arXiv Detail & Related papers (2024-11-20T19:44:26Z)
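MAS-Attention's exact scheduling isn't spelled out above; what exact attention on a memory-constrained accelerator generally requires is tiling over keys/values with an online softmax, so only one tile of scores is resident at a time. A NumPy reference under that assumption (the tile size and single-head shapes are illustrative):

```python
import numpy as np

def tiled_attention(q, k, v, tile=64):
    """Exact attention computed tile-by-tile with an online softmax:
    only a (T, tile) slab of logits is materialized at any time."""
    T, d = q.shape
    out = np.zeros((T, v.shape[1]))
    m = np.full(T, -np.inf)                      # running row-wise max
    l = np.zeros(T)                              # running softmax normalizer
    for s in range(0, k.shape[0], tile):
        kt, vt = k[s:s + tile], v[s:s + tile]
        logits = q @ kt.T / np.sqrt(d)           # (T, tile)
        m_new = np.maximum(m, logits.max(axis=1))
        scale = np.exp(m - m_new)                # rescale earlier partials
        p = np.exp(logits - m_new[:, None])
        l = l * scale + p.sum(axis=1)
        out = out * scale[:, None] + p @ vt
        m = m_new
    return out / l[:, None]
```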
- Spiking Transformer Hardware Accelerators in 3D Integration [5.426379844893919]
Spiking neural networks (SNNs) are powerful models of computation and are well suited for resource-constrained edge devices and neuromorphic hardware.
Recently emerged spiking transformers have shown promising performance and efficiency by capitalizing on the binary nature of spiking operations.
arXiv Detail & Related papers (2024-11-11T22:08:11Z)
- Topology Optimization of Random Memristors for Input-Aware Dynamic SNN [44.38472635536787]
We introduce pruning optimization for an input-aware dynamic memristive spiking neural network (PRIME).
For signal representation, PRIME employs leaky integrate-and-fire neurons to emulate the brain's inherent spiking mechanism.
For reconfigurability, inspired by the brain's dynamic adjustment of computational depth, PRIME employs an input-aware dynamic early stop policy.
arXiv Detail & Related papers (2024-07-26T09:35:02Z)
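Two of PRIME's named ingredients are easy to write down. The LIF update below is the textbook dynamics; the spike-count margin used for early stopping is a hypothetical stand-in for the paper's input-aware policy, not its actual criterion.

```python
import numpy as np

def lif_step(v, x, decay=0.9, v_th=1.0):
    """One leaky integrate-and-fire step: leak, integrate the input
    current, fire on threshold crossing, then hard-reset fired neurons."""
    v = decay * v + x
    spikes = (v >= v_th).astype(v.dtype)
    return v * (1.0 - spikes), spikes

def classify_with_early_stop(xs, n_classes, margin=3):
    """Hypothetical input-aware early stop: integrate output spikes over
    time and halt once the leading class beats the runner-up by `margin`."""
    v = np.zeros(n_classes)
    counts = np.zeros(n_classes)
    for t, x in enumerate(xs):                   # xs: per-timestep currents
        v, s = lif_step(v, x)
        counts += s
        top2 = np.sort(counts)[-2:]
        if top2[1] - top2[0] >= margin:
            break
    return counts.argmax(), t + 1                # prediction, steps used
```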
- Learning local equivariant representations for quantum operators [7.377639942466163]
We introduce a novel deep learning model, SLEM, for predicting multiple quantum operators.
SLEM achieves state-of-the-art accuracy while dramatically improving computational efficiency.
We demonstrate SLEM's capabilities across diverse 2D and 3D materials, achieving high accuracy even with limited training data.
arXiv Detail & Related papers (2024-07-08T15:55:12Z)
- Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges.
We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs.
This work advances the AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z)
- Resistive Memory-based Neural Differential Equation Solver for Score-based Diffusion Model [55.116403765330084]
Current AIGC methods, such as score-based diffusion, are still deficient in terms of speed and efficiency.
We propose a time-continuous and analog in-memory neural differential equation solver for score-based diffusion.
We experimentally validate our solution with 180 nm resistive memory in-memory computing macros.
arXiv Detail & Related papers (2024-04-08T16:34:35Z)
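The dynamics such a solver integrates can be stated compactly. Assuming the standard probability-flow ODE of a variance-preserving diffusion (the paper's analog circuit realization is not reproduced here, and the Euler schedule is illustrative), a digital reference is:

```python
import numpy as np

def probability_flow_sample(score_fn, x, betas):
    """Euler integration of the VP probability-flow ODE,
    dx/dt = -0.5 * beta(t) * (x + score(x, t)),
    run backward from t=1 to t=0; `score_fn(x, i)` is a stand-in for
    the trained score network evaluated at step i."""
    n = len(betas)
    dt = 1.0 / n
    for i in reversed(range(n)):
        x = x + 0.5 * betas[i] * (x + score_fn(x, i)) * dt
    return x
```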
- Random resistive memory-based deep extreme point learning machine for unified visual processing [67.51600474104171]
We propose a novel hardware-software co-design, the random resistive memory-based deep extreme point learning machine (DEPLM).
Our co-design achieves substantial energy-efficiency improvements and training-cost reductions compared to conventional systems.
arXiv Detail & Related papers (2023-12-14T09:46:16Z)
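DEPLM builds on the extreme-learning-machine recipe: hidden weights are random and frozen (playing the role of random resistive-memory conductances), and only a linear readout is solved in closed form, which is where the training-cost reduction comes from. A minimal NumPy sketch with an assumed ReLU feature map and ridge regularizer:

```python
import numpy as np

def elm_train(X, Y, n_hidden=256, lam=1e-3, seed=0):
    """Fit only the readout: random frozen features (the 'memristor'
    layer) followed by a ridge-regression solve for the output weights."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))
    H = np.maximum(X @ W, 0.0)                   # fixed random features
    beta = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ Y)
    return W, beta

def elm_predict(X, W, beta):
    return np.maximum(X @ W, 0.0) @ beta
```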
- ETLP: Event-based Three-factor Local Plasticity for online learning with neuromorphic hardware [105.54048699217668]
We show competitive accuracy with a clear advantage in computational complexity for Event-based Three-factor Local Plasticity (ETLP).
We also show that when using local plasticity, threshold adaptation in spiking neurons and a recurrent topology are necessary to learn temporal patterns with a rich temporal structure.
arXiv Detail & Related papers (2023-01-19T19:45:42Z)
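The three-factor structure is simple to state: two local factors (pre- and postsynaptic activity) accumulate into an eligibility trace, and a third, per-neuron learning signal gates the actual weight change. The rule below is a generic member of that family, not ETLP's exact update:

```python
import numpy as np

def three_factor_update(w, elig, pre, post, learn_signal,
                        lr=1e-3, trace_decay=0.9):
    """Generic three-factor local rule: local eligibility built from
    pre/post spike coincidences, gated by a per-neuron third factor
    (e.g. an error or neuromodulatory signal) to update the weights."""
    elig = trace_decay * elig + np.outer(post, pre)   # factors 1 & 2: local
    w = w + lr * learn_signal[:, None] * elig         # factor 3: global gate
    return w, elig
```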
- Deep learning applied to computational mechanics: A comprehensive review, state of the art, and the classics [77.34726150561087]
Recent developments in artificial neural networks, particularly deep learning (DL), are reviewed in detail.
Both hybrid and pure machine learning (ML) methods are discussed.
History and limitations of AI are recounted and discussed, with particular attention to pointing out misstatements or misconceptions in the classics.
arXiv Detail & Related papers (2022-12-18T02:03:00Z)
- Bottom-up and top-down approaches for the design of neuromorphic processing systems: Tradeoffs and synergies between natural and artificial intelligence [3.874729481138221]
While Moore's law has driven exponential computing power expectations, its nearing end calls for new avenues for improving overall system performance.
One of these avenues is the exploration of alternative brain-inspired computing architectures that aim at achieving the flexibility and computational efficiency of biological neural processing systems.
We provide a comprehensive overview of the field, highlighting the different levels of granularity at which this paradigm shift is realized.
arXiv Detail & Related papers (2021-06-02T16:51:45Z)