A Compute and Communication Runtime Model for Loihi 2
- URL: http://arxiv.org/abs/2601.10035v1
- Date: Thu, 15 Jan 2026 03:27:47 GMT
- Title: A Compute and Communication Runtime Model for Loihi 2
- Authors: Jonathan Timcheck, Alessandro Pierro, Sumit Bam Shrestha,
- Abstract summary: We introduce the first max-affine lower-bound runtime model for Intel's Loihi 2 neuromorphic chip.<n>Despite being a lower-bound model, we observe a tight correspondence between our model's estimated runtime and the measured runtime.<n>Our model helps empower the design of high-speed algorithms and kernels for Loihi 2.
- Score: 43.830725925222424
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neuromorphic computers hold the potential to vastly improve the speed and efficiency of a wide range of computational kernels with their asynchronous, compute-memory co-located, spatially distributed, and scalable nature. However, performance models that are simple yet sufficiently expressive to predict runtime on actual neuromorphic hardware are lacking, posing a challenge for researchers and developers who strive to design fast algorithms and kernels. As breaking the memory bandwidth wall of conventional von-Neumann architectures is a primary neuromorphic advantage, modeling communication time is especially important. At the same time, modeling communication time is difficult, as complex congestion patterns arise in a heavily-loaded Network-on-Chip. In this work, we introduce the first max-affine lower-bound runtime model -- a multi-dimensional roofline model -- for Intel's Loihi 2 neuromorphic chip that quantitatively accounts for both compute and communication based on a suite of microbenchmarks. Despite being a lower-bound model, we observe a tight correspondence (Pearson correlation coefficient greater than or equal to 0.97) between our model's estimated runtime and the measured runtime on Loihi 2 for a neural network linear layer, i.e., matrix-vector multiplication, and for an example application, a Quadratic Unconstrained Binary Optimization solver. Furthermore, we derive analytical expressions for communication-bottlenecked runtime to study scalability of the linear layer, revealing an area-runtime tradeoff for different spatial workload configurations with linear to superliner runtime scaling in layer size with a variety of constant factors. Our max-affine runtime model helps empower the design of high-speed algorithms and kernels for Loihi 2.
Related papers
- SpikySpace: A Spiking State Space Model for Energy-Efficient Time Series Forecasting [9.976522013586244]
SpikySpace is a spiking state-space model that reduces the quadratic cost in the attention block to linear time via selective scanning.<n>Because complex operations such as exponentials and divisions are costly on neuromorphic chips, we introduce simplified approximations of SiLU and Softplus.<n>In matched settings, SpikySpace reduces estimated energy consumption by 98.73% and 96.24% compared to two state-of-the-art transformer based approaches.
arXiv Detail & Related papers (2026-01-02T13:10:53Z) - Sequential-Parallel Duality in Prefix Scannable Models [68.39855814099997]
Recent developments have given rise to various models, such as Gated Linear Attention (GLA) and Mamba.<n>This raises a natural question: can we characterize the full class of neural sequence models that support near-constant-time parallel evaluation and linear-time, constant-space sequential inference?
arXiv Detail & Related papers (2025-06-12T17:32:02Z) - Generalizable, real-time neural decoding with hybrid state-space models [19.90083268518243]
We present POSSM, a novel hybrid architecture that combines individual spike tokenization via a cross-attention module with a recurrent state-space model (SSM) backbone.<n>We evaluate POSSM's decoding performance and inference speed on intracortical decoding of monkey motor tasks, and show that it extends to clinical applications.<n>In all of these tasks, we find that POSSM achieves decoding accuracy comparable to state-of-the-art Transformers, at a fraction of the inference cost.
arXiv Detail & Related papers (2025-06-05T17:57:08Z) - MesaNet: Sequence Modeling by Locally Optimal Test-Time Training [67.45211108321203]
We introduce a numerically stable, chunkwise parallelizable version of the recently proposed Mesa layer.<n>We show that optimal test-time training enables reaching lower language modeling perplexity and higher downstream benchmark performance than previous RNNs.
arXiv Detail & Related papers (2025-06-05T16:50:23Z) - Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers [4.674454841332859]
Transformer-based models have emerged as one of the most widely used architectures for natural language processing.<n>These huge models are memory hungry and incur significant inference latency even on cutting edge AI-accelerators.<n>We propose LeanAttention, a scalable technique of computing self-attention for the token-generation phase.
arXiv Detail & Related papers (2024-05-17T00:52:39Z) - Runtime Construction of Large-Scale Spiking Neuronal Network Models on
GPU Devices [0.0]
We propose a new method for creating network connections interactively, dynamically, and directly in GPU memory.
We validate the simulation performance with both consumer and data center GPUs on two neuroscientifically relevant models.
Both network construction and simulation times are comparable or shorter than those obtained with other state-of-the-art simulation technologies.
arXiv Detail & Related papers (2023-06-16T14:08:27Z) - Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models, and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
arXiv Detail & Related papers (2020-12-31T18:48:58Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study a distributed variable for large-scale AUC for a neural network as with a deep neural network.
Our model requires a much less number of communication rounds and still a number of communication rounds in theory.
Our experiments on several datasets show the effectiveness of our theory and also confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance-art in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.