Implementing and Benchmarking the Locally Competitive Algorithm on the
Loihi 2 Neuromorphic Processor
- URL: http://arxiv.org/abs/2307.13762v1
- Date: Tue, 25 Jul 2023 18:43:08 GMT
- Title: Implementing and Benchmarking the Locally Competitive Algorithm on the
Loihi 2 Neuromorphic Processor
- Authors: Gavin Parpart, Sumedh R. Risbud, Garrett T. Kenyon, Yijing Watkins
- Abstract summary: The Locally Competitive Algorithm (LCA) has been utilized for power-efficient sparse coding on neuromorphic processors.
LCA on Loihi 2 is orders of magnitude more efficient and faster for large sparsity penalties, while maintaining similar reconstruction quality.
- Score: 5.352699766206807
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Neuromorphic processors have garnered considerable interest in recent years
for their potential in energy-efficient and high-speed computing. The Locally
Competitive Algorithm (LCA) has been utilized for power-efficient sparse coding
on neuromorphic processors, including the first Loihi processor. With the Loihi
2 processor enabling custom neuron models and graded spike communication, more
complex implementations of LCA are possible. We present a new implementation of
LCA designed for the Loihi 2 processor and perform an initial set of benchmarks
comparing it to LCA on CPU and GPU devices. In these experiments LCA on Loihi 2
is orders of magnitude more efficient and faster for large sparsity penalties,
while maintaining similar reconstruction quality. We find this performance
improvement increases as the LCA parameters are tuned towards greater
representation sparsity.
Our study highlights the potential of neuromorphic processors, particularly
Loihi 2, in enabling intelligent, autonomous, real-time processing on small
robots and satellites, where there are strict SWaP (size, weight, and power)
requirements. By demonstrating the superior performance of LCA on Loihi 2
compared to conventional computing devices, our study suggests that Loihi 2
could be a valuable tool in advancing these types of applications. Overall, our
study highlights the potential of neuromorphic processors for efficient and
accurate data processing on resource-constrained devices.
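The abstract does not include code, but the algorithm being benchmarked is the standard LCA formulation for sparse coding (Rozell et al., 2008): find a sparse code a minimizing 0.5*||x - Phi*a||^2 + lambda*||a||_1 by integrating leaky neuron dynamics with lateral inhibition, where the sparsity penalty lambda acts as the neurons' firing threshold. The following minimal dense NumPy sketch illustrates only those reference dynamics in their classic soft-threshold variant, not the paper's Loihi 2 implementation; all function names and parameter values here are illustrative.

    import numpy as np

    def soft_threshold(u, lam):
        # Thresholded activation: membrane potential -> sparse coefficient.
        return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

    def lca(x, Phi, lam=0.5, tau=100.0, dt=1.0, n_steps=500):
        # Minimal dense LCA minimizing 0.5*||x - Phi @ a||^2 + lam*||a||_1.
        # x: input (m,); Phi: dictionary with unit-norm columns (m, n);
        # lam: sparsity penalty (larger -> sparser codes, the regime where
        # the paper reports Loihi 2's largest advantage).
        b = Phi.T @ x                            # feed-forward drive
        G = Phi.T @ Phi - np.eye(Phi.shape[1])   # lateral inhibition weights
        u = np.zeros_like(b)                     # membrane potentials
        for _ in range(n_steps):
            a = soft_threshold(u, lam)           # current sparse code
            u += (dt / tau) * (b - u - G @ a)    # leaky integration + competition
        return soft_threshold(u, lam)

    # Toy usage with a random overcomplete dictionary.
    rng = np.random.default_rng(0)
    Phi = rng.standard_normal((64, 128))
    Phi /= np.linalg.norm(Phi, axis=0)           # unit-norm atoms
    x = rng.standard_normal(64)
    a = lca(x, Phi)
    print("nonzeros:", np.count_nonzero(a),
          "reconstruction error:", np.linalg.norm(x - Phi @ a))

On Loihi 2, each coefficient a_i would map onto a neuron and the inhibition term G @ a onto communication between neurons (which the paper's implementation can realize via custom neuron models and graded spikes); the sketch above is only the mathematical reference point.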
Related papers
- Benchmarking Edge AI Platforms for High-Performance ML Inference [0.0]
Edge computing's growing prominence, due to its ability to reduce communication latency and enable real-time processing, is promoting the rise of high-performance, heterogeneous System-on-Chip solutions.
While current approaches often involve scaling down modern hardware, the performance characteristics of neural network workloads can vary significantly.
We compare the latency and throughput of various linear algebra and neural network inference tasks across CPU-only, CPU/GPU, and CPU/NPU integrated solutions.
arXiv Detail & Related papers (2024-09-23T08:27:27Z) - Fast Algorithms and Implementations for Computing the Minimum Distance of Quantum Codes [43.96687298077534]
The distance of a stabilizer quantum code determines the number of errors that can be detected and corrected.
We present three new fast algorithms and implementations for computing the symplectic distance of the associated classical code.
arXiv Detail & Related papers (2024-08-20T11:24:30Z) - Solving QUBO on the Loihi 2 Neuromorphic Processor [36.40764406612833]
We describe an algorithm for solving Quadratic Unconstrained Binary Optimization problems on the Intel Loihi 2 neuromorphic processor.
Preliminary results show that our approach can generate feasible solutions in as little as 1 ms and is up to 37x more energy-efficient than two baseline solvers running on a CPU; a minimal QUBO sketch appears after this list.
arXiv Detail & Related papers (2024-08-06T10:07:43Z) - PIM-Opt: Demystifying Distributed Optimization Algorithms on a Real-World Processing-In-Memory System [21.09681871279162]
Modern Machine Learning (ML) training on large-scale datasets is a time-consuming workload.
It relies on the Stochastic Gradient Descent (SGD) optimization algorithm due to its effectiveness, simplicity, and generalization performance.
Processor-centric architectures suffer from low performance and high energy consumption while executing ML training workloads.
Processing-In-Memory (PIM) is a promising solution to alleviate the data movement bottleneck.
arXiv Detail & Related papers (2024-04-10T17:00:04Z) - Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels.
We decompose the kernel development into two steps: 1) expressing the computational core using Tensor Processing Primitives (TPPs) and 2) expressing the logical loops around TPPs in a high-level, declarative fashion.
We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z) - MAPLE-X: Latency Prediction with Explicit Microprocessor Prior Knowledge [87.41163540910854]
Deep neural network (DNN) latency characterization is a time-consuming process.
We propose MAPLE-X which extends MAPLE by incorporating explicit prior knowledge of hardware devices and DNN architecture latency.
arXiv Detail & Related papers (2022-05-25T11:08:20Z) - MAPLE: Microprocessor A Priori for Latency Estimation [81.91509153539566]
Modern deep neural networks must demonstrate state-of-the-art accuracy while exhibiting low latency and energy consumption.
Measuring the latency of every evaluated architecture adds a significant amount of time to the NAS process.
We propose Microprocessor A Priori for Latency Estimation (MAPLE), which does not rely on transfer learning or domain adaptation.
arXiv Detail & Related papers (2021-11-30T03:52:15Z) - Ps and Qs: Quantization-aware pruning for efficient low latency neural
network inference [56.24109486973292]
We study the interplay between pruning and quantization during the training of neural networks for ultra low latency applications.
We find that quantization-aware pruning yields more computationally efficient models than either pruning or quantization alone for our task.
arXiv Detail & Related papers (2021-02-22T19:00:05Z) - MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach in order to obtain a reduced search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z) - Optimizing Deep Learning Recommender Systems' Training On CPU Cluster
Architectures [56.69373580921888]
We focus on Recommender Systems, which account for most of the AI cycles in cloud computing centers.
By enabling training to run on the latest CPU hardware and software tailored for HPC, we achieve more than two orders of magnitude improvement in performance.
arXiv Detail & Related papers (2020-05-10T14:40:16Z) - Heterogeneous CPU+GPU Stochastic Gradient Descent Algorithms [1.3249453757295084]
We study training algorithms for deep learning on heterogeneous CPU+GPU architectures.
Our two-fold objective -- maximize convergence rate and resource utilization simultaneously -- makes the problem challenging.
We show that the implementation of these algorithms achieves both faster convergence and higher resource utilization on several real datasets.
arXiv Detail & Related papers (2020-04-19T05:21:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.