Enhancing Biologically Inspired Hierarchical Temporal Memory with Hardware-Accelerated Reflex Memory
- URL: http://arxiv.org/abs/2504.03746v1
- Date: Tue, 01 Apr 2025 17:40:12 GMT
- Title: Enhancing Biologically Inspired Hierarchical Temporal Memory with Hardware-Accelerated Reflex Memory
- Authors: Pavia Bera, Sabrina Hassan Moon, Jennifer Adorno, Dayane Alfenas Reis, Sanjukta Bhanja
- Abstract summary: This paper introduces a Reflex Memory (RM) block, inspired by the spinal cord's working mechanisms, to accelerate the processing of first-order inferences. The integration of RM with HTM forms a system called the Accelerated Hierarchical Temporal Memory (AHTM), which processes repetitive information more efficiently. Compared to the original HTM algorithm, AHTM accelerates inference by up to 7.55x, while H-AHTM further enhances performance with a 10.10x speedup.
- Score: 0.29127054707887967
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid expansion of the Internet of Things (IoT) generates zettabytes of data that demand efficient unsupervised learning systems. Hierarchical Temporal Memory (HTM), a third-generation unsupervised AI algorithm, models the neocortex of the human brain by simulating columns of neurons to process and predict sequences. These neuron columns can memorize and infer sequences across multiple orders. While multiorder inferences offer robust predictive capabilities, they often come with significant computational overhead. The Sequence Memory (SM) component of HTM, which manages these inferences, encounters bottlenecks primarily due to its extensive programmable interconnects. In many cases, first-order temporal relationships have proven sufficient without significant loss in efficiency. This paper introduces a Reflex Memory (RM) block, inspired by the spinal cord's working mechanisms, designed to accelerate the processing of first-order inferences. The RM block performs these inferences significantly faster than the SM. The integration of RM with HTM forms a system called the Accelerated Hierarchical Temporal Memory (AHTM), which processes repetitive information more efficiently than the original HTM while still supporting multiorder inferences. The experimental results demonstrate that the HTM predicts an event in 0.945 s, whereas the AHTM module does so in 0.125 s. Additionally, the hardware implementation of RM in a content-addressable memory (CAM) block, known as Hardware-Accelerated Hierarchical Temporal Memory (H-AHTM), predicts an event in just 0.094 s, significantly improving inference speed. Compared to the original algorithm [bautista2020matlabhtm], AHTM accelerates inference by up to 7.55x, while H-AHTM further enhances performance with a 10.10x speedup.
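The abstract describes a fast first-order lookup (realized in CAM for H-AHTM) placed in front of the slower multi-order Sequence Memory, with unseen transitions falling through to the latter. Below is a minimal Python sketch of that routing idea; all class and method names are illustrative assumptions, not the paper's implementation:

```python
class ReflexMemory:
    """First-order predictor: maps previous symbol -> next symbol.
    In H-AHTM this lookup is realized in content-addressable memory."""
    def __init__(self):
        self.table = {}

    def predict(self, prev):
        return self.table.get(prev)          # O(1) lookup, like a CAM hit

    def learn(self, prev, nxt):
        self.table[prev] = nxt


class SequenceMemory:
    """Stand-in for HTM's multi-order Sequence Memory (context-based)."""
    def __init__(self):
        self.table = {}

    def predict(self, context):
        return self.table.get(tuple(context))

    def learn(self, context, nxt):
        self.table[tuple(context)] = nxt


class AHTM:
    """Route predictions through the fast reflex path first,
    falling back to the multi-order path on a miss."""
    def __init__(self, order=3):
        self.rm, self.sm, self.order = ReflexMemory(), SequenceMemory(), order

    def predict(self, history):
        hit = self.rm.predict(history[-1])   # fast first-order path
        if hit is not None:
            return hit
        return self.sm.predict(history[-self.order:])  # slower fallback

    def learn(self, history, nxt):
        self.rm.learn(history[-1], nxt)
        self.sm.learn(history[-self.order:], nxt)
```

On a repetitive stream, most queries resolve in the O(1) reflex path, which is the intuition behind the reported speedups over running the Sequence Memory for every prediction.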
Related papers
- Sparse-dLLM: Accelerating Diffusion LLMs with Dynamic Cache Eviction [58.044803442346115]
Diffusion Large Language Models (dLLMs) enable breakthroughs in reasoning and parallel decoding but suffer from prohibitive computational complexity and memory overhead during inference. We propose Sparse-dLLM, the first training-free framework integrating dynamic cache eviction with sparse attention via delayed bidirectional sparse caching.
arXiv Detail & Related papers (2025-08-04T16:14:03Z)
- Systolic Array-based Accelerator for Structured State-Space Models [1.137896937254823]
State-Space Models (SSMs) process very long data sequences more efficiently than recurrent and Transformer-based models. In this paper, we introduce EpochCore, a specialized hardware accelerator for SSMs. EpochCore achieves on average a 2000x performance improvement on LRA datasets compared to a GPU.
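For context, the recurrence such an accelerator targets is the standard linear state-space scan. The NumPy reference below shows the semantics only and makes no claim about EpochCore's dataflow:

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Reference semantics of a linear state-space model:
        x[k] = A @ x[k-1] + B @ u[k]
        y[k] = C @ x[k]
    A systolic array pipelines these matrix products across processing
    elements; this sequential loop is only the mathematical reference,
    not the accelerator's schedule."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_k in u:                     # scan over the input sequence
        x = A @ x + B @ u_k           # state update
        ys.append(C @ x)              # readout
    return np.stack(ys)
```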
arXiv Detail & Related papers (2025-07-29T00:01:57Z)
- mGRADE: Minimal Recurrent Gating Meets Delay Convolutions for Lightweight Sequence Modeling [0.5236468296934584]
mGRADE is a hybrid-memory system that integrates a temporal 1D-convolution with learnable spacings followed by a minimal gated recurrent unit. We demonstrate that mGRADE effectively separates and preserves multi-scale temporal features. This highlights mGRADE's promise as an efficient solution for memory-constrained multi-scale temporal processing at the edge.
arXiv Detail & Related papers (2025-07-02T15:44:35Z)
- Lattice: Learning to Efficiently Compress the Memory [13.765057453744427]
This paper introduces Lattice, a novel recurrent neural network (RNN) mechanism that efficiently compresses the cache into a fixed number of memory slots.
We formulate this compression as an online optimization problem and derive a dynamic memory update rule based on a single gradient descent step.
The experimental results show that Lattice achieves the best perplexity compared to all baselines across diverse context lengths.
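The "single gradient descent step" in this summary can be illustrated with a generic rule: treat a fixed-size matrix memory as the parameters of a key-value reconstruction loss and take one descent step per incoming pair. This is a standard fast-weight/delta-rule sketch under that reading, not Lattice's exact update:

```python
import numpy as np

def memory_update(M, k, v, lr=0.1):
    """One gradient step on the reconstruction loss
        L(M) = 0.5 * ||M @ k - v||^2,
    i.e. M <- M - lr * (M @ k - v) @ k^T.
    Generic illustration of 'compression via a single gradient step';
    not claimed to be Lattice's exact rule."""
    err = M @ k - v                    # prediction error for this pair
    return M - lr * np.outer(err, k)   # rank-1 correction to the memory

# Usage: compress a stream of (k, v) pairs into a fixed-size memory M.
d_k, d_v = 8, 8
M = np.zeros((d_v, d_k))
rng = np.random.default_rng(0)
for _ in range(100):
    k = rng.normal(size=d_k)
    v = rng.normal(size=d_v)
    M = memory_update(M, k, v)
```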
arXiv Detail & Related papers (2025-04-08T03:48:43Z)
- Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges.
We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs.
This work advances the AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z)
- Resistive Memory-based Neural Differential Equation Solver for Score-based Diffusion Model [55.116403765330084]
Current AIGC methods, such as score-based diffusion, remain deficient in speed and efficiency.
We propose a time-continuous and analog in-memory neural differential equation solver for score-based diffusion.
We experimentally validate our solution with 180 nm resistive memory in-memory computing macros.
arXiv Detail & Related papers (2024-04-08T16:34:35Z)
- Topology-aware Embedding Memory for Continual Learning on Expanding Networks [63.35819388164267]
We present a framework to tackle the memory explosion problem using memory replay techniques.
PDGNNs with Topology-aware Embedding Memory (TEM) significantly outperform state-of-the-art techniques.
arXiv Detail & Related papers (2024-01-24T03:03:17Z)
- Neuromorphic Computing with AER using Time-to-Event-Margin Propagation [7.730429080477441]
We show how causal temporal primitives like delay, triggering, and sorting inherent in the AER protocol can be exploited for scalable neuromorphic computing.
The proposed TEMP-based AER architecture is fully asynchronous and relies on interconnect delays for memory and computing.
As a proof-of-concept, we show that a trained TEMP-based convolutional neural network (CNN) can demonstrate an accuracy greater than 99% on the MNIST dataset.
arXiv Detail & Related papers (2023-04-27T02:01:54Z)
- MF-NeRF: Memory Efficient NeRF with Mixed-Feature Hash Table [62.164549651134465]
We propose MF-NeRF, a memory-efficient NeRF framework that employs a Mixed-Feature hash table to improve memory efficiency and reduce training time while maintaining reconstruction quality.
Our experiments with state-of-the-art Instant-NGP, TensoRF, and DVGO indicate that MF-NeRF achieves the fastest training time on the same GPU hardware with similar or even higher reconstruction quality.
arXiv Detail & Related papers (2023-04-25T05:44:50Z)
- GLEAM: Greedy Learning for Large-Scale Accelerated MRI Reconstruction [50.248694764703714]
Unrolled neural networks have recently achieved state-of-the-art accelerated MRI reconstruction.
These networks unroll iterative optimization algorithms by alternating between physics-based consistency and neural-network based regularization.
We propose Greedy LEarning for Accelerated MRI reconstruction (GLEAM), an efficient training strategy for high-dimensional imaging settings.
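The alternation described here is the standard unrolled proximal-gradient pattern: a physics-based data-consistency gradient step followed by a learned regularization step. The sketch below is generic (the denoiser stands in for a trained network block) and is not GLEAM's exact architecture:

```python
import numpy as np

def unrolled_recon(A, y, denoiser, steps=8, eta=0.2):
    """Generic unrolled reconstruction: alternate a data-consistency
    gradient step on the forward model,
        x <- x - eta * A^H (A x - y),
    with a learned regularization step x <- denoiser(x)."""
    x = A.conj().T @ y                             # zero-filled init
    for _ in range(steps):
        x = x - eta * A.conj().T @ (A @ x - y)     # physics consistency
        x = denoiser(x)                            # learned regularizer
    return x

# Toy usage with soft-thresholding standing in for the trained block.
rng = np.random.default_rng(1)
A = rng.normal(size=(32, 64)) / np.sqrt(32)        # toy forward operator
x_true = np.zeros(64); x_true[:5] = 1.0            # sparse ground truth
y = A @ x_true
soft = lambda x, t=0.05: np.sign(x) * np.maximum(np.abs(x) - t, 0)
x_hat = unrolled_recon(A, y, soft)
```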
arXiv Detail & Related papers (2022-07-18T06:01:29Z)
- Memory-Guided Semantic Learning Network for Temporal Sentence Grounding [55.31041933103645]
We propose a memory-augmented network that learns and memorizes rarely appearing content in temporal sentence grounding (TSG) tasks.
MGSL-Net consists of three main parts: a cross-modal interaction module, a memory augmentation module, and a heterogeneous attention module.
arXiv Detail & Related papers (2022-01-03T02:32:06Z)
- Sequential memory improves sample and memory efficiency in Episodic Control [0.0]
State-of-the-art deep reinforcement learning algorithms are sample-inefficient due to the large number of episodes they require to achieve good performance.
Episodic reinforcement learning (ERL) algorithms, inspired by the mammalian hippocampus, typically use extended memory systems to bootstrap learning from past events and overcome this sample-inefficiency problem.
Here, we demonstrate that including a bias in the acquired memory content derived from the order of episodic sampling improves both the sample and memory efficiency of an episodic control algorithm.
arXiv Detail & Related papers (2021-12-29T18:42:15Z)
- Boosting Mobile CNN Inference through Semantic Memory [12.45440733435801]
We develop SMTM, a semantic memory design that improves on-device CNN inference.
SMTM employs a hierarchical memory architecture to leverage the long-tail distribution of objects of interest.
It significantly speeds up model inference over the standard approach (up to 2X) and prior cache designs (up to 1.5X), with acceptable accuracy loss.
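The caching pattern in this entry can be illustrated as an early-exit lookup on intermediate features: on a near-duplicate input, a cache hit returns the stored prediction and skips the remaining layers. A minimal sketch of that pattern, with hypothetical names and not SMTM's actual design:

```python
import numpy as np

class SemanticCache:
    """Early-exit cache keyed on intermediate CNN features. A hit on a
    near-duplicate input (frequent 'head' objects in a long-tail
    distribution) skips the remaining layers; misses run the full
    model and populate the cache."""
    def __init__(self, threshold=0.9):
        self.keys, self.preds, self.threshold = [], [], threshold

    def lookup(self, feat):
        f = feat / (np.linalg.norm(feat) + 1e-8)
        for k, p in zip(self.keys, self.preds):
            if float(f @ k) > self.threshold:   # cosine-similarity hit
                return p
        return None

    def insert(self, feat, pred):
        self.keys.append(feat / (np.linalg.norm(feat) + 1e-8))
        self.preds.append(pred)

def infer(x, early_layers, late_layers, cache):
    feat = early_layers(x)          # cheap prefix of the network
    hit = cache.lookup(feat)
    if hit is not None:
        return hit                  # reuse cached result, skip suffix
    pred = late_layers(feat)        # expensive suffix
    cache.insert(feat, pred)
    return pred
```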
arXiv Detail & Related papers (2021-12-05T18:18:31Z)