Related papers: Logarithmic Memory Networks (LMNs): Efficient Long-Range Sequence Modeling for Resource-Constrained Environments

Logarithmic Memory Networks (LMNs): Efficient Long-Range Sequence Modeling for Resource-Constrained Environments

URL: http://arxiv.org/abs/2501.07905v1
Date: Tue, 14 Jan 2025 07:50:09 GMT
Title: Logarithmic Memory Networks (LMNs): Efficient Long-Range Sequence Modeling for Resource-Constrained Environments
Authors: Mohamed A. Taha,
Abstract summary: This paper introduces Logarithmic Memory Networks (LMNs), a novel architecture that leverages a hierarchical logarithmic tree structure to efficiently store and retrieve past information.<n>LMNs dynamically summarize historical context, significantly reducing the memory footprint and computational complexity of attention mechanisms.<n>These features make LMNs a robust and scalable solution for processing long-range sequences in resource-constrained environments.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Long-range sequence modeling is a crucial aspect of natural language processing and time series analysis. However, traditional models like Recurrent Neural Networks (RNNs) and Transformers suffer from computational and memory inefficiencies, especially when dealing with long sequences. This paper introduces Logarithmic Memory Networks (LMNs), a novel architecture that leverages a hierarchical logarithmic tree structure to efficiently store and retrieve past information. LMNs dynamically summarize historical context, significantly reducing the memory footprint and computational complexity of attention mechanisms from O(n2) to O(log(n)). The model employs a single-vector, targeted attention mechanism to access stored information, and the memory block construction worker (summarizer) layer operates in two modes: a parallel execution mode during training for efficient processing of hierarchical tree structures and a sequential execution mode during inference, which acts as a memory management system. It also implicitly encodes positional information, eliminating the need for explicit positional encodings. These features make LMNs a robust and scalable solution for processing long-range sequences in resource-constrained environments, offering practical improvements in efficiency and scalability. The code is publicly available under the MIT License on GitHub: https://github.com/AhmedBoin/LogarithmicMemory.

Related papers

MoM: Linear Sequence Modeling with Mixture-of-Memories [9.665802842933209]
We introduce a novel architecture called Mixture-of-Memories (MoM) MoM utilizes multiple independent memory states, with a router network directing input tokens to specific memory states. MoM performs exceptionally well on recall-intensive tasks, surpassing existing linear sequence modeling techniques.
arXiv Detail & Related papers (2025-02-19T12:53:55Z)
Optimal Gradient Checkpointing for Sparse and Recurrent Architectures using Off-Chip Memory [0.8321953606016751]
We introduce memory-efficient gradient checkpointing strategies tailored for the general class of sparse RNNs and Spiking Neural Networks.<n>We find that Double Checkpointing emerges as the most effective method, optimizing the use of local memory resources while minimizing recomputation overhead.
arXiv Detail & Related papers (2024-12-16T14:23:31Z)
B'MOJO: Hybrid State Space Realizations of Foundation Models with Eidetic and Fading Memory [91.81390121042192]
We develop a class of models called B'MOJO to seamlessly combine eidetic and fading memory within an composable module. B'MOJO's ability to modulate eidetic and fading memory results in better inference on longer sequences tested up to 32K tokens.
arXiv Detail & Related papers (2024-07-08T18:41:01Z)
Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs [61.40047491337793]
We present Hierarchical cOntext MERging (HOMER), a new training-free scheme designed to overcome the limitations of large language models. HomeR uses a divide-and-conquer algorithm, dividing long inputs into manageable chunks. A token reduction technique precedes each merging, ensuring memory usage efficiency.
arXiv Detail & Related papers (2024-04-16T06:34:08Z)
Heterogenous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks. By introducing learnable memory tokens with attention mechanism, we can effectively boost performance without huge computational overhead. We show our approach on various image and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
arXiv Detail & Related papers (2023-10-17T01:05:28Z)
Universal Recurrent Event Memories for Streaming Data [0.0]
We propose a new event memory architecture (MemNet) for recurrent neural networks. MemNet stores key-value pairs, which separate the information for addressing and for content. MemNet architecture can be applied without modifications to scalar time series, logic operators on strings, and also to natural language processing.
arXiv Detail & Related papers (2023-07-28T17:40:58Z)
Blockwise Parallel Transformer for Large Context Models [70.97386897478238]
Blockwise Parallel Transformer (BPT) is a blockwise computation of self-attention and feedforward network fusion to minimize memory costs. By processing longer input sequences while maintaining memory efficiency, BPT enables training sequences 32 times longer than vanilla Transformers and up to 4 times longer than previous memory-efficient methods.
arXiv Detail & Related papers (2023-05-30T19:25:51Z)
Memory Efficient Neural Processes via Constant Memory Attention Block [55.82269384896986]
Constant Memory Attentive Neural Processes (CMANPs) are an NP variant that only requires constant memory. We show CMANPs achieve state-of-the-art results on popular NP benchmarks while being significantly more memory efficient than prior methods.
arXiv Detail & Related papers (2023-05-23T23:10:19Z)
Sequence Modeling with Multiresolution Convolutional Memory [27.218134279968062]
We build a new building block for sequence modeling called a MultiresLayer. The key component of our model is the multiresolution convolution, capturing multiscale trends in the input sequence. Our model yields state-of-the-art performance on a number of sequence classification and autoregressive density estimation tasks.
arXiv Detail & Related papers (2023-05-02T17:50:54Z)
NumS: Scalable Array Programming for the Cloud [82.827921577004]
We present NumS, an array programming library which optimize NumPy-like expressions on task-based distributed systems. This is achieved through a novel scheduler called Load Simulated Hierarchical Scheduling (LSHS) We show that LSHS enhances performance on Ray by decreasing network load by a factor of 2x, requiring 4x less memory, and reducing execution time by 10x on the logistic regression problem.
arXiv Detail & Related papers (2022-06-28T20:13:40Z)
Memformer: A Memory-Augmented Transformer for Sequence Modeling [55.780849185884996]
We present Memformer, an efficient neural network for sequence modeling. Our model achieves linear time complexity and constant memory space complexity when processing long sequences.
arXiv Detail & Related papers (2020-10-14T09:03:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.