Differentiable Random Access Memory using Lattices
- URL: http://arxiv.org/abs/2107.03474v1
- Date: Wed, 7 Jul 2021 20:55:42 GMT
- Title: Differentiable Random Access Memory using Lattices
- Authors: Adam P. Goucher, Rajan Troll
- Abstract summary: We introduce a differentiable random access memory module with $O(1)$ performance regardless of size.
The design stores entries on points of a chosen lattice to calculate nearest neighbours of arbitrary points efficiently by exploiting symmetries.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a differentiable random access memory module with $O(1)$
performance regardless of size, scaling to billions of entries. The design
stores entries on points of a chosen lattice to calculate nearest neighbours of
arbitrary points efficiently by exploiting symmetries. Augmenting a standard
neural network architecture with a single memory layer based on this, we can
scale the parameter count up to memory limits with negligible computational
overhead, giving better accuracy at similar cost. On large language modelling
tasks, these enhanced models with larger capacity significantly outperform the
unmodified transformer baseline. We found continued scaling with memory size up
to the limits tested.
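As a rough illustration of the idea, the following NumPy sketch reads from a memory whose slots are indexed by lattice points: it quantizes a query to the cubic lattice Z^n, hashes a handful of nearby lattice points into a flat table, and blends the stored values with softmax weights so the read is differentiable with respect to both the query and the table. The choice of lattice, the neighbour set, the hashing scheme, and all sizes are illustrative assumptions; the paper's construction uses a chosen lattice whose symmetries give exact nearest-neighbour search in O(1), which this sketch does not reproduce.

```python
# Illustrative sketch only: a differentiable lattice-memory lookup using the
# cubic lattice Z^n and softmax interpolation over a few nearby lattice points.
# The lattice, neighbour set, hashing scheme, and sizes are assumptions, not
# the authors' construction.
import numpy as np

rng = np.random.default_rng(0)

DIM = 8               # dimensionality of the address space (assumed)
VALUE_DIM = 16        # dimensionality of each stored value (assumed)
TABLE_SIZE = 1 << 20  # number of memory slots; lattice points are hashed into it

values = rng.normal(scale=0.02, size=(TABLE_SIZE, VALUE_DIM))  # trainable memory table

def hash_lattice_point(p):
    """Hash an integer lattice point into a table slot (assumed scheme)."""
    primes = np.array([73856093, 19349663, 83492791, 49979687,
                       86028121, 32452843, 67867967, 57885161], dtype=np.int64)
    return int(np.abs(np.dot(p.astype(np.int64), primes[:p.size])) % TABLE_SIZE)

def lattice_lookup(query, temperature=1.0):
    """Soft read: nearest Z^n point plus its 2n axis-adjacent neighbours."""
    base = np.rint(query).astype(np.int64)   # nearest point of the cubic lattice
    neighbours = [base]
    for axis in range(query.size):           # 2n axis-adjacent lattice points
        for step in (-1, 1):
            p = base.copy()
            p[axis] += step
            neighbours.append(p)
    dists = np.array([np.sum((query - p) ** 2) for p in neighbours])
    weights = np.exp(-dists / temperature)
    weights /= weights.sum()                 # softmax over candidate points
    slots = [hash_lattice_point(p) for p in neighbours]
    return weights @ values[slots]           # differentiable blend of stored values

query = rng.normal(size=DIM)
print(lattice_lookup(query).shape)           # (VALUE_DIM,)
```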
Related papers
- Memory Layers at Scale [67.00854080570979]
This work takes memory layers beyond proof-of-concept, proving their utility at contemporary scale.
On downstream tasks, language models augmented with our improved memory layer outperform dense models with more than twice the budget, as well as mixture-of-expert models when matched for both compute and parameters.
We provide a fully parallelizable memory layer implementation, demonstrating scaling laws with up to 128B memory parameters pretrained on 1 trillion tokens, compared against base models with up to 8B parameters.
arXiv Detail & Related papers (2024-12-12T23:56:57Z)
- Ultra-Sparse Memory Network [8.927205198458994]
This work introduces UltraMem, which incorporates a large-scale, ultra-sparse memory layer to address these limitations.
We show that our method achieves state-of-the-art inference speed and model performance within a given computational budget.
arXiv Detail & Related papers (2024-11-19T09:24:34Z)
- Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss [59.835032408496545]
We propose a tile-based strategy that partitions the contrastive loss calculation into arbitrarily small blocks (a minimal blockwise sketch follows this entry).
We also introduce a multi-level tiling strategy to leverage the hierarchical structure of distributed systems.
Compared to SOTA memory-efficient solutions, it achieves a two-order-of-magnitude reduction in memory while maintaining comparable speed.
arXiv Detail & Related papers (2024-10-22T17:59:30Z)
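A minimal NumPy sketch of the blockwise idea referenced above: the InfoNCE-style contrastive loss needs a row-wise log-sum-exp over an N x N similarity matrix, and streaming that reduction over column tiles avoids ever materializing the full matrix. The loss form, tile size, and single-device setting are assumptions; the paper additionally describes multi-level tiling across distributed workers, which is not reproduced here.

```python
# Illustrative sketch: blockwise (tiled) InfoNCE-style contrastive loss.
# A streaming log-sum-exp over column tiles avoids materializing the full
# N x N similarity matrix. Block size and loss form are assumptions.
import numpy as np

def tiled_contrastive_loss(a, b, temperature=0.07, block=256):
    """a, b: L2-normalized embeddings of shape (N, D); positives are on the diagonal."""
    n = a.shape[0]
    pos = np.sum(a * b, axis=1) / temperature              # diagonal similarities s_ii
    running_max = np.full(n, -np.inf)
    running_sum = np.zeros(n)
    for start in range(0, n, block):                        # stream over column tiles
        sims = a @ b[start:start + block].T / temperature   # (N, block) tile
        tile_max = sims.max(axis=1)
        new_max = np.maximum(running_max, tile_max)
        running_sum = (running_sum * np.exp(running_max - new_max)
                       + np.exp(sims - new_max[:, None]).sum(axis=1))
        running_max = new_max
    logsumexp = running_max + np.log(running_sum)           # row-wise log-sum-exp
    return np.mean(logsumexp - pos)                         # -log softmax of the positive

rng = np.random.default_rng(0)
x = rng.normal(size=(1024, 64)); x /= np.linalg.norm(x, axis=1, keepdims=True)
y = x + 0.1 * rng.normal(size=x.shape); y /= np.linalg.norm(y, axis=1, keepdims=True)
print(tiled_contrastive_loss(x, y))
```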
- RecurrentGemma: Moving Past Transformers for Efficient Open Language Models [103.59785165735727]
We introduce RecurrentGemma, a family of open language models using Google's novel Griffin architecture.
Griffin combines linear recurrences with local attention to achieve excellent performance on language.
We provide two sizes of models, containing 2B and 9B parameters, and provide pre-trained and instruction tuned variants for both.
arXiv Detail & Related papers (2024-04-11T15:27:22Z)
- SPARTAN: Sparse Hierarchical Memory for Parameter-Efficient Transformers [29.721162097790646]
SPARTAN is a parameter-efficient (PE) and computationally fast architecture for edge devices.
It adds hierarchically organized sparse memory after each Transformer layer.
It can be trained 34% faster in a few-shot setting, while performing within 0.9 points of adapters.
arXiv Detail & Related papers (2022-11-29T23:59:20Z)
- A Model or 603 Exemplars: Towards Memory-Efficient Class-Incremental Learning [56.450090618578]
Class-Incremental Learning (CIL) aims to train a model within a limited memory budget.
We show that when the model size is counted into the total budget and methods are compared at aligned memory sizes, saving models does not consistently help.
We propose a simple yet effective baseline, denoted as MEMO for Memory-efficient Expandable MOdel.
arXiv Detail & Related papers (2022-05-26T08:24:01Z)
- Experimentally realized memristive memory augmented neural network [0.0]
Lifelong on-device learning is a key challenge for machine intelligence.
Memory-augmented neural networks have been proposed to achieve this goal, but the memory module has to be stored in off-chip memory.
We implement the entire memory augmented neural network architecture in a fully integrated memristive crossbar platform.
arXiv Detail & Related papers (2022-04-15T11:52:30Z)
- LaMemo: Language Modeling with Look-Ahead Memory [50.6248714811912]
We propose Look-Ahead Memory (LaMemo) that enhances the recurrence memory by incrementally attending to the right-side tokens.
LaMemo embraces bi-directional attention and segment recurrence with an additional overhead only linearly proportional to the memory length.
Experiments on widely used language modeling benchmarks demonstrate its superiority over the baselines equipped with different types of memory.
arXiv Detail & Related papers (2022-04-15T06:11:25Z)
- Kanerva++: extending The Kanerva Machine with differentiable, locally block allocated latent memory [75.65949969000596]
Episodic and semantic memory are critical components of the human memory model.
We develop a new principled Bayesian memory allocation scheme that bridges the gap between episodic and semantic memory.
We demonstrate that this allocation scheme improves performance in memory conditional image generation.
arXiv Detail & Related papers (2021-02-20T18:40:40Z)
- CNN with large memory layers [2.368995563245609]
This work is centred around the recently proposed product key memory structure, implemented for a number of computer vision applications.
The memory structure can be regarded as a simple computation primitive suitable for augmenting nearly any neural network architecture; a compact sketch of the product-key lookup follows this entry.
arXiv Detail & Related papers (2021-01-27T20:58:20Z)
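For concreteness, here is a compact NumPy sketch of a product-key lookup in the spirit of that memory structure (introduced in "Large Memory Layers with Product Keys" by Lample et al.): the query is split into two halves, each half is scored against a small set of sub-keys, and the Cartesian product of the per-half top-k gives candidate slots into a value table of size |K1| x |K2|. All dimensions, the top-k setting, and the random parameters are assumptions for illustration, not the configuration used in the paper.

```python
# Illustrative sketch of a product-key memory lookup: sizes and the random
# initialisation are assumptions for the example.
import numpy as np

rng = np.random.default_rng(0)

HALF_DIM = 32       # dimension of each query half (assumed)
N_SUB_KEYS = 512    # sub-keys per half -> 512 * 512 = 262,144 memory slots
VALUE_DIM = 64
TOP_K = 4

sub_keys_1 = rng.normal(size=(N_SUB_KEYS, HALF_DIM))
sub_keys_2 = rng.normal(size=(N_SUB_KEYS, HALF_DIM))
values = rng.normal(scale=0.02, size=(N_SUB_KEYS * N_SUB_KEYS, VALUE_DIM))

def product_key_lookup(query):
    q1, q2 = query[:HALF_DIM], query[HALF_DIM:]
    s1 = sub_keys_1 @ q1                       # scores against the first sub-key set
    s2 = sub_keys_2 @ q2                       # scores against the second sub-key set
    top1 = np.argsort(s1)[-TOP_K:]             # top-k indices per half
    top2 = np.argsort(s2)[-TOP_K:]
    # Candidate slots are the Cartesian product of the two per-half top-k sets.
    cand_scores = s1[top1][:, None] + s2[top2][None, :]        # (k, k)
    flat = np.argsort(cand_scores, axis=None)[-TOP_K:]         # best k of the k*k candidates
    i, j = np.unravel_index(flat, cand_scores.shape)
    slots = top1[i] * N_SUB_KEYS + top2[j]
    weights = np.exp(cand_scores[i, j] - cand_scores[i, j].max())
    weights /= weights.sum()                   # softmax over the selected slots
    return weights @ values[slots]

query = rng.normal(size=2 * HALF_DIM)
print(product_key_lookup(query).shape)         # (VALUE_DIM,)
```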
- Memformer: A Memory-Augmented Transformer for Sequence Modeling [55.780849185884996]
We present Memformer, an efficient neural network for sequence modeling.
Our model achieves linear time complexity and constant memory space complexity when processing long sequences.
arXiv Detail & Related papers (2020-10-14T09:03:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.