Generalized Key-Value Memory to Flexibly Adjust Redundancy in
Memory-Augmented Networks
- URL: http://arxiv.org/abs/2203.06223v1
- Date: Fri, 11 Mar 2022 19:59:43 GMT
- Title: Generalized Key-Value Memory to Flexibly Adjust Redundancy in
Memory-Augmented Networks
- Authors: Denis Kleyko, Geethan Karunaratne, Jan M. Rabaey, Abu Sebastian, and
Abbas Rahimi
- Abstract summary: Memory-augmented neural networks enhance a neural network with an external key-value memory.
We propose a generalized key-value memory that decouples its dimension from the number of support vectors.
We show that adapting this parameter on demand effectively mitigates up to 44% of nonidealities, at equal accuracy and number of devices.
- Score: 6.03025980398201
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Memory-augmented neural networks enhance a neural network with an external
key-value memory whose complexity is typically dominated by the number of
support vectors in the key memory. We propose a generalized key-value memory
that decouples its dimension from the number of support vectors by introducing
a free parameter that can arbitrarily add or remove redundancy to the key
memory representation. In effect, it provides an additional degree of freedom
to flexibly control the trade-off between robustness and the resources required
to store and compute the generalized key-value memory. This is particularly
useful for realizing the key memory on in-memory computing hardware where it
exploits nonideal, but extremely efficient non-volatile memory devices for
dense storage and computation. Experimental results show that adapting this
parameter on demand effectively mitigates up to 44% of nonidealities, at equal
accuracy and number of devices, without any need for neural network retraining.
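The idea lends itself to a short sketch. Below is a minimal NumPy illustration of the general principle, not the authors' exact construction: the original d-dimensional keys are passed through a fixed random projection to an adjustable dimension m, so the stored key memory can be made more redundant (m > d) or leaner (m < d) independently of the number of support vectors; the projection, the additive noise model, and the nearest-key read are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n = 64, 100                          # feature dimension, number of support vectors
keys = rng.standard_normal((n, d))      # original key memory, one row per support vector
values = rng.integers(0, 5, size=n)     # e.g. class labels held in the value memory

# Hypothetical redundancy parameter m: dimension of the stored key representation.
# m > d adds redundancy (more tolerance to device noise); m < d removes it.
m = 256
proj = rng.standard_normal((d, m)) / np.sqrt(m)   # fixed random projection
key_mem = keys @ proj                             # what would sit on the in-memory hardware

def read(x, noise_std=0.0):
    """Nearest-key read against a (possibly noisy) m-dimensional key memory."""
    noisy_mem = key_mem + noise_std * rng.standard_normal(key_mem.shape)
    sims = noisy_mem @ (x @ proj)       # compare the projected query with every stored key
    return values[int(np.argmax(sims))]

x = keys[7] + 0.1 * rng.standard_normal(d)
print(read(x, noise_std=0.5))           # larger m makes this read more tolerant of memory noise
```

In this toy setup the storage cost scales with n*m, so m is the knob behind the robustness-versus-resources trade-off described in the abstract.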
Related papers
- Heterogenous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks.
By introducing learnable memory tokens with an attention mechanism, we can effectively boost performance without large computational overhead.
We evaluate our approach on various image- and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
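As a rough sketch of what attention over a bank of memory tokens can look like (with NumPy standing in for a trained model), the snippet below performs a cross-attention read from the tokens; the token count, dimensions, and residual combination are illustrative assumptions rather than the paper's architecture.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
d, n_tokens = 32, 16
memory_tokens = rng.standard_normal((n_tokens, d))   # would be learnable parameters in practice

def memory_augment(x):
    """Cross-attention of one feature vector over the memory tokens, added back residually."""
    attn = softmax(memory_tokens @ x / np.sqrt(d))    # attention weights over the tokens
    return x + attn @ memory_tokens                   # read from memory and add to the feature

feat = rng.standard_normal(d)
print(memory_augment(feat).shape)                     # (32,)
```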
arXiv Detail & Related papers (2023-10-17T01:05:28Z) - Universal Recurrent Event Memories for Streaming Data [0.0]
We propose a new event memory architecture (MemNet) for recurrent neural networks.
MemNet stores key-value pairs, which separate the information for addressing and for content.
The MemNet architecture can be applied without modification to scalar time series, logic operators on strings, and natural language processing.
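A toy sketch of that separation (assuming nothing about MemNet's actual recurrent architecture): keys are used only for addressing and values only for content, so the two parts can have different shapes and be queried independently of how many events have been stored.

```python
import numpy as np

rng = np.random.default_rng(0)
d_key, d_val = 8, 4
keys, vals = [], []                      # event memory: addressing part and content part

def write(event_key, event_content):
    """Append one event; the key addresses it, the value carries its content."""
    keys.append(event_key)
    vals.append(event_content)

def read(query):
    """Soft addressing over all stored keys, returning blended content."""
    K, V = np.stack(keys), np.stack(vals)
    w = np.exp(K @ query)
    w /= w.sum()
    return w @ V

for _ in range(5):                       # stream five events into the memory
    write(rng.standard_normal(d_key), rng.standard_normal(d_val))
print(read(rng.standard_normal(d_key)).shape)   # (4,)
```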
arXiv Detail & Related papers (2023-07-28T17:40:58Z) - MF-NeRF: Memory Efficient NeRF with Mixed-Feature Hash Table [62.164549651134465]
We propose MF-NeRF, a memory-efficient NeRF framework that employs a Mixed-Feature hash table to improve memory efficiency and reduce training time while maintaining reconstruction quality.
Our experiments with the state-of-the-art Instant-NGP, TensoRF, and DVGO indicate that MF-NeRF can achieve the fastest training time on the same GPU hardware with similar or even higher reconstruction quality.
arXiv Detail & Related papers (2023-04-25T05:44:50Z) - Mesa: A Memory-saving Training Framework for Transformers [58.78933015299703]
We present Mesa, a memory-saving training framework for Transformers.
Mesa uses exact activations during the forward pass while storing a low-precision version of the activations to reduce memory consumption during training.
Experiments on ImageNet, CIFAR-100 and ADE20K demonstrate that Mesa can halve the memory footprint during training.
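The underlying trick can be illustrated in a few lines of NumPy, assuming a plain ReLU layer rather than Mesa's actual Transformer kernels: the exact float32 activation is used in the forward pass, but only a float16 copy is kept around for the backward pass.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1024, 1024)).astype(np.float32)

# Forward: compute with the exact activation, but retain only a low-precision copy.
y = np.maximum(x, 0.0)                   # exact ReLU output, used by downstream layers
saved = y.astype(np.float16)             # what actually stays in memory for backward

# Backward (sketch): the saved low-precision activation is enough to form the ReLU mask.
grad_out = rng.standard_normal(y.shape).astype(np.float32)
grad_in = grad_out * (saved.astype(np.float32) > 0)

print(y.nbytes, saved.nbytes)            # 4194304 vs 2097152 bytes: half the stored footprint
```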
arXiv Detail & Related papers (2021-11-22T11:23:01Z) - MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [72.80896338009579]
We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs.
We propose a generic patch-by-patch inference scheduling scheme that significantly cuts down peak memory.
We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2.
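A toy example of why patch-by-patch scheduling lowers peak activation memory (a standalone illustration, not MCUNetV2's searched schedule): a 3x3 blur stands in for the memory-heavy initial layers, and running it per 56x56 tile means only one tile's activations are alive at a time.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((224, 224)).astype(np.float32)

def stage(x):
    """Stand-in for the early, memory-heavy layers: a simple 3x3 mean filter."""
    out = np.empty_like(x)
    padded = np.pad(x, 1, mode="edge")
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = padded[i:i + 3, j:j + 3].mean()
    return out

# Whole-image inference keeps a full 224x224 activation resident at once.
full = stage(image)

# Patch-by-patch inference: 56x56 tiles, so the peak live activation is 16x smaller.
tiled = np.empty_like(image)
for i in range(0, 224, 56):
    for j in range(0, 224, 56):
        tiled[i:i + 56, j:j + 56] = stage(image[i:i + 56, j:j + 56])

print(np.abs(full - tiled).max())        # nonzero only near tile borders; real schedulers
                                         # overlap patches to keep the result exact
```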
arXiv Detail & Related papers (2021-10-28T17:58:45Z) - Neural Network Compression for Noisy Storage Devices [71.4102472611862]
Conventionally, model compression and physical storage are decoupled.
This approach forces the storage to treat each bit of the compressed model equally, and to dedicate the same amount of resources to each bit.
We propose a radically different approach that (i) employs analog memories to maximize the capacity of each memory cell, and (ii) jointly optimizes model compression and physical storage to maximize memory utility.
arXiv Detail & Related papers (2021-02-15T18:19:07Z) - CNN with large memory layers [2.368995563245609]
This work is centred around the recently proposed product key memory structure, implemented for a number of computer vision applications.
The memory structure can be regarded as a simple computation primitive suitable to be augmented to nearly all neural network architectures.
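The product-key mechanism it builds on is compact enough to sketch directly: the query is split in half, each half is scored against a small set of sub-keys, and the Cartesian product of the two per-half top-k lists addresses n_sub**2 memory slots with only 2*n_sub comparisons. The sizes and single-head setup below are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_sub = 16, 32                        # half-query dimension; 32*32 = 1024 memory slots
K1 = rng.standard_normal((n_sub, d))     # sub-keys for the first half of the query
K2 = rng.standard_normal((n_sub, d))     # sub-keys for the second half
values = rng.standard_normal((n_sub * n_sub, 64))   # one value vector per (i, j) key pair

def read(query, k=4):
    """Product-key lookup: approximate top-k over all slots via per-half top-k candidates."""
    q1, q2 = query[:d], query[d:]
    s1, s2 = K1 @ q1, K2 @ q2
    top1 = np.argsort(s1)[-k:]           # best candidates for each half
    top2 = np.argsort(s2)[-k:]
    pairs = [(i, j, s1[i] + s2[j]) for i in top1 for j in top2]
    scores = np.array([s for _, _, s in pairs])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    idx = [i * n_sub + j for i, j, _ in pairs]
    return w @ values[idx]               # softmax-weighted mix of the selected values

print(read(rng.standard_normal(2 * d)).shape)   # (64,)
```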
arXiv Detail & Related papers (2021-01-27T20:58:20Z) - Memformer: A Memory-Augmented Transformer for Sequence Modeling [55.780849185884996]
We present Memformer, an efficient neural network for sequence modeling.
Our model achieves linear time complexity and constant memory space complexity when processing long sequences.
arXiv Detail & Related papers (2020-10-14T09:03:36Z) - Robust High-dimensional Memory-augmented Neural Networks [13.82206983716435]
Memory-augmented neural networks enhance neural networks with an explicit external memory.
Access to this explicit memory occurs via soft read and write operations involving every individual memory entry.
We propose a robust architecture that employs a computational memory unit as the explicit memory performing analog in-memory computation on high-dimensional (HD) vectors.
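A small NumPy sketch of such a soft read over bipolar high-dimensional vectors, in the spirit of (but simpler than) the paper's computational memory unit; the sharpening factor beta and the one-hot value memory are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
D, n = 2048, 10                                   # HD vector dimension, stored entries
key_mem = rng.choice([-1.0, 1.0], size=(n, D))    # bipolar HD keys, friendly to analog devices
value_mem = np.eye(n)                             # one-hot values (e.g. class identities)

def soft_read(query, beta=8.0):
    """Soft read: every memory entry contributes, weighted by sharpened cosine similarity."""
    sims = key_mem @ query / D                    # cosine similarity for bipolar vectors
    w = np.exp(beta * sims)
    w /= w.sum()
    return w @ value_mem

q = key_mem[3] * np.sign(rng.standard_normal(D) + 2.0)   # entry 3 with ~2% of its bits flipped
print(np.argmax(soft_read(q)))                            # -> 3 despite the corruption
```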
arXiv Detail & Related papers (2020-10-05T12:01:56Z) - Improving Memory Utilization in Convolutional Neural Network
Accelerators [16.340620299847384]
We propose a mapping method that allows activation layers to overlap and thus utilize the memory more efficiently.
Experiments with various real-world object detector networks show that the proposed mapping technique can decrease the activations memory by up to 32.9%.
For higher resolution de-noising networks, we achieve activation memory savings of 48.8%.
arXiv Detail & Related papers (2020-07-20T09:34:36Z) - Efficient Memory Management for Deep Neural Net Inference [0.0]
Deep neural net inference can now be moved to mobile and embedded devices, desired for various reasons ranging from latency to privacy.
These devices are not only limited by their compute power and battery, but also by their inferior physical memory and cache, and thus, an efficient memory manager becomes a crucial component for deep neural net inference at the edge.
arXiv Detail & Related papers (2020-01-10T02:45:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.