Uniform Memory Retrieval with Larger Capacity for Modern Hopfield Models
- URL: http://arxiv.org/abs/2404.03827v3
- Date: Sun, 10 Nov 2024 19:25:40 GMT
- Title: Uniform Memory Retrieval with Larger Capacity for Modern Hopfield Models
- Authors: Dennis Wu, Jerry Yao-Chieh Hu, Teng-Yun Hsiao, Han Liu
- Abstract summary: We propose a two-stage memory retrieval dynamics for modern Hopfield models.
The key contribution is a learnable feature map $\Phi$ that transforms the Hopfield energy function into kernel space.
The feature map is learned from the stored memory patterns themselves, enhancing memory capacity across all modern Hopfield models.
- Score: 5.929540708452128
- Abstract: We propose a two-stage memory retrieval dynamics for modern Hopfield models, termed $\mathtt{U\text{-}Hop}$, with enhanced memory capacity. Our key contribution is a learnable feature map $\Phi$ which transforms the Hopfield energy function into kernel space. This transformation ensures convergence between the local minima of the energy and the fixed points of the retrieval dynamics within the kernel space. Consequently, the kernel norm induced by $\Phi$ serves as a novel similarity measure. It utilizes the stored memory patterns as learning data to enhance memory capacity across all modern Hopfield models. Specifically, we accomplish this by constructing a separation loss $\mathcal{L}_\Phi$ that separates the local minima of the kernelized energy by separating the stored memory patterns in kernel space. Methodologically, the $\mathtt{U\text{-}Hop}$ memory retrieval process consists of: (Stage I) minimizing the separation loss for a more uniform memory (local minimum) distribution, followed by (Stage II) standard Hopfield energy minimization for memory retrieval. This results in a significant reduction of possible metastable states in the Hopfield energy function, thus enhancing memory capacity by preventing memory confusion. Empirically, on real-world datasets, we demonstrate that $\mathtt{U\text{-}Hop}$ outperforms all existing modern Hopfield models and state-of-the-art similarity measures, achieving substantial improvements in both associative memory retrieval and deep learning tasks. Code is available at https://github.com/MAGICS-LAB/UHop; future updates are on arXiv:2404.03827.
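For intuition, a minimal sketch of the two-stage procedure is given below. It is not the authors' implementation (that is in the linked repository): the linear feature map, the pairwise-similarity surrogate for the separation loss $\mathcal{L}_\Phi$, and all hyperparameters are illustrative assumptions; Stage II uses the standard softmax-based modern Hopfield update, with similarities measured through the learned map.

```python
# Minimal two-stage sketch in the spirit of U-Hop (illustrative only; the
# authors' implementation is at https://github.com/MAGICS-LAB/UHop).
# Assumptions: a linear feature map Phi, a simple pairwise-similarity surrogate
# for the separation loss, and the usual softmax-based modern Hopfield update.
import torch

torch.manual_seed(0)
d, num_mem, beta = 32, 16, 8.0
memories = torch.randn(num_mem, d)                 # stored patterns, one per row

# Stage I: learn Phi by pushing the stored memories apart in kernel space.
phi = torch.nn.Linear(d, d, bias=False)
opt = torch.optim.Adam(phi.parameters(), lr=1e-2)
for _ in range(300):
    z = torch.nn.functional.normalize(phi(memories), dim=-1)
    sim = z @ z.T                                  # pairwise kernel-space similarities
    sep_loss = (sim - torch.eye(num_mem)).pow(2).mean()  # stand-in for L_Phi
    opt.zero_grad()
    sep_loss.backward()
    opt.step()

# Stage II: standard modern Hopfield energy minimization, with similarity
# between query and memories measured through the learned feature map Phi.
@torch.no_grad()
def retrieve(query, steps=3):
    q = query.clone()
    for _ in range(steps):
        scores = beta * phi(memories) @ phi(q)     # kernelized similarity scores
        q = memories.T @ torch.softmax(scores, dim=0)
    return q

noisy = memories[0] + 0.3 * torch.randn(d)
retrieved = retrieve(noisy)
print((memories - retrieved).pow(2).sum(dim=-1).argmin())  # expected: tensor(0)
```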
Related papers
- Hopfield-Fenchel-Young Networks: A Unified Framework for Associative Memory Retrieval [25.841394444834933]
Associative memory models, such as Hopfield networks, have garnered renewed interest due to advancements in memory capacity and connections with self-attention in transformers.
In this work, we introduce a unified framework-Hopfield-Fenchel-Young networks-which generalizes these models to a broader family of energy functions.
arXiv Detail & Related papers (2024-11-13T13:13:07Z)
- Provably Optimal Memory Capacity for Modern Hopfield Models: Transformer-Compatible Dense Associative Memories as Spherical Codes [6.477597248683852]
We study the optimal memorization capacity of modern Hopfield models and Kernelized Hopfield Models (KHMs).
We show that the optimal capacity of KHMs occurs when the feature space allows memories to form an optimal spherical code.
arXiv Detail & Related papers (2024-10-30T15:35:51Z)
- B'MOJO: Hybrid State Space Realizations of Foundation Models with Eidetic and Fading Memory [91.81390121042192]
We develop a class of models called B'MOJO to seamlessly combine eidetic and fading memory within a composable module.
B'MOJO's ability to modulate eidetic and fading memory results in better inference on longer sequences tested up to 32K tokens.
arXiv Detail & Related papers (2024-07-08T18:41:01Z)
- Outlier-Efficient Hopfield Layers for Large Transformer-Based Models [10.972020273638066]
We introduce an Outlier-Efficient Modern Hopfield Model (termed $\mathrm{OutEffHop}$).
Our main contribution is a novel associative memory model facilitating outlier-efficient associative memory retrievals.
$\mathrm{OutEffHop}$ achieves an average reduction of over 22% in average kurtosis and over 26% in the maximum infinity norm of model outputs.
arXiv Detail & Related papers (2024-04-04T23:08:43Z)
- Topology-aware Embedding Memory for Continual Learning on Expanding Networks [63.35819388164267]
We present a framework to tackle the memory explosion problem using memory replay techniques.
PDGNNs with Topology-aware Embedding Memory (TEM) significantly outperform state-of-the-art techniques.
arXiv Detail & Related papers (2024-01-24T03:03:17Z)
- STanHop: Sparse Tandem Hopfield Model for Memory-Enhanced Time Series Prediction [13.815793371488613]
We present a novel Hopfield-based neural network block, which sparsely learns and stores both temporal and cross-series representations.
In essence, STanHop sequentially learns temporal and cross-series representations using two tandem sparse Hopfield layers.
We show that our framework yields a tighter memory retrieval error than its dense counterpart without sacrificing memory capacity.
arXiv Detail & Related papers (2023-12-28T20:26:23Z)
- Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
We propose a new family of unbiased estimators, called WTA-CRS, for matrix multiplication with reduced variance.
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones.
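For intuition, the sketch below implements plain column-row sampling, the classical unbiased estimator of a matrix product that WTA-CRS refines; the winner-take-all selection from the paper is not implemented here, and the norm-based sampling probabilities are a standard choice rather than the authors'.

```python
# Plain column-row sampling: an unbiased estimator of A @ B that keeps only a
# few column/row pairs. This illustrates the baseline that WTA-CRS improves on;
# it is not the winner-take-all estimator from the paper.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 256))
B = rng.standard_normal((256, 32))

def crs_estimate(A, B, num_samples, rng):
    # Importance probabilities proportional to column/row norm products.
    p = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p /= p.sum()
    idx = rng.choice(A.shape[1], size=num_samples, p=p)
    # Rescaling by 1 / (num_samples * p) makes the estimator unbiased.
    scale = 1.0 / (num_samples * p[idx])
    return (A[:, idx] * scale) @ B[idx, :]

exact = A @ B
approx = crs_estimate(A, B, num_samples=64, rng=rng)
print(np.linalg.norm(exact - approx) / np.linalg.norm(exact))  # relative error
```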
arXiv Detail & Related papers (2023-05-24T15:52:08Z)
- Universal Hopfield Networks: A General Framework for Single-Shot Associative Memory Models [41.58529335439799]
We propose a general framework for understanding the operation of memory networks as a sequence of three operations.
We derive existing memory models as instances of this general framework with differing similarity and separation functions.
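As a rough illustration of that factorization, the sketch below writes single-shot retrieval as projection after separation after similarity; the concrete choices (dot-product similarity, softmax separation, projection back onto the stored patterns) are one illustrative instantiation, not the only one covered by the framework.

```python
# Single-shot associative memory retrieval factored into three operations
# (similarity, separation, projection), in the spirit of the Universal Hopfield
# framework. The concrete choices below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
memories = rng.standard_normal((10, 64))        # stored patterns, one per row

def similarity(memories, query):                # dot-product similarity (one choice of many)
    return memories @ query

def separation(scores, beta=8.0):               # softmax separation (another common choice)
    e = np.exp(beta * scores - np.max(beta * scores))
    return e / e.sum()

def retrieve(memories, query):
    weights = separation(similarity(memories, query))
    return memories.T @ weights                 # projection back onto stored patterns

noisy = memories[3] + 0.3 * rng.standard_normal(64)
retrieved = retrieve(memories, noisy)
print(np.linalg.norm(memories - retrieved, axis=1).argmin())  # expected: 3
```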
arXiv Detail & Related papers (2022-02-09T16:48:06Z)
- MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [72.80896338009579]
We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs.
We propose a generic patch-by-patch inference scheduling, which significantly cuts down the peak memory.
We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2.
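To see why patch-by-patch scheduling lowers peak memory, the toy calculation below compares activation footprints for whole-image versus patch-based execution of an initial high-resolution stage; the layer shapes, the 4x4 patch grid, and the simple input-plus-output peak-memory model are illustrative assumptions, not MCUNetV2's actual search or schedule.

```python
# Toy peak-memory comparison for whole-image vs patch-by-patch inference of an
# initial CNN stage. Peak memory is modeled as bytes of (input + output)
# activations held at once; shapes and patch grid are illustrative assumptions.

def act_bytes(c, h, w, bytes_per_elem=1):        # int8 activations
    return c * h * w * bytes_per_elem

# An illustrative initial stage: (channels, height, width) of each tensor,
# starting from a 3x224x224 input and downsampling by 2 at each layer.
stage = [(3, 224, 224), (16, 112, 112), (32, 56, 56), (64, 28, 28)]

def peak_per_layer(shapes):
    # Whole-image execution: input and output of one layer coexist in SRAM.
    return max(act_bytes(*a) + act_bytes(*b) for a, b in zip(shapes, shapes[1:]))

def peak_per_patch(shapes, grid=4):
    # Patch-by-patch execution: spatial dims are cut by the patch grid
    # (receptive-field overlap between patches is ignored in this toy model).
    patched = [(c, h // grid, w // grid) for c, h, w in shapes]
    final = act_bytes(*shapes[-1])               # full stage output must still be stored
    return max(peak_per_layer(patched), final)

print("whole image :", peak_per_layer(stage), "bytes")
print("4x4 patches :", peak_per_patch(stage), "bytes")
```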
arXiv Detail & Related papers (2021-10-28T17:58:45Z)
- Kanerva++: extending the Kanerva Machine with differentiable, locally block allocated latent memory [75.65949969000596]
Episodic and semantic memory are critical components of the human memory model.
We develop a new principled Bayesian memory allocation scheme that bridges the gap between episodic and semantic memory.
We demonstrate that this allocation scheme improves performance in memory conditional image generation.
arXiv Detail & Related papers (2021-02-20T18:40:40Z)
- Memformer: A Memory-Augmented Transformer for Sequence Modeling [55.780849185884996]
We present Memformer, an efficient neural network for sequence modeling.
Our model achieves linear time complexity and constant memory space complexity when processing long sequences.
arXiv Detail & Related papers (2020-10-14T09:03:36Z)