Hopfield Networks is All You Need
- URL: http://arxiv.org/abs/2008.02217v3
- Date: Wed, 28 Apr 2021 07:24:49 GMT
- Title: Hopfield Networks is All You Need
- Authors: Hubert Ramsauer, Bernhard Schäfl, Johannes Lehner, Philipp Seidl,
Michael Widrich, Thomas Adler, Lukas Gruber, Markus Holzleitner, Milena
Pavlović, Geir Kjetil Sandve, Victor Greiff, David Kreil, Michael Kopp,
Günter Klambauer, Johannes Brandstetter, Sepp Hochreiter
- Abstract summary: We introduce a modern Hopfield network with continuous states and a corresponding update rule.
The new Hopfield network can store exponentially many patterns (exponential in the dimension of the associative space), retrieve a pattern with one update, and achieve exponentially small retrieval errors.
We demonstrate the broad applicability of the Hopfield layers across various domains.
- Score: 8.508381229662907
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a modern Hopfield network with continuous states and a
corresponding update rule. The new Hopfield network can store exponentially
many patterns (exponential in the dimension of the associative space), retrieve
a pattern with one update, and achieve exponentially small retrieval errors. It has
three types of energy minima (fixed points of the update): (1) global fixed
point averaging over all patterns, (2) metastable states averaging over a
subset of patterns, and (3) fixed points which store a single pattern. The new
update rule is equivalent to the attention mechanism used in transformers. This
equivalence enables a characterization of the heads of transformer models:
heads in the first layers preferably perform global averaging, while heads in
higher layers perform partial averaging via metastable states. The new modern Hopfield
network can be integrated into deep learning architectures as layers to allow
the storage of and access to raw input data, intermediate results, or learned
prototypes. These Hopfield layers enable new ways of deep learning, beyond
fully-connected, convolutional, or recurrent networks, and provide pooling,
memory, association, and attention mechanisms. We demonstrate the broad
applicability of the Hopfield layers across various domains. Hopfield layers
improved state-of-the-art on three out of four considered multiple instance
learning problems as well as on immune repertoire classification with several
hundreds of thousands of instances. On the UCI benchmark collections of small
classification tasks, where deep learning methods typically struggle, Hopfield
layers yielded a new state-of-the-art when compared to different machine
learning methods. Finally, Hopfield layers achieved state-of-the-art on two
drug design datasets. The implementation is available at:
https://github.com/ml-jku/hopfield-layers
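For intuition, the new update rule can be written out directly. Below is a minimal numpy sketch, not the reference implementation linked above; the pattern matrix X, state xi, and inverse temperature beta are illustrative names. Batched over queries and with beta = 1/sqrt(d_k), the same expression is the transformer attention mechanism mentioned in the abstract.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                    # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def hopfield_update(X, xi, beta):
    """One retrieval step of the modern Hopfield network:
    xi_new = X softmax(beta * X^T xi).
    X: (d, N) matrix of N stored patterns; xi: (d,) state/query."""
    return X @ softmax(beta * (X.T @ xi))

# Batched over a query matrix Q with keys K and values V, the same rule reads
#   softmax(beta * Q @ K.T) @ V,
# i.e. transformer attention with beta = 1/sqrt(d_k).

# Toy retrieval: store random patterns, query with a noisy copy of one of them.
rng = np.random.default_rng(0)
d, N = 64, 16
X = rng.standard_normal((d, N))
xi = X[:, 3] + 0.1 * rng.standard_normal(d)   # noisy version of pattern 3
xi_new = hopfield_update(X, xi, beta=8.0)
print(np.linalg.norm(xi_new - X[:, 3]))       # one update, small retrieval error
```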
Related papers
- Hopfield-Fenchel-Young Networks: A Unified Framework for Associative Memory Retrieval [25.841394444834933]
Associative memory models, such as Hopfield networks, have garnered renewed interest due to advancements in memory capacity and connections with self-attention in transformers.
In this work, we introduce a unified framework, Hopfield-Fenchel-Young networks, which generalizes these models to a broader family of energy functions.
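As a hedged illustration of what a broader family of energy functions can buy, the sketch below swaps the softmax of the modern Hopfield update for sparsemax, one member of the regularized argmax family that Fenchel-Young losses cover; sparse weights mean retrieval attends to only a subset of patterns. The function names are ours, not the authors' API.

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex
    (Martins & Astudillo, 2016); yields sparse retrieval weights."""
    z_sorted = np.sort(z)[::-1]
    k = np.arange(1, z.size + 1)
    cssv = np.cumsum(z_sorted) - 1.0
    support = z_sorted - cssv / k > 0
    rho = k[support][-1]
    tau = cssv[support][-1] / rho
    return np.maximum(z - tau, 0.0)

def fy_hopfield_update(X, xi, beta, transform):
    """Generalized retrieval step: xi_new = X transform(beta * X^T xi).
    transform = softmax recovers the modern Hopfield network;
    transform = sparsemax retrieves from a sparse subset of patterns."""
    return X @ transform(beta * (X.T @ xi))
```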
arXiv Detail & Related papers (2024-11-13T13:13:07Z)
- Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract relational priors from transformers well-trained on massive 2D images.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves state-of-the-art unsupervised domain adaptation (UDA) performance for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z)
- SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding [56.079013202051094]
We present SegVG, a novel method that transfers box-level annotations into signals that provide additional pixel-level supervision for Visual Grounding.
This approach allows us to iteratively exploit the annotation as signals for both box-level regression and pixel-level segmentation.
arXiv Detail & Related papers (2024-07-03T15:30:45Z)
- BiSHop: Bi-Directional Cellular Learning for Tabular Data with Generalized Sparse Modern Hopfield Model [6.888608574535993]
BiSHop handles the two major challenges of deep tabular learning: non-rotationally invariant data structure and feature sparsity in data.
BiSHop uses a dual-component approach, sequentially processing data both column-wise and row-wise.
We show that BiSHop surpasses current SOTA methods with significantly fewer hyperparameter optimization (HPO) runs.
arXiv Detail & Related papers (2024-04-04T23:13:32Z)
- Uniform Memory Retrieval with Larger Capacity for Modern Hopfield Models [5.929540708452128]
We propose a two-stage memory retrieval dynamics for modern Hopfield models.
The key contribution is a learnable feature map $\Phi$ that transforms the Hopfield energy function into kernel space.
It utilizes the stored memory patterns as learning data to enhance memory capacity across all modern Hopfield models.
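The summary names only the learnable feature map $\Phi$; as a rough sketch of the idea, retrieval can be run with similarities measured in the space that $\Phi$ maps into. Here $\Phi$ is stubbed as a fixed random linear map purely for illustration; the learned parameterization and the two-stage training procedure are in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, N = 64, 128, 16
X = rng.standard_normal((d, N))                  # stored memory patterns
W = rng.standard_normal((m, d)) / np.sqrt(d)     # stand-in for the *learned* map

def phi(Z):
    """Feature map Phi; a fixed linear map here, learnable in the paper."""
    return W @ Z

def kernel_hopfield_update(X, xi, beta):
    """Retrieval step with scores computed in Phi's feature space:
    xi_new = X softmax(beta * Phi(X)^T Phi(xi))."""
    scores = beta * (phi(X).T @ phi(xi))
    w = np.exp(scores - scores.max())
    return X @ (w / w.sum())

xi = X[:, 2] + 0.1 * rng.standard_normal(d)      # noisy query
print(np.linalg.norm(kernel_hopfield_update(X, xi, beta=1.0) - X[:, 2]))
```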
arXiv Detail & Related papers (2024-04-04T23:05:30Z)
- STanHop: Sparse Tandem Hopfield Model for Memory-Enhanced Time Series Prediction [13.815793371488613]
We present a novel Hopfield-based neural network block, which sparsely learns and stores both temporal and cross-series representations.
In essence, STanHop sequentially learns temporal and cross-series representations using two tandem sparse Hopfield layers.
We show that our framework endows a tighter memory retrieval error compared to the dense counterpart without sacrificing memory capacity.
arXiv Detail & Related papers (2023-12-28T20:26:23Z)
- Improved Convergence Guarantees for Shallow Neural Networks [91.3755431537592]
We prove convergence of depth-2 neural networks, trained via gradient descent, to a global minimum.
Our model has the following features: regression with a quadratic loss function, a fully connected feedforward architecture, ReLU activations, Gaussian data instances, and adversarial labels.
Our results strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the NTK regime.
arXiv Detail & Related papers (2022-12-05T14:47:52Z)
- Hierarchical Variational Memory for Few-shot Learning Across Domains [120.87679627651153]
We introduce a hierarchical prototype model, where each level of the prototype fetches corresponding information from the hierarchical memory.
The model is endowed with the ability to flexibly rely on features at different semantic levels if the domain shift circumstances so demand.
We conduct thorough ablation studies to demonstrate the effectiveness of each component in our model.
arXiv Detail & Related papers (2021-12-15T15:01:29Z)
- Modern Hopfield Networks and Attention for Immune Repertoire Classification [8.488102471604908]
We show that the attention mechanism of transformer architectures is actually the update rule of modern Hopfield networks.
We exploit this high storage capacity to solve a challenging multiple instance learning (MIL) problem in computational biology.
We present our novel method DeepRC that integrates transformer-like attention, or equivalently modern Hopfield networks, into deep learning architectures.
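The core ingredient named here, transformer-like attention used to pool a very large bag of instances into one fixed-size representation, can be sketched in a few lines; this is a hedged illustration with assumed names and dimensions, not the DeepRC code.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_pool(H, q, beta):
    """Pool a bag of N instance embeddings H (N, d) into a single vector.
    The fixed query q plays the role of the Hopfield state; the instances
    are the stored patterns (cf. the update rule of the main paper)."""
    w = softmax(beta * (H @ q))       # one attention weight per instance
    return w @ H                      # weighted average = bag representation

# A repertoire with hundreds of thousands of instances pools into one vector.
rng = np.random.default_rng(2)
H = rng.standard_normal((300_000, 32))
q = rng.standard_normal(32)
print(attention_pool(H, q, beta=1.0).shape)   # (32,)
```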
arXiv Detail & Related papers (2020-07-16T20:35:46Z)
- Fine-Grained Visual Classification with Efficient End-to-end Localization [49.9887676289364]
We present an efficient localization module that can be fused with a classification network in an end-to-end setup.
We evaluate the new model on three benchmark datasets: CUB-200-2011, Stanford Cars, and FGVC-Aircraft.
arXiv Detail & Related papers (2020-05-11T14:07:06Z)
- PointHop++: A Lightweight Learning Model on Point Sets for 3D Classification [55.887502438160304]
The PointHop method was recently proposed by Zhang et al. for 3D point cloud classification with unsupervised feature extraction.
We further improve the PointHop method in two respects: 1) reducing its model complexity in terms of the number of model parameters, and 2) ordering discriminant features automatically based on the cross-entropy criterion.
With experiments conducted on the ModelNet40 benchmark dataset, we show that the PointHop++ method performs on par with deep neural network (DNN) solutions and surpasses other unsupervised feature extraction methods.
arXiv Detail & Related papers (2020-02-09T04:49:32Z)