Accelerating Hierarchical Associative Memory: A Deep Equilibrium Approach
- URL: http://arxiv.org/abs/2311.15673v1
- Date: Mon, 27 Nov 2023 10:02:12 GMT
- Title: Accelerating Hierarchical Associative Memory: A Deep Equilibrium Approach
- Authors: Cédric Goemaere, Johannes Deleu, Thomas Demeester
- Abstract summary: We propose two strategies to speed up memory retrieval in Hierarchical Associative Memory models.
First, we show how they can be cast as Deep Equilibrium Models, which allows using faster and more stable solvers.
Second, inspired by earlier work, we show that alternating optimization of the even and odd layers accelerates memory retrieval by a factor close to two.
- Score: 12.829893293085732
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hierarchical Associative Memory models have recently been proposed as a
versatile extension of continuous Hopfield networks. In order to facilitate
future research on such models, especially at scale, we focus on increasing
their simulation efficiency on digital hardware. In particular, we propose two
strategies to speed up memory retrieval in these models, which corresponds to
their use at inference, but is equally important during training. First, we
show how they can be cast as Deep Equilibrium Models, which allows using faster
and more stable solvers. Second, inspired by earlier work, we show that
alternating optimization of the even and odd layers accelerates memory
retrieval by a factor close to two. Combined, these two techniques allow for a
much faster energy minimization, as shown in our proof-of-concept experimental
results. The code is available at https://github.com/cgoemaere/hamdeq
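To make the two strategies concrete, here is a minimal sketch of memory retrieval as a fixed-point computation with alternating even/odd layer updates. It is an illustration under stated assumptions, not the authors' code from the linked repository: the layer sizes, the tanh nonlinearity, the clamped input layer, and the random weights are all placeholders. In a genuine DEQ treatment, the update map below would be handed to a black-box fixed-point solver such as Anderson acceleration or Broyden's method instead of being iterated naively.

```python
# A minimal sketch, assuming a 4-layer hierarchy with a clamped input layer,
# tanh activations, and random weights; none of this is taken from the paper
# or its repository.
import numpy as np

rng = np.random.default_rng(0)
sizes = [16, 32, 32, 8]                   # x0 is the clamped input layer
W = [rng.normal(scale=0.1, size=(sizes[i + 1], sizes[i]))
     for i in range(len(sizes) - 1)]

def layer_update(xs, l):
    """Fixed-point condition of layer l: it depends only on its neighbours."""
    pre = W[l - 1] @ xs[l - 1]            # bottom-up contribution
    if l + 1 < len(xs):
        pre = pre + W[l].T @ xs[l + 1]    # top-down contribution
    return np.tanh(pre)

def retrieve(x0, n_iters=100, tol=1e-6):
    """Memory retrieval as fixed-point iteration with even/odd alternation.

    Because layers only interact with their direct neighbours, all odd layers
    can be updated together while the even ones are held fixed, and vice
    versa. A DEQ solver (e.g. Anderson acceleration) would replace this loop.
    """
    xs = [x0] + [np.zeros(s) for s in sizes[1:]]
    for _ in range(n_iters):
        old = [x.copy() for x in xs[1:]]
        for l in range(1, len(xs), 2):    # sweep the odd layers
            xs[l] = layer_update(xs, l)
        for l in range(2, len(xs), 2):    # then the even layers
            xs[l] = layer_update(xs, l)
        if max(np.abs(x - o).max() for x, o in zip(xs[1:], old)) < tol:
            break                         # reached the equilibrium state
    return xs

equilibrium = retrieve(rng.normal(size=sizes[0]))
print([x.shape for x in equilibrium])     # [(16,), (32,), (32,), (8,)]
```

Convergence of this naive loop is not guaranteed in general; the DEQ framing is precisely what allows swapping in faster and more stable solvers, which is the paper's first contribution.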
Related papers
- Bridging Associative Memory and Probabilistic Modeling [29.605203018237457]
Associative memory and probabilistic modeling are two fundamental topics in artificial intelligence.
We build a bridge between the two that enables a useful flow of ideas in both directions.
arXiv Detail & Related papers (2024-02-15T18:56:46Z)
- A-SDM: Accelerating Stable Diffusion through Redundancy Removal and Performance Optimization [54.113083217869516]
In this work, we first identify the computationally redundant parts of the network.
We then prune the redundant blocks of the model while maintaining network performance.
Third, we propose a global-regional interactive (GRI) attention to speed up the computationally intensive attention part.
arXiv Detail & Related papers (2023-12-24T15:37:47Z)
- Improving Dual-Encoder Training through Dynamic Indexes for Negative Mining [61.09807522366773]
We introduce an algorithm that approximates the softmax with provable bounds and dynamically maintains the tree index.
In our study on datasets with over twenty million targets, our approach halves the error relative to oracle brute-force negative mining.
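The paper's contribution is the dynamically maintained tree index with provable softmax bounds; as a hedged sketch of the general pattern of index-based negative mining, the following substitutes a brute-force top-k search over a periodically refreshed cache of target embeddings. All names and sizes here are illustrative, not from the paper.

```python
# A simplified stand-in for index-based hard-negative mining in dual-encoder
# training: score each query against a cached target index and take the
# highest-scoring non-positive targets as negatives.
import torch

def mine_hard_negatives(query_emb, target_cache, positive_ids, k=5):
    """Return the k highest-scoring non-positive target ids for each query."""
    scores = query_emb @ target_cache.T          # (batch, num_targets)
    # Mask out each query's own positive so it cannot be picked as a negative.
    scores[torch.arange(len(positive_ids)), positive_ids] = float("-inf")
    return scores.topk(k, dim=1).indices         # (batch, k) negative ids

queries = torch.randn(4, 64)
cache = torch.randn(10_000, 64)                  # refreshed every few steps
positives = torch.tensor([0, 1, 2, 3])
print(mine_hard_negatives(queries, cache, positives).shape)  # (4, 5)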
arXiv Detail & Related papers (2023-03-27T15:18:32Z)
- A Theory of I/O-Efficient Sparse Neural Network Inference [17.862408781750126]
Machine learning models are gaining accuracy at a fast rate, so their demand for energy and compute resources grows accordingly.
At a low level, most of these resources are consumed by data movement between different memory units.
We provide a rigorous theoretical analysis of the I/Os needed in sparse feedforward neural network (FFNN) inference.
arXiv Detail & Related papers (2023-01-03T11:23:46Z)
- On the efficiency of Stochastic Quasi-Newton Methods for Deep Learning [0.0]
We study the behaviour of quasi-Newton training algorithms for deep memory networks.
We show that quasi-Newton methods are efficient and, in some instances, able to outperform the well-known first-order Adam optimizer.
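For readers who want to try a quasi-Newton method in practice, a minimal sketch follows, using PyTorch's built-in torch.optim.LBFGS as a readily available quasi-Newton optimizer; the paper studies its own stochastic quasi-Newton variants, and the tiny model and random data below are placeholders.

```python
# Swapping a first-order optimizer for a quasi-Newton one in PyTorch.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.Tanh(), nn.Linear(32, 1))
x, y = torch.randn(128, 10), torch.randn(128, 1)
loss_fn = nn.MSELoss()

# First-order baseline for comparison.
adam = torch.optim.Adam(model.parameters(), lr=1e-3)

# L-BFGS builds a low-rank curvature estimate from past gradients; its
# step() needs a closure that re-evaluates the loss.
lbfgs = torch.optim.LBFGS(model.parameters(), lr=0.1, history_size=10)

def closure():
    lbfgs.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    return loss

for _ in range(20):
    lbfgs.step(closure)
print("final loss:", closure().item())
```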
arXiv Detail & Related papers (2022-05-18T20:53:58Z)
- When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable.
To achieve better accuracy, we propose two lightweight modules.
DQInit dynamically initializes the decoder queries from the inputs, enabling the model to match the accuracy of models with multiple decoder layers.
QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
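A hedged sketch of the DQInit idea as one might implement it: derive the initial decoder queries from pooled input features instead of using static learned embeddings. The pooling and projection choices below are guesses for illustration, not the paper's exact module, and 68 queries simply matches the common 68-point facial landmark layout.

```python
# Illustrative query initialization from input features (assumed design).
import torch
import torch.nn as nn

class DQInit(nn.Module):
    def __init__(self, feat_dim=256, num_queries=68):
        super().__init__()
        self.proj = nn.Linear(feat_dim, num_queries * feat_dim)
        self.num_queries, self.feat_dim = num_queries, feat_dim

    def forward(self, feats):                 # feats: (B, HW, C)
        pooled = feats.mean(dim=1)            # global context, (B, C)
        q = self.proj(pooled)                 # (B, num_queries * C)
        return q.view(-1, self.num_queries, self.feat_dim)

queries = DQInit()(torch.randn(2, 64 * 64, 256))
print(queries.shape)  # torch.Size([2, 68, 256])
```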
arXiv Detail & Related papers (2021-05-27T13:51:42Z)
- MetaDistiller: Network Self-Boosting via Meta-Learned Top-Down Distillation [153.56211546576978]
In this work, we propose that better soft targets with higher compatibility can be generated by using a label generator.
We can employ the meta-learning technique to optimize this label generator.
The experiments are conducted on two standard classification benchmarks, namely CIFAR-100 and ILSVRC2012.
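A hedged sketch of the core training signal: a label generator maps features to soft targets that the student is distilled against. The generator architecture and loss weighting are assumptions, and the meta-learning update of the generator itself is omitted here.

```python
# Distillation against generated soft targets (illustrative, not the
# paper's exact formulation; the meta-learning loop is omitted).
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes = 100                                 # e.g. CIFAR-100

label_generator = nn.Linear(512, num_classes)     # features -> soft targets

def distill_loss(student_logits, features, labels, T=4.0, alpha=0.5):
    soft = F.softmax(label_generator(features) / T, dim=1)
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1), soft,
                  reduction="batchmean") * T * T
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

loss = distill_loss(torch.randn(8, num_classes), torch.randn(8, 512),
                    torch.randint(0, num_classes, (8,)))
print(loss.item())
```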
arXiv Detail & Related papers (2020-08-27T13:04:27Z)
- Scaling Distributed Deep Learning Workloads beyond the Memory Capacity with KARMA [58.040931661693925]
We propose a strategy that combines redundant recomputing and out-of-core methods.
We achieve an average 1.52x speedup across six different models over state-of-the-art out-of-core methods.
Our data-parallel out-of-core solution can outperform complex hybrid model parallelism in training large models, e.g. Megatron-LM and Turing-NLG.
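As an illustration of the recomputation half only (KARMA itself goes further and swaps tensors out of core to host memory), a minimal sketch using PyTorch's built-in activation checkpointing:

```python
# Activation checkpointing: discard intermediate activations in the forward
# pass and recompute them during backward, trading compute for memory.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

block = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1024))
x = torch.randn(32, 1024, requires_grad=True)

# Activations inside `block` are recomputed during backward, reducing the
# peak memory footprint at the cost of extra forward compute.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
```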
arXiv Detail & Related papers (2020-08-26T07:24:34Z)
- Approximated Bilinear Modules for Temporal Modeling [116.6506871576514]
Two layers in CNNs can be converted to temporal bilinear modules by adding an auxiliary-branch sampling.
Our models can outperform most state-of-the-art methods on the Something-Something V1 and V2 datasets without pretraining.
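A hedged sketch of a factorized temporal bilinear interaction between features at adjacent frames, offered as a low-rank stand-in for the modules described above; the paper's exact formulation and its auxiliary sampling branch are not reproduced here.

```python
# Low-rank bilinear interaction across adjacent frames (illustrative).
import torch
import torch.nn as nn

class TemporalBilinear(nn.Module):
    def __init__(self, dim=256, rank=64):
        super().__init__()
        self.a = nn.Linear(dim, rank)
        self.b = nn.Linear(dim, rank)
        self.out = nn.Linear(rank, dim)

    def forward(self, x):                 # x: (B, T, C)
        cur, nxt = x[:, :-1], x[:, 1:]    # pairs of adjacent frames
        # Elementwise product of projections approximates a bilinear form.
        return self.out(self.a(cur) * self.b(nxt))  # (B, T-1, C)

y = TemporalBilinear()(torch.randn(2, 8, 256))
print(y.shape)  # torch.Size([2, 7, 256])
```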
arXiv Detail & Related papers (2020-07-25T09:07:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.