Quantitative Attractor Analysis of High-Capacity Kernel Logistic Regression Hopfield Networks
- URL: http://arxiv.org/abs/2505.01218v1
- Date: Fri, 02 May 2025 12:13:23 GMT
- Title: Quantitative Attractor Analysis of High-Capacity Kernel Logistic Regression Hopfield Networks
- Authors: Akira Tamamori
- Abstract summary: This paper quantitatively analyzes the attractor structures in KLR-trained networks via extensive simulations. We evaluate recall from diverse initial states across wide storage loads (up to 4.0 P/N) and noise levels. Our analysis confirms KLR's superior performance: high capacity (up to 4.0 P/N) and robustness.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional Hopfield networks, using Hebbian learning, face severe storage capacity limits ($\approx 0.14$ P/N) and spurious attractors. Kernel Logistic Regression (KLR) offers a non-linear approach, mapping patterns to high-dimensional feature spaces for improved separability. Our previous work showed KLR dramatically improves capacity and noise robustness over conventional methods. This paper quantitatively analyzes the attractor structures in KLR-trained networks via extensive simulations. We evaluated recall from diverse initial states across wide storage loads (up to 4.0 P/N) and noise levels. We quantified convergence rates and speed. Our analysis confirms KLR's superior performance: high capacity (up to 4.0 P/N) and robustness. The attractor landscape is remarkably "clean," with near-zero spurious fixed points. Recall failures under high load/noise are primarily due to convergence to other learned patterns, not spurious ones. Dynamics are exceptionally fast (typically 1-2 steps for high-similarity states). This characterization reveals how KLR reshapes dynamics for high-capacity associative memory, highlighting its effectiveness and contributing to the broader understanding of associative memory.
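To make the learning rule concrete, here is a minimal sketch of KLR training and recall for a Hopfield-style associative memory, assuming an RBF kernel and a plain gradient-descent fit of per-neuron dual coefficients (representer-theorem form). All names, hyperparameters, and the tiny demo are illustrative assumptions, not the paper's implementation.

```python
# Minimal KLR associative-memory sketch (illustrative; not the paper's code).
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(A, B, gamma=0.1):
    # Gram matrix of the RBF kernel between rows of A and rows of B.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def train_klr(patterns, gamma=0.1, lr=0.1, ridge=1e-3, epochs=500):
    # patterns: (P, N) bipolar {-1, +1}. One logistic-regression classifier per neuron,
    # all sharing the Gram matrix K over stored patterns; alpha holds the dual coefficients.
    P, N = patterns.shape
    K = rbf_kernel(patterns, patterns, gamma)            # (P, P)
    targets = (patterns + 1) / 2                         # map {-1, +1} -> {0, 1}
    alpha = np.zeros((P, N))
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-K @ alpha))         # predicted bit probabilities, (P, N)
        grad = K.T @ (probs - targets) / P + ridge * alpha
        alpha -= lr * grad
    return alpha

def recall(state, patterns, alpha, gamma=0.1, max_steps=20):
    # Synchronous recall: compute kernel similarities to the stored patterns and
    # threshold each neuron's KLR logit at zero (i.e. probability 0.5).
    s = state.copy()
    for _ in range(max_steps):
        k = rbf_kernel(s[None, :], patterns, gamma)      # (1, P)
        new_s = np.sign(k @ alpha)[0]
        new_s[new_s == 0] = 1.0
        if np.array_equal(new_s, s):
            break                                        # reached a fixed point
        s = new_s
    return s

# Tiny demo: store random patterns, corrupt one, and check recall overlap.
N, P = 64, 16                                            # P/N = 0.25 here; the paper probes up to 4.0
patterns = rng.choice([-1.0, 1.0], size=(P, N))
alpha = train_klr(patterns)
noisy = patterns[0] * np.where(rng.random(N) < 0.2, -1, 1)   # flip ~20% of the bits
print("overlap after recall:", recall(noisy, patterns, alpha) @ patterns[0] / N)
```

In this form, a single synchronous update already pulls the state toward the kernel neighborhood of the nearest stored pattern, which is consistent with the 1-2 step convergence the abstract reports; the paper's quantitative attractor analysis is, of course, far more systematic than this toy check.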
Related papers
- Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle [53.239242017802056]
Reinforcement learning (RL) has emerged as an effective post-training paradigm for enhancing the reasoning capabilities of multimodal large language models (MLLMs). However, current RL pipelines often suffer from training inefficiencies caused by two underexplored issues: Advantage Collapsing and Rollout Silencing. We propose Shuffle-R1, a simple yet principled framework that improves RL fine-tuning efficiency by dynamically restructuring trajectory sampling and batch composition.
arXiv Detail & Related papers (2025-08-07T17:53:47Z)
- Contraction, Criticality, and Capacity: A Dynamical-Systems Perspective on Echo-State Networks [13.857230672081489]
We present a unified, dynamical-systems treatment that weaves together functional analysis, random attractor theory and recent neuroscientific findings. First, we prove that the Echo-State Property (wash-out of initial conditions) together with global Lipschitz dynamics necessarily yields the Fading-Memory Property. Second, employing a Stone-Weierstrass strategy we give a streamlined proof that ESNs with nonlinear reservoirs and linear read-outs are dense in the Banach space of causal, time-invariant fading-memory filters. Third, we quantify computational resources via the memory-capacity spectrum, show how ...
arXiv Detail & Related papers (2025-07-24T14:41:18Z)
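As a concrete companion to the memory-capacity spectrum mentioned in the summary above, the sketch below estimates the classic (Jaeger-style) memory capacity of a small echo-state network with ridge-regression readouts; reservoir size, spectral radius, and delay range are arbitrary illustrative choices, not the paper's setup.

```python
# Illustrative echo-state-network memory-capacity estimate (not the paper's construction).
import numpy as np

rng = np.random.default_rng(0)

N_res, T, washout, max_delay = 100, 4000, 200, 40
W_in = rng.uniform(-0.5, 0.5, size=N_res)
W = rng.normal(size=(N_res, N_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))      # spectral radius 0.9: contractive enough for the ESP

u = rng.uniform(-1, 1, size=T)                       # i.i.d. scalar input
x = np.zeros(N_res)
states = np.zeros((T, N_res))
for t in range(T):
    x = np.tanh(W @ x + W_in * u[t])
    states[t] = x

# Memory capacity = sum over delays k of the squared correlation between u(t-k)
# and the best linear readout of the reservoir state (fit by ridge regression).
X = states[washout:]
MC = 0.0
for k in range(1, max_delay + 1):
    y = u[washout - k:T - k]
    w = np.linalg.solve(X.T @ X + 1e-6 * np.eye(N_res), X.T @ y)
    MC += np.corrcoef(y, X @ w)[0, 1] ** 2
print(f"estimated memory capacity ~ {MC:.1f} (upper bound is N_res = {N_res})")
```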
- KV-Latent: Dimensional-level KV Cache Reduction with Frequency-aware Rotary Positional Embedding [72.12756830560217]
Large language models (LLMs) based on Transformer Decoders have become the preferred choice for conversational generative AI. Despite the overall superiority of the Decoder architecture, the gradually increasing Key-Value cache during inference has emerged as a primary efficiency bottleneck. By down-sampling the Key-Value vector dimensions into a latent space, we can significantly reduce the KV Cache footprint and improve inference speed.
arXiv Detail & Related papers (2025-07-15T12:52:12Z)
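To illustrate the dimensional-level idea in the summary above, here is a toy single-head sketch that caches keys and values after projecting them into a lower-dimensional latent space (queries share the key projection so dot products stay consistent, and values are projected back up after attention); the matrices are random stand-ins for learned ones, and the paper's frequency-aware rotary-embedding treatment is not modeled.

```python
# Toy latent-space KV cache (illustrative; not the KV-Latent architecture itself).
import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent, seq = 64, 16, 10

W_k_down = rng.normal(scale=d_model ** -0.5, size=(d_model, d_latent))  # key down-projection
W_v_down = rng.normal(scale=d_model ** -0.5, size=(d_model, d_latent))  # value down-projection
W_v_up = rng.normal(scale=d_latent ** -0.5, size=(d_latent, d_model))   # value up-projection

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Cache in the latent space: per-token cache shrinks from 2*d_model to 2*d_latent floats.
hidden = rng.normal(size=(seq, d_model))             # stand-in for past hidden states
k_cache = hidden @ W_k_down                          # (seq, d_latent)
v_cache = hidden @ W_v_down                          # (seq, d_latent)

# One decode step: project the query with the same key matrix, attend, then up-project.
q = rng.normal(size=d_model)
scores = k_cache @ (q @ W_k_down) / np.sqrt(d_latent)
out = softmax(scores) @ v_cache @ W_v_up             # back to model dimension
print(out.shape, "cache floats/token:", 2 * d_latent, "vs", 2 * d_model)
```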
- Kernel Ridge Regression for Efficient Learning of High-Capacity Hopfield Networks [0.0]
We propose Kernel Ridge Regression (KRR) as an efficient kernel-based alternative for learning high-capacity Hopfield networks. KRR utilizes the kernel trick and predicts bipolar states via regression, crucially offering a non-iterative, closed-form solution for learning dual variables. Our results demonstrate that KRR achieves state-of-the-art storage capacity (reaching $\beta = 1.5$) and noise robustness, comparable to KLR.
arXiv Detail & Related papers (2025-04-17T01:17:28Z)
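The non-iterative, closed-form dual solution highlighted above amounts to a single regularized linear solve shared by all neurons; the sketch below assumes an RBF kernel, and the sizes are chosen only for the demo.

```python
# Kernel ridge regression Hopfield-style memory: closed-form dual coefficients (illustrative).
import numpy as np

rng = np.random.default_rng(1)

def rbf(A, B, gamma=0.1):
    return np.exp(-gamma * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))

N, P, lam = 64, 32, 1e-3
patterns = rng.choice([-1.0, 1.0], size=(P, N))

# One solve gives the dual coefficients for all N output neurons at once.
K = rbf(patterns, patterns)
alpha = np.linalg.solve(K + lam * np.eye(P), patterns)       # (P, N)

def recall_step(state):
    # Regress the bipolar targets in kernel space, then binarize.
    return np.sign(rbf(state[None, :], patterns) @ alpha)[0]

noisy = patterns[0] * np.where(rng.random(N) < 0.25, -1, 1)  # flip ~25% of the bits
print("overlap:", recall_step(recall_step(noisy)) @ patterns[0] / N)
```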
- Kernel Logistic Regression Learning for High-Capacity Hopfield Networks [0.0]
Hebbian learning limits Hopfield network storage capacity (pattern-to-neuron ratio around 0.14). We propose Kernel Logistic Regression (KLR) learning. Unlike linear methods, KLR uses kernels to implicitly map patterns to a high-dimensional feature space, enhancing separability.
arXiv Detail & Related papers (2025-04-10T10:27:43Z)
- Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape [40.78854925996]
Large language models based on the Transformer architecture have demonstrated an impressive ability to learn in context.
We show that a common nonlinear representation or feature map can be used to enhance the power of in-context learning.
arXiv Detail & Related papers (2024-02-02T09:29:40Z)
- Are GATs Out of Balance? [73.2500577189791]
We study the Graph Attention Network (GAT) in which a node's neighborhood aggregation is weighted by parameterized attention coefficients.
Our main theorem serves as a stepping stone to studying the learning dynamics of positive homogeneous models with attention mechanisms.
arXiv Detail & Related papers (2023-10-11T06:53:05Z)
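For reference, the parameterized attention coefficients that weight a node's neighborhood aggregation in a GAT can be computed as in the toy single-head NumPy sketch below (random graph and weights, purely illustrative).

```python
# One GAT attention head in plain NumPy (illustrative).
import numpy as np

rng = np.random.default_rng(0)

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Scores e_ij = LeakyReLU(a^T [W h_i || W h_j]); alpha_ij = softmax over j in N(i).
n_nodes, d_in, d_out = 6, 8, 4
H = rng.normal(size=(n_nodes, d_in))                 # node features
A = rng.random((n_nodes, n_nodes)) < 0.4             # random adjacency ...
A = A | A.T
np.fill_diagonal(A, True)                            # ... with self-loops
W = rng.normal(size=(d_in, d_out)) / np.sqrt(d_in)
a = rng.normal(size=(2 * d_out,))

Z = H @ W
out = np.zeros((n_nodes, d_out))
for i in range(n_nodes):
    nbrs = np.flatnonzero(A[i])
    scores = leaky_relu(np.array([a @ np.concatenate([Z[i], Z[j]]) for j in nbrs]))
    alpha = softmax(scores)                          # attention coefficients for node i's neighborhood
    out[i] = alpha @ Z[nbrs]                         # weighted neighborhood aggregation
print(out.shape)
```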
- Excitatory/Inhibitory Balance Emerges as a Key Factor for RBN Performance, Overriding Attractor Dynamics [35.70635792124142]
Reservoir computing provides a time- and cost-efficient alternative to traditional learning methods.
We show that specific distribution parameters can lead to diverse dynamics near critical points.
We then evaluate performance in two challenging tasks, memorization and prediction, and find that a positive excitatory balance produces a critical point with higher memory performance.
arXiv Detail & Related papers (2023-08-02T17:41:58Z)
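A toy sketch of how an excitatory/inhibitory balance parameter shapes random-Boolean-network reservoir dynamics is given below; network size, in-degree, update rule, and the activity statistic are illustrative assumptions, not the paper's reservoir or its memorization/prediction tasks.

```python
# Toy random Boolean network with a tunable excitatory/inhibitory balance (illustrative).
import numpy as np

rng = np.random.default_rng(0)

def run_rbn(p_exc, n=200, k=4, steps=200):
    # Each node gets k random inputs; a connection is excitatory (+1) with probability
    # p_exc and inhibitory (-1) otherwise. A node fires when its summed input is positive.
    W = np.zeros((n, n))
    for i in range(n):
        src = rng.choice(n, size=k, replace=False)
        W[i, src] = np.where(rng.random(k) < p_exc, 1.0, -1.0)
    x = (rng.random(n) < 0.5).astype(float)
    activity = [x.mean()]
    for _ in range(steps):
        x = (W @ x > 0).astype(float)
        activity.append(x.mean())
    return np.mean(activity[-50:])                   # mean firing rate after transients

for p_exc in (0.3, 0.5, 0.7):
    print(f"p_exc = {p_exc}: mean activity {run_rbn(p_exc):.2f}")
```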
- AutoRL Hyperparameter Landscapes [69.15927869840918]
Reinforcement Learning (RL) has been shown to be capable of producing impressive results, but its use is limited by the impact of its hyperparameters on performance.
We propose an approach to build and analyze these hyperparameter landscapes not just for one point in time but at multiple points in time throughout training.
This supports the theory that hyperparameters should be dynamically adjusted during training and shows the potential for more insights on AutoRL problems that can be gained through landscape analyses.
arXiv Detail & Related papers (2023-04-05T12:14:41Z)
- Characterizing the loss landscape of variational quantum circuits [77.34726150561087]
We introduce a way to compute the Hessian of the loss function of VQCs.
We show how this information can be interpreted and compared to classical neural networks.
arXiv Detail & Related papers (2020-08-06T17:48:12Z)
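The paper introduces a way to compute the Hessian of a VQC's loss; as a generic numerical stand-in for such curvature information (not the authors' construction), the sketch below builds the Hessian of a toy two-parameter cost by central finite differences and reads off its eigenvalues.

```python
# Generic finite-difference Hessian of a toy cost (illustrative stand-in for a VQC loss).
import numpy as np

def loss(theta):
    # Smooth toy cost; any scalar loss function could be plugged in here.
    return float(np.cos(theta[0]) * np.cos(theta[1]) + 0.5 * np.sin(theta[0] + theta[1]))

def hessian_fd(f, theta, eps=1e-4):
    # Central differences: H[i, j] ~ d^2 f / (d theta_i d theta_j).
    d = len(theta)
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            tpp = theta.copy(); tpp[i] += eps; tpp[j] += eps
            tpm = theta.copy(); tpm[i] += eps; tpm[j] -= eps
            tmp = theta.copy(); tmp[i] -= eps; tmp[j] += eps
            tmm = theta.copy(); tmm[i] -= eps; tmm[j] -= eps
            H[i, j] = (f(tpp) - f(tpm) - f(tmp) + f(tmm)) / (4 * eps ** 2)
    return H

H = hessian_fd(loss, np.array([0.3, 1.1]))
print("loss-landscape curvature (eigenvalues):", np.linalg.eigvalsh(H))
```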
- Towards Understanding Label Smoothing [36.54164997035046]
Label smoothing regularization (LSR) has seen great success in training deep neural networks.
We show that an appropriate LSR can help to speed up convergence by reducing the variance.
We propose a simple yet effective strategy, namely the Two-Stage LAbel smoothing algorithm (TSLA).
arXiv Detail & Related papers (2020-06-20T20:36:17Z)
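As a rough illustration of label smoothing and of a two-stage schedule in the spirit of TSLA (the smoothing convention, switch point, and schedule below are assumptions, not the paper's algorithm):

```python
# Label-smoothed cross-entropy plus a two-stage on/off schedule (illustrative sketch).
import numpy as np

def smoothed_cross_entropy(logits, label, eps):
    # Target puts (1 - eps) on the true class and spreads eps over the remaining classes;
    # eps = 0 recovers the ordinary one-hot cross-entropy.
    C = logits.shape[-1]
    log_probs = logits - logits.max() - np.log(np.exp(logits - logits.max()).sum())
    target = np.full(C, eps / (C - 1))
    target[label] = 1.0 - eps
    return float(-(target * log_probs).sum())

def smoothing_for_epoch(epoch, total_epochs, eps=0.1, switch_frac=0.8):
    # Stage 1: train with label smoothing; stage 2: switch it off for the final phase.
    return eps if epoch < switch_frac * total_epochs else 0.0

logits = np.array([2.0, 0.5, -1.0])
for epoch in (0, 90):
    e = smoothing_for_epoch(epoch, total_epochs=100)
    print(f"epoch {epoch}: eps = {e}, loss = {smoothed_cross_entropy(logits, 0, e):.4f}")
```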
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
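The Hessian-norm diagnostic mentioned above can be approximated without ever forming the matrix; below is a toy power-iteration sketch using finite-difference Hessian-vector products on a quadratic whose Hessian is known, so the estimate can be verified (in practice one would plug in a network's gradient function).

```python
# Power iteration for the Hessian spectral norm via finite-difference Hessian-vector products.
import numpy as np

rng = np.random.default_rng(0)

# Toy objective 0.5 * w^T A w with a known, symmetric PSD Hessian A.
A = rng.normal(size=(20, 20))
A = A @ A.T / 20.0

def grad(w):
    return A @ w

def hessian_spectral_norm(grad_fn, w, iters=50, eps=1e-4):
    # Hv ~ (grad(w + eps v) - grad(w - eps v)) / (2 eps); iterate toward the top eigenvector.
    v = rng.normal(size=w.shape)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        Hv = (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2 * eps)
        v = Hv / np.linalg.norm(Hv)
    Hv = (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2 * eps)
    return float(v @ Hv)

w = rng.normal(size=20)
print("estimated ||H||_2:", hessian_spectral_norm(grad, w))
print("true      ||H||_2:", np.linalg.eigvalsh(A)[-1])
```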
- Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)