Single-Nodal Spontaneous Symmetry Breaking in NLP Models
- URL: http://arxiv.org/abs/2601.20582v1
- Date: Wed, 28 Jan 2026 13:20:02 GMT
- Title: Single-Nodal Spontaneous Symmetry Breaking in NLP Models
- Authors: Shalom Rosner, Ronit D. Gross, Ella Koresh, Ido Kanter,
- Abstract summary: We demonstrate the emergence of spontaneous symmetry breaking in natural language processing (NLP) models. This phenomenon occurs at the level of individual attention heads and scales down to a small subset of their nodes. Results are demonstrated using the BERT-6 architecture pre-trained on the Wikipedia dataset and fine-tuned on the FewRel classification task.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spontaneous symmetry breaking in statistical mechanics primarily occurs during phase transitions at the thermodynamic limit, where the Hamiltonian preserves inversion symmetry yet the low-temperature free energy exhibits reduced symmetry. Herein, we demonstrate the emergence of spontaneous symmetry breaking in natural language processing (NLP) models during both pre-training and fine-tuning, even under deterministic dynamics and within a finite training architecture. This phenomenon occurs at the level of individual attention heads, scales down to a small subset of their nodes, and remains valid at the single-node level, where nodes acquire the capacity to learn a limited set of tokens after pre-training, or of labels after fine-tuning for a specific classification task. As the number of nodes increases, a crossover in learning ability occurs, governed by the tradeoff between a decrease from random guessing among an increased number of possible outputs and an enhancement from nodal cooperation, which exceeds the sum of individual nodal capabilities. In contrast to spin-glass systems, where a microscopic state of frozen spins cannot be directly linked to the free-energy minimization goal, each nodal function in this framework contributes explicitly to the global network task and can be upper-bounded using convex hull analysis. Results are demonstrated using the BERT-6 architecture pre-trained on the Wikipedia dataset and fine-tuned on the FewRel classification task.
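The single-node picture in the abstract — a node can only "learn" labels whose activations remain separable from all others — can be illustrated with a toy 1-D convex-hull check (in one dimension a convex hull is just an interval). This is a hedged illustrative sketch, not the paper's actual convex hull analysis; the function name and the synthetic per-label activations are hypothetical.

```python
import numpy as np

def resolvable_labels(activations):
    """Return labels whose 1-D convex hull (an interval [min, max] of a
    single node's activations) is disjoint from every other label's hull.
    Such labels are the ones a single scalar node could, in principle,
    discriminate on its own."""
    hulls = {lab: (float(np.min(a)), float(np.max(a)))
             for lab, a in activations.items()}
    out = []
    for lab, (lo, hi) in hulls.items():
        disjoint = all(hi < lo2 or lo > hi2
                       for lab2, (lo2, hi2) in hulls.items() if lab2 != lab)
        if disjoint:
            out.append(lab)
    return sorted(out)

# Hypothetical activations of one node on three classification labels.
acts = {
    "label_A": np.array([0.1, 0.4, 0.9]),   # interval [0.1, 0.9]
    "label_B": np.array([2.0, 2.6, 3.0]),   # overlaps label_C
    "label_C": np.array([2.5, 3.2, 4.0]),
}
print(resolvable_labels(acts))  # -> ['label_A']
```

Only `label_A` occupies an interval disjoint from the others, so this node's capacity under the interval criterion is one label — a toy version of the "limited set of labels per node" claim.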
Related papers
- On the Mechanism and Dynamics of Modular Addition: Fourier Features, Lottery Ticket, and Grokking [49.1352577985191]
We present a comprehensive analysis of how two-layer neural networks learn features to solve the modular addition task. Our work provides a full mechanistic interpretation of the learned model and a theoretical explanation of its training dynamics.
arXiv Detail & Related papers (2026-02-18T20:25:13Z)
- UltraLIF: Fully Differentiable Spiking Neural Networks via Ultradiscretization and Max-Plus Algebra [0.0]
Spiking Neural Networks (SNNs) offer energy-efficient, biologically plausible computation but suffer from non-differentiable spike generation. This paper introduces UltraLIF, a principled framework that replaces surrogate gradients with ultradiscretization. Experiments on six benchmarks spanning static images, neuromorphic vision, and audio demonstrate improvements over surrogate gradient baselines.
arXiv Detail & Related papers (2026-02-10T18:21:54Z)
- Random-Matrix-Induced Simplicity Bias in Over-parameterized Variational Quantum Circuits [72.0643009153473]
We show that expressive variational ansätze enter a Haar-like universality class in which both observable expectation values and parameter gradients concentrate exponentially with system size. As a consequence, the hypothesis class induced by such circuits collapses with high probability to a narrow family of near-constant functions. We further show that this collapse is not unavoidable: tensor-structured VQCs, including tensor-network-based and tensor-hypernetwork parameterizations, lie outside the Haar-like universality class.
arXiv Detail & Related papers (2026-01-05T08:04:33Z)
- Defect Bootstrap: Tight Ground State Bounds in Spontaneous Symmetry Breaking Phases [0.0]
Bootstrap methods have enabled rigorous two-sided bounds on local observables directly in the thermodynamic limit. These bounds inevitably become loose in symmetry-broken phases, where local constraints are insufficient to capture long-range order. We introduce a $\textit{defect bootstrap}$ framework that resolves this limitation by embedding the system into an auxiliary $\textit{defect model}$. Our results demonstrate that physically motivated constraint sets can dramatically enhance the power of bootstrap methods for quantum many-body systems.
arXiv Detail & Related papers (2025-11-25T21:17:54Z)
- PointNSP: Autoregressive 3D Point Cloud Generation with Next-Scale Level-of-Detail Prediction [87.33016661440202]
Autoregressive point cloud generation has long lagged behind diffusion-based approaches in quality. We propose PointNSP, a coarse-to-fine generative framework that preserves global shape structure at low resolutions. Experiments on ShapeNet show that PointNSP establishes state-of-the-art (SOTA) generation quality for the first time within the autoregressive paradigm.
arXiv Detail & Related papers (2025-10-07T06:31:02Z)
- Ordinal Label-Distribution Learning with Constrained Asymmetric Priors for Imbalanced Retinal Grading [9.147336466586017]
Diabetic retinopathy grading is inherently ordinal and long-tailed. We propose the Constrained Asymmetric Prior Wasserstein Autoencoder (CAP-WAE). CAP-WAE consistently achieves state-of-the-art Quadratic Weighted Kappa, accuracy, and macro-F1.
arXiv Detail & Related papers (2025-09-30T11:58:49Z)
- Neural Collapse in Cumulative Link Models for Ordinal Regression: An Analysis with Unconstrained Feature Model [5.339955242953934]
We show that a phenomenon we call Ordinal Neural Collapse (ONC) indeed emerges and is characterized by three properties. In particular, in the zero-regularization limit, a highly local and simple geometric relationship emerges between the latent variables and the threshold values.
arXiv Detail & Related papers (2025-06-06T06:57:02Z)
- Machine learning in and out of equilibrium [58.88325379746631]
Our study uses a Fokker-Planck approach, adapted from statistical physics, to explore these parallels.
We focus in particular on the stationary state of the system in the long-time limit, which in conventional SGD is out of equilibrium.
We propose a new variation of stochastic gradient Langevin dynamics (SGLD) that harnesses without-replacement minibatching.
arXiv Detail & Related papers (2023-06-06T09:12:49Z)
- Message-Passing Neural Quantum States for the Homogeneous Electron Gas [41.94295877935867]
We introduce a message-passing-neural-network-based wave function Ansatz to simulate extended, strongly interacting fermions in continuous space.
We demonstrate its accuracy by simulating the ground state of the homogeneous electron gas in three spatial dimensions.
arXiv Detail & Related papers (2023-05-12T04:12:04Z)
- Dynamical singularity of the rate function for quench dynamics in finite-size quantum systems [1.2514666672776884]
We study the realization of the dynamical singularity of the rate function for finite-size systems under the twist boundary condition.
We show that exact zeros of the Loschmidt echo can always be achieved when the postquench parameter crosses the underlying equilibrium phase transition point.
arXiv Detail & Related papers (2022-11-06T14:35:57Z)
- Beyond the Edge of Stability via Two-step Gradient Updates [49.03389279816152]
Gradient Descent (GD) is a powerful workhorse of modern machine learning.
GD's ability to find local minimisers is only guaranteed for losses with Lipschitz gradients.
This work focuses on simple, yet representative, learning problems via analysis of two-step gradient updates.
arXiv Detail & Related papers (2022-06-08T21:32:50Z)
- Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z)
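The gradient-starvation idea above — cross-entropy minimization latching onto a dominant feature at the expense of weaker ones — can be sketched with a two-feature logistic-regression toy. This is a hedged illustration, not the paper's experimental setup; the data-generating scales (2.0 vs 0.5) are made up to create one strong and one weak feature.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
y = rng.choice([-1.0, 1.0], size=n)
# Feature 1 is strongly predictive of the label, feature 2 only weakly so.
x = np.stack([y * 2.0 + 0.1 * rng.standard_normal(n),
              y * 0.5 + 0.1 * rng.standard_normal(n)], axis=1)

# Plain gradient descent on the logistic loss mean(log(1 + exp(-y * w.x))).
w = np.zeros(2)
lr = 0.1
for _ in range(200):
    margins = np.clip(y * (x @ w), -30.0, 30.0)  # clip to avoid exp overflow
    grad = -(x * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
    w -= lr * grad

print(w)  # the weight on the strongly predictive feature dominates
```

Because the strong feature reduces the loss fastest, its weight grows first and the shrinking gradient leaves the weak feature's weight small — a minimal instance of the feature imbalance the summary describes.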
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.