Emergent Specialization: Rare Token Neurons in Language Models
- URL: http://arxiv.org/abs/2505.12822v2
- Date: Thu, 22 May 2025 16:03:57 GMT
- Title: Emergent Specialization: Rare Token Neurons in Language Models
- Authors: Jing Liu, Haozheng Wang, Yueheng Li
- Abstract summary: Large language models struggle with representing and generating rare tokens despite their importance in specialized domains. In this study, we identify neuron structures with exceptionally strong influence on a language model's prediction of rare tokens, termed rare token neurons.
- Score: 5.946977198458224
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models struggle with representing and generating rare tokens despite their importance in specialized domains. In this study, we identify neuron structures with exceptionally strong influence on a language model's prediction of rare tokens, termed rare token neurons, and investigate the mechanism behind their emergence and behavior. These neurons exhibit a characteristic three-phase organization (plateau, power-law, and rapid decay) that emerges dynamically during training, evolving from a homogeneous initial state to a functionally differentiated architecture. In the activation space, rare token neurons form a coordinated subnetwork that selectively co-activates while avoiding co-activation with other neurons. This functional specialization potentially correlates with the development of heavy-tailed weight distributions, suggesting a statistical-mechanical basis for emergent specialization.
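As a rough illustration of the analysis the abstract describes, the sketch below ranks synthetic per-neuron influence scores on rare-token logits and fits a log-log slope to the middle of the spectrum, where a three-phase (plateau, power-law, rapid decay) shape would show up. The function names and toy data are assumptions for illustration; this is not the authors' code, and real influence scores would come from ablation or attribution on an actual model.

```python
import numpy as np

def influence_spectrum(influences):
    """Sort per-neuron influence scores on rare-token logits in
    descending order; the paper reports a three-phase shape
    (plateau, power law, rapid decay) in this kind of spectrum."""
    ranked = np.sort(np.asarray(influences, dtype=float))[::-1]
    ranks = np.arange(1, ranked.size + 1)
    return ranks, ranked

def loglog_slope(ranks, values, lo, hi):
    """Fit a log-log slope on the segment [lo, hi); a roughly constant
    slope there is consistent with a power-law middle phase."""
    r, v = ranks[lo:hi], values[lo:hi]
    mask = v > 0
    slope, _intercept = np.polyfit(np.log(r[mask]), np.log(v[mask]), 1)
    return slope

# Toy data standing in for ablation-based influence measurements:
# a plateau for the top neurons, a power-law middle, a decaying tail.
rng = np.random.default_rng(0)
n = 4096
r = np.arange(1, n + 1)
toy = np.where(r < 50, 1.0, (r / 50.0) ** -1.2)
toy[3500:] *= np.exp(-0.05 * np.arange(n - 3500))
ranks, spec = influence_spectrum(toy * rng.uniform(0.9, 1.1, n))
print("middle-segment log-log slope:", loglog_slope(ranks, spec, 100, 3000))
```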
Related papers
- State Space Models Naturally Produce Traveling Waves, Time Cells, and Scale to Abstract Cognitive Functions [7.097247619177705]
We propose a framework based on State-Space Models (SSMs), an emerging class of deep learning architectures. We demonstrate that the model spontaneously develops neural representations that strikingly mimic biological 'time cells'. Our findings position SSMs as a compelling framework that connects single-neuron dynamics to cognitive phenomena.
arXiv Detail & Related papers (2025-07-18T03:53:16Z) - Langevin Flows for Modeling Neural Latent Dynamics [81.81271685018284]
We introduce LangevinFlow, a sequential Variational Auto-Encoder in which the time evolution of latent variables is governed by the underdamped Langevin equation. Our approach incorporates physical priors -- such as inertia, damping, a learned potential function, and forces -- to represent both autonomous and non-autonomous processes in neural systems. Our method outperforms state-of-the-art baselines on synthetic neural populations generated by a Lorenz attractor.
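For readers unfamiliar with the underdamped Langevin equation this summary invokes, here is a minimal Euler-Maruyama sketch of those dynamics. The learned potential, encoder, and network components of LangevinFlow are not reproduced; `grad_U` is a hypothetical stand-in callable.

```python
import numpy as np

def underdamped_langevin_step(x, v, grad_U, gamma=0.5, temp=1.0, dt=1e-2, rng=None):
    """One Euler-Maruyama step of the underdamped Langevin equation
    (unit mass):  dx = v dt,
                  dv = (-grad_U(x) - gamma * v) dt + sqrt(2 gamma temp) dW."""
    rng = rng or np.random.default_rng()
    noise = rng.standard_normal(np.shape(x))
    v = v + (-grad_U(x) - gamma * v) * dt + np.sqrt(2.0 * gamma * temp * dt) * noise
    x = x + v * dt
    return x, v

# Toy usage with a quadratic potential U(x) = 0.5 * ||x||^2, so grad_U(x) = x.
rng = np.random.default_rng(0)
x, v = np.ones(3), np.zeros(3)
for _ in range(1000):
    x, v = underdamped_langevin_step(x, v, grad_U=lambda z: z, rng=rng)
print("sample latent state:", x)
```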
arXiv Detail & Related papers (2025-07-15T17:57:48Z) - NOBLE -- Neural Operator with Biologically-informed Latent Embeddings to Capture Experimental Variability in Biological Neuron Models [68.89389652724378]
NOBLE is a neural operator framework that learns a mapping from a continuous frequency-modulated embedding of interpretable neuron features to the somatic voltage response induced by current injection. It predicts distributions of neural dynamics that account for intrinsic experimental variability. NOBLE is the first scaled-up deep learning framework validated on real experimental data.
arXiv Detail & Related papers (2025-06-05T01:01:18Z) - Neuronal and structural differentiation in the emergence of abstract rules in hierarchically modulated spiking neural networks [20.58066918526133]
The internal organizing mechanisms underlying rule abstraction remain elusive. This work introduces a hierarchically modulated recurrent spiking neural network (HM-RSNN) that can tune intrinsic neuronal properties. We model four cognitive tasks with HM-RSNN and observe rule-abstraction-contingent differentiation at both the network and neuron levels.
arXiv Detail & Related papers (2025-01-24T14:45:03Z) - Artificial Kuramoto Oscillatory Neurons [65.16453738828672]
It has long been known in both neuroscience and AI that 'binding' between neurons leads to a form of competitive learning where representations are compressed in order to represent more abstract concepts in deeper layers of the network. We introduce Artificial Kuramoto Oscillatory Neurons (AKOrN), which can be combined with arbitrary connectivity designs such as fully connected, convolutional, or attentive mechanisms. We show that this idea provides performance improvements across a wide spectrum of tasks such as unsupervised object discovery, adversarial robustness, uncertainty quantification, and reasoning.
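As background for the oscillatory units this summary mentions, below is a minimal sketch of the classic all-to-all Kuramoto model that AKOrN-style neurons build on; the paper's actual units generalize well beyond this toy version, so treat it only as a primer on the underlying dynamics.

```python
import numpy as np

def kuramoto_step(theta, omega, coupling=1.0, dt=1e-2):
    """One Euler step of the all-to-all Kuramoto model:
    d(theta_i)/dt = omega_i + (K/N) * sum_j sin(theta_j - theta_i)."""
    diffs = theta[None, :] - theta[:, None]  # diffs[i, j] = theta_j - theta_i
    return theta + dt * (omega + coupling * np.sin(diffs).mean(axis=1))

rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 64)   # random initial phases
omega = rng.normal(0.0, 0.1, 64)            # heterogeneous natural frequencies
for _ in range(5000):
    theta = kuramoto_step(theta, omega, coupling=2.0)
# Order parameter in [0, 1]: near 1 means the population has synchronized.
print("order parameter:", abs(np.exp(1j * theta).mean()))
```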
arXiv Detail & Related papers (2024-10-17T17:47:54Z) - Graph-Based Representation Learning of Neuronal Dynamics and Behavior [2.3859858429583665]
We introduce the Temporal Attention-enhanced Variational Graph Recurrent Neural Network (TAVRNN), a novel framework that models time-varying neuronal connectivity. TAVRNN learns latent dynamics at the single-unit level while maintaining interpretable population-level representations. We validate TAVRNN on three diverse datasets: (1) electrophysiological data from a freely behaving rat, (2) primate somatosensory cortex recordings during a reaching task, and (3) biological neurons in the DishBrain platform interacting with a virtual game environment.
arXiv Detail & Related papers (2024-10-01T13:19:51Z) - Confidence Regulation Neurons in Language Models [91.90337752432075]
This study investigates the mechanisms by which large language models represent and regulate uncertainty in next-token predictions.
Entropy neurons are characterized by an unusually high weight norm and influence the final layer normalization (LayerNorm) scale to effectively scale down the logits.
Token frequency neurons, which we describe here for the first time, boost or suppress each token's logit proportionally to its log frequency, thereby shifting the output distribution towards or away from the unigram distribution.
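A toy sketch of the effect attributed to token frequency neurons: adding a scaled log-frequency term to the logits pushes the softmax output toward (or, with a negative scale, away from) the unigram distribution. The vocabulary and numbers are invented for illustration; this is not the paper's measurement procedure.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def shift_toward_unigram(logits, log_freq, alpha):
    """Add each token's log frequency, scaled by alpha, to its logit.
    alpha > 0 moves the softmax output toward the unigram
    distribution; alpha < 0 moves it away."""
    return logits + alpha * log_freq

# Invented 5-token vocabulary with a skewed unigram distribution.
unigram = np.array([0.55, 0.25, 0.10, 0.07, 0.03])
logits = np.array([0.2, 1.5, 0.3, 2.0, 0.1])
for alpha in (0.0, 1.0, 5.0):
    p = softmax(shift_toward_unigram(logits, np.log(unigram), alpha))
    print(f"alpha={alpha}: {np.round(p, 3)}")
```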
arXiv Detail & Related papers (2024-06-24T01:31:03Z) - Two-compartment neuronal spiking model expressing brain-state specific apical-amplification, -isolation and -drive regimes [0.7255608805275865]
Brain-state-specific neural mechanisms play a crucial role in integrating past and contextual knowledge with the current, incoming flow of evidence.
This work aims to provide a two-compartment spiking neuron model that incorporates features essential for supporting brain-state-specific learning.
arXiv Detail & Related papers (2023-11-10T14:16:46Z) - Astrocytes as a mechanism for meta-plasticity and contextually-guided network function [2.66269503676104]
Astrocytes are a ubiquitous and enigmatic type of non-neuronal cell.
Astrocytes may play a more direct and active role in brain function and neural computation.
arXiv Detail & Related papers (2023-11-06T20:31:01Z) - Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z) - Continuous Learning and Adaptation with Membrane Potential and
Activation Threshold Homeostasis [91.3755431537592]
This paper presents the Membrane Potential and Activation Threshold Homeostasis (MPATH) neuron model.
The model allows neurons to maintain a form of dynamic equilibrium by automatically regulating their activity when presented with input.
Experiments demonstrate the model's ability to adapt to and continually learn from its input; a toy sketch of this kind of threshold homeostasis appears after this entry.
arXiv Detail & Related papers (2021-04-22T04:01:32Z)
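A hedged toy sketch of the activation-threshold homeostasis the MPATH entry above describes: a unit whose threshold drifts so its firing rate settles near a target. The class name and update rule are illustrative assumptions, not the MPATH model itself, which also regulates membrane potential.

```python
import numpy as np

class HomeostaticNeuron:
    """Toy unit with an adaptive activation threshold: firing above the
    target rate raises the threshold, firing below it lowers it, so the
    firing rate settles near the target under a stationary input."""

    def __init__(self, target_rate=0.2, lr=0.01):
        self.threshold = 0.0
        self.target_rate = target_rate
        self.lr = lr

    def step(self, x):
        fired = float(x > self.threshold)
        self.threshold += self.lr * (fired - self.target_rate)
        return fired

rng = np.random.default_rng(0)
neuron = HomeostaticNeuron()
rates = [neuron.step(v) for v in rng.normal(1.0, 1.0, 20000)]
print("late empirical firing rate:", np.mean(rates[-5000:]))  # settles near 0.2
```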
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.