The Birth of Knowledge: Emergent Features across Time, Space, and Scale in Large Language Models
- URL: http://arxiv.org/abs/2505.19440v1
- Date: Mon, 26 May 2025 02:59:54 GMT
- Title: The Birth of Knowledge: Emergent Features across Time, Space, and Scale in Large Language Models
- Authors: Shashata Sawmya, Micah Adler, Nir Shavit
- Abstract summary: This paper studies the emergence of interpretable categorical features within large language models (LLMs). Using sparse autoencoders for mechanistic interpretability, we identify when and where specific semantic concepts emerge within neural activations.
- Score: 3.541570601342306
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper studies the emergence of interpretable categorical features within large language models (LLMs), analyzing their behavior across training checkpoints (time), transformer layers (space), and varying model sizes (scale). Using sparse autoencoders for mechanistic interpretability, we identify when and where specific semantic concepts emerge within neural activations. Results indicate clear temporal and scale-specific thresholds for feature emergence across multiple domains. Notably, spatial analysis reveals unexpected semantic reactivation, with early-layer features re-emerging at later layers, challenging standard assumptions about representational dynamics in transformer models.
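The abstract's core technique, sparse autoencoders for mechanistic interpretability, can be illustrated with a minimal sketch. The dimensions, learning rate, and L1 coefficient below are illustrative choices, not the paper's actual configuration, and the random "activations" stand in for residual-stream activations that would in practice be collected from a transformer layer. A ReLU encoder maps each activation vector to an overcomplete, sparse feature code, and a linear decoder reconstructs the activation; features whose codes fire on semantically related inputs are the candidates for interpretable concepts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: d_model is the residual-stream width, d_dict the dictionary size.
d_model, d_dict, n_tokens = 64, 256, 1024
l1_coeff, lr = 1e-3, 0.3

# Stand-in "activations"; in practice these would be sampled from a transformer layer.
acts = rng.normal(size=(n_tokens, d_model))

W_enc = rng.normal(scale=0.1, size=(d_model, d_dict))
b_enc = np.zeros(d_dict)
W_dec = rng.normal(scale=0.1, size=(d_dict, d_model))

losses = []
for step in range(200):
    f = np.maximum(acts @ W_enc + b_enc, 0.0)  # sparse feature codes (ReLU)
    recon = f @ W_dec                          # linear decoder reconstructs activations
    err = recon - acts
    losses.append((err ** 2).mean() + l1_coeff * np.abs(f).mean())
    # Hand-derived gradients for the two weight matrices (encoder bias kept fixed for brevity).
    g_recon = 2.0 * err / err.size
    g_W_dec = f.T @ g_recon
    g_f = g_recon @ W_dec.T + l1_coeff * np.sign(f) / f.size
    g_f[f <= 0.0] = 0.0                        # ReLU gate
    g_W_enc = acts.T @ g_f
    W_enc -= lr * g_W_enc
    W_dec -= lr * g_W_dec
```

After training, a dictionary feature j is considered "active" at a token when its code f[:, j] is nonzero; the paper's analysis asks when such features first appear across checkpoints, layers, and model sizes.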
Related papers
- Causal Representation Meets Stochastic Modeling under Generic Geometry [49.24293444627916]
We develop causal representation learning for continuous-time latent point processes.
We propose MUTATE, an identifiable variational autoencoder framework with a time-adaptive transition module.
Across simulated and empirical studies, we find that MUTATE can effectively answer scientific questions.
arXiv Detail & Related papers (2026-02-04T20:40:53Z) - Dynamical Systems Analysis Reveals Functional Regimes in Large Language Models [0.8694591156258423]
Large language models perform text generation through high-dimensional internal dynamics.
Most interpretability approaches emphasise static representations or causal interventions, leaving temporal structure largely unexplored.
We discuss a composite dynamical metric, computed from activation time-series during autoregressive generation.
arXiv Detail & Related papers (2026-01-11T21:57:52Z) - The Origins of Representation Manifolds in Large Language Models [52.68554895844062]
We show that cosine similarity in representation space may encode the intrinsic geometry of a feature through shortest, on-manifold paths.
The critical assumptions and predictions of the theory are validated on text embeddings and token activations of large language models.
arXiv Detail & Related papers (2025-05-23T13:31:22Z) - Neural models for prediction of spatially patterned phase transitions: methods and challenges [0.37282630026096597]
Early Warning Signal (EWS) detection has shown promise in identifying dynamical signatures of oncoming critical transitions.
This paper explores the successes and shortcomings of neural EWS detection for spatially patterned phase transitions.
arXiv Detail & Related papers (2025-05-14T18:24:15Z) - Enforcing Interpretability in Time Series Transformers: A Concept Bottleneck Framework [2.8470354623829577]
We develop a framework based on Concept Bottleneck Models to enforce interpretability of time series Transformers.
We modify the training objective to encourage a model to develop representations similar to predefined interpretable concepts.
We find that the model performance remains mostly unaffected, while the model shows much improved interpretability.
arXiv Detail & Related papers (2024-10-08T14:22:40Z) - TempoFormer: A Transformer for Temporally-aware Representations in Change Detection [12.063146420389371]
We introduce TempoFormer, the first task-agnostic transformer-based and temporally-aware model for dynamic representation learning.
Our approach is jointly trained on inter and intra context dynamics and introduces a novel temporal variation of rotary positional embeddings.
We show new SOTA performance on three different real-time change detection tasks.
arXiv Detail & Related papers (2024-08-28T10:25:53Z) - BBScoreV2: Learning Time-Evolution and Latent Alignment from Stochastic Representation [23.765789561546715]
Autoregressive generative models play a key role in various language tasks, especially for modeling and evaluating long text sequences.
In this work, we observe that fitting transformer-based model embeddings into a stochastic process yields ordered latent representations from originally unordered model outputs.
We introduce a novel likelihood-based evaluation metric, BBScoreV2, offering both intuitive and quantitative support for its effectiveness.
arXiv Detail & Related papers (2024-05-28T02:33:38Z) - Learning Semantic Textual Similarity via Topic-informed Discrete Latent Variables [17.57873577962635]
We develop a topic-informed discrete latent variable model for semantic textual similarity.
Our model learns a shared latent space for sentence-pair representation via vector quantization.
We show that our model is able to surpass several strong neural baselines in semantic textual similarity tasks.
arXiv Detail & Related papers (2022-11-07T15:09:58Z) - Variational Predictive Routing with Nested Subjective Timescales [1.6114012813668934]
We present Variational Predictive Routing (VPR) - a neural inference system that organizes latent video features in a temporal hierarchy.
We show that VPR is able to detect event boundaries, disentangle temporal features, adapt to the dynamics hierarchy of the data, and produce accurate time-agnostic rollouts of the future.
arXiv Detail & Related papers (2021-10-21T16:12:59Z) - Did the Cat Drink the Coffee? Challenging Transformers with Generalized Event Knowledge [59.22170796793179]
Transformer Language Models (TLMs) were tested on a benchmark for the dynamic estimation of thematic fit.
Our results show that TLMs can reach performances that are comparable to those achieved by SDM.
However, additional analysis consistently suggests that TLMs do not capture important aspects of event knowledge.
arXiv Detail & Related papers (2021-07-22T20:52:26Z) - Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance [55.10864476206503]
We investigate the use of quantized vectors to model the latent linguistic embedding.
By enforcing different policies over the latent spaces in the training, we are able to obtain a latent linguistic embedding.
Our experiments show that the voice cloning system built with vector quantization has only a small degradation in terms of perceptual evaluations.
arXiv Detail & Related papers (2021-06-25T07:51:35Z) - A comprehensive comparative evaluation and analysis of Distributional Semantic Models [61.41800660636555]
We perform a comprehensive evaluation of type distributional vectors, either produced by static DSMs or obtained by averaging the contextualized vectors generated by BERT.
The results show that the alleged superiority of predict-based models is more apparent than real, and surely not ubiquitous.
We borrow from cognitive neuroscience the methodology of Representational Similarity Analysis (RSA) to inspect the semantic spaces generated by distributional models.
arXiv Detail & Related papers (2021-05-20T15:18:06Z) - Episodic Memory for Learning Subjective-Timescale Models [1.933681537640272]
In model-based learning, an agent's model is commonly defined over transitions between consecutive states of an environment.
In contrast, intelligent behaviour in biological organisms is characterised by the ability to plan over varying temporal scales depending on the context.
We devise a novel approach to learning a transition dynamics model, based on the sequences of episodic memories that define the agent's subjective timescale.
arXiv Detail & Related papers (2020-10-03T21:55:40Z) - Multi-timescale Representation Learning in LSTM Language Models [69.98840820213937]
Language models must capture statistical dependencies between words at timescales ranging from very short to very long.
We derived a theory for how the memory gating mechanism in long short-term memory language models can capture power law decay.
Experiments showed that LSTM language models trained on natural English text learn to approximate this theoretical distribution.
arXiv Detail & Related papers (2020-09-27T02:13:38Z) - Deducing neighborhoods of classes from a fitted model [68.8204255655161]
This article presents a new kind of interpretable machine learning method.
It can help to understand the partitioning of the feature space into predicted classes in a classification model using quantile shifts.
Basically, real data points (or specific points of interest) are used and the changes of the prediction after slightly raising or decreasing specific features are observed.
arXiv Detail & Related papers (2020-09-11T16:35:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.