A non-ergodic framework for understanding emergent capabilities in Large Language Models
- URL: http://arxiv.org/abs/2501.01638v2
- Date: Fri, 28 Feb 2025 08:07:50 GMT
- Title: A non-ergodic framework for understanding emergent capabilities in Large Language Models
- Authors: Javier Marín
- Abstract summary: Large language models have emergent capabilities that come unexpectedly at scale. We provide a mathematical framework based on Stuart Kauffman's theory of the adjacent possible (TAP) to explain capability emergence.
- Score: 0.5439020425819
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Large language models exhibit emergent capabilities that appear unexpectedly at scale, but a theoretical framework is needed to explain why and how they emerge. We prove that language models are non-ergodic systems and provide a mathematical framework based on Stuart Kauffman's theory of the adjacent possible (TAP) to explain capability emergence. Our resource-constrained TAP equation demonstrates how architectural, training, and contextual constraints interact to shape model capabilities through phase transitions in semantic space. Through experiments with three different language models, we show that capabilities emerge through discrete transitions guided by constraint interactions and path-dependent exploration. This framework provides a theoretical basis for understanding emergence in language models and informs the design of architectures that can steer capability emergence.
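The abstract does not reproduce its resource-constrained TAP equation, so as a rough illustration only: Kauffman's TAP recurrence grows the number of available states M_t combinatorially, M_{t+1} = M_t + sum over i of alpha**i * C(M_t, i). The sketch below is an assumption-laden toy, not the paper's formulation: the per-combination damping `alpha**i`, the hard `cap`, and all parameter values are illustrative choices standing in for the paper's resource constraints.

```python
from math import comb

def tap_growth(m0=2, alpha=0.5, steps=8, cap=10**6):
    """Toy damped TAP recurrence:
        M_{t+1} = M_t + sum_{i=2}^{M_t} alpha**i * C(M_t, i),
    clipped at `cap` as a crude stand-in for a resource constraint.
    Returns the trajectory [M_0, M_1, ..., M_steps].
    """
    m = float(m0)
    history = [m]
    for _ in range(steps):
        n = int(m)
        # Each term counts ways to combine i existing states into a new one,
        # discounted by alpha**i so larger combinations contribute less.
        growth = sum(alpha ** i * comb(n, i) for i in range(2, n + 1))
        m = min(m + growth, cap)
        history.append(m)
    return history
```

The characteristic TAP signature is a long plateau followed by explosive growth once M_t is large enough for the combinatorial terms to dominate; a resource cap turns that explosion into a discrete-looking transition, which is loosely the dynamic the abstract invokes.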
Related papers
- Beyond Language Modeling: An Exploration of Multimodal Pretraining [125.34714978184638]
We provide empirical clarity through controlled, from-scratch pretraining experiments. We adopt the Transfusion framework, using next-token prediction for language and diffusion for vision. We demonstrate that the MoE architecture harmonizes this scaling asymmetry by providing the high model capacity required by language.
arXiv Detail & Related papers (2026-03-03T18:58:00Z)
- The Trinity of Consistency as a Defining Principle for General World Models [106.16462830681452]
General World Models are capable of learning, simulating, and reasoning about objective physical laws. We propose a principled theoretical framework that defines the essential properties requisite for a General World Model. Our work establishes a principled pathway toward general world models, clarifying both the limitations of current systems and the architectural requirements for future progress.
arXiv Detail & Related papers (2026-02-26T16:15:55Z)
- Emergent Structured Representations Support Flexible In-Context Inference in Large Language Models [77.98801218316505]
Large language models (LLMs) exhibit emergent behaviors suggestive of human-like reasoning. We investigate the internal processing of LLMs during in-context concept inference.
arXiv Detail & Related papers (2026-02-08T03:14:39Z)
- The Grammar of Transformers: A Systematic Review of Interpretability Research on Syntactic Knowledge in Language Models [3.281168543761194]
We present a systematic review of 337 articles evaluating the syntactic abilities of Transformer-based language models. Results suggest that TLMs capture form-oriented phenomena well, but show more variable and weaker performance on phenomena at the syntax-semantics interface.
arXiv Detail & Related papers (2026-01-09T16:34:19Z)
- Global and Local Entailment Learning for Natural World Imagery [7.874291189886743]
Radial Cross-Modal Embeddings (RCME) is a framework that enables the explicit modeling of transitivity-enforced entailment. We develop a hierarchical vision-language foundation model capable of representing the hierarchy in the Tree of Life.
arXiv Detail & Related papers (2025-06-26T17:05:06Z)
- From Words to Waves: Analyzing Concept Formation in Speech and Text-Based Foundation Models [20.244145418997377]
We analyze the conceptual structures learned by speech and textual models both individually and jointly. We employ Latent Concept Analysis, an unsupervised method for uncovering latent representations in neural networks, to examine how semantic abstractions form across modalities.
arXiv Detail & Related papers (2025-06-01T19:33:21Z)
- Multi-Scale Probabilistic Generation Theory: A Unified Information-Theoretic Framework for Hierarchical Structure in Large Language Models [1.0117553823134735]
Large Language Models (LLMs) exhibit remarkable emergent abilities but remain poorly understood at a mechanistic level. This paper introduces the Multi-Scale Probabilistic Generation Theory (MSPGT). MSPGT posits that standard language modeling objectives implicitly optimize multi-scale information compression.
arXiv Detail & Related papers (2025-05-23T16:55:35Z)
- Concept Formation and Alignment in Language Models: Bridging Statistical Patterns in Latent Space to Concept Taxonomy [11.232704182001253]
This paper explores concept formation and alignment in language models (LMs).
We propose a mechanism for identifying concepts and their hierarchical organization within the semantic representations learned by various LMs.
arXiv Detail & Related papers (2024-06-08T01:27:19Z)
- Learning Discrete Concepts in Latent Hierarchical Models [73.01229236386148]
Learning concepts from natural high-dimensional data holds potential in building human-aligned and interpretable machine learning models.
We formalize concepts as discrete latent causal variables that are related via a hierarchical causal model.
We substantiate our theoretical claims with synthetic data experiments.
arXiv Detail & Related papers (2024-06-01T18:01:03Z)
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI have made it possible to mitigate limitations by leveraging improved explanations for Transformers.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
- Entity Embeddings: Perspectives Towards an Omni-Modality Era for Large Language Models [0.0]
Large Language Models (LLMs) are evolving to integrate multiple modalities, such as text, image, and audio into a unified linguistic space.
We envision a future direction where conceptual entities defined in sequences of text can also be imagined as modalities.
arXiv Detail & Related papers (2023-10-27T17:04:10Z)
- Parrot Mind: Towards Explaining the Complex Task Reasoning of Pretrained Large Language Models with Template-Content Structure [66.33623392497599]
We show that a structure called template-content structure (T-C structure) can reduce the possible space from exponential level to linear level.
We demonstrate that models can achieve task composition, further reducing the space needed to learn from linear to logarithmic.
arXiv Detail & Related papers (2023-10-09T06:57:45Z)
- Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z)
- SINC: Self-Supervised In-Context Learning for Vision-Language Tasks [64.44336003123102]
We propose a framework to enable in-context learning in large language models.
A meta-model can learn on self-supervised prompts consisting of tailored demonstrations.
Experiments show that SINC outperforms gradient-based methods in various vision-language tasks.
arXiv Detail & Related papers (2023-07-15T08:33:08Z)
- From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought [124.40905824051079]
We propose rational meaning construction, a computational framework for language-informed thinking.
We frame linguistic meaning as a context-sensitive mapping from natural language into a probabilistic language of thought.
We show that LLMs can generate context-sensitive translations that capture pragmatically-appropriate linguistic meanings.
We extend our framework to integrate cognitively-motivated symbolic modules.
arXiv Detail & Related papers (2023-06-22T05:14:00Z)
- Language Models are Bounded Pragmatic Speakers: Understanding RLHF from a Bayesian Cognitive Modeling Perspective [2.8282906214258805]
This paper formulates a probabilistic cognitive model called the bounded pragmatic speaker.
We demonstrate that large language models fine-tuned with reinforcement learning from human feedback embody a model of thought that resembles a fast-and-slow model.
arXiv Detail & Related papers (2023-05-28T16:04:48Z)
- Multi-Relational Hyperbolic Word Embeddings from Natural Language Definitions [5.763375492057694]
This paper presents a multi-relational model that explicitly leverages such a structure to derive word embeddings from definitions.
An empirical analysis demonstrates that the framework can help impose the desired structural constraints.
Experiments reveal the superiority of the hyperbolic word embeddings over their Euclidean counterparts.
arXiv Detail & Related papers (2023-05-12T08:16:06Z)
- A Theory of Emergent In-Context Learning as Implicit Structure Induction [8.17811111226145]
Scaling large language models leads to an emergent capacity to learn in-context from example demonstrations.
We argue that in-context learning relies on recombination of compositional operations found in natural language data.
We show how in-context learning is supported by a representation of the input's compositional structure.
arXiv Detail & Related papers (2023-03-14T15:24:05Z)
- Model Criticism for Long-Form Text Generation [113.13900836015122]
We apply a statistical tool, model criticism in latent space, to evaluate the high-level structure of generated text.
We perform experiments on three representative aspects of high-level discourse -- coherence, coreference, and topicality.
We find that transformer-based language models are able to capture topical structures but have a harder time maintaining structural coherence or modeling coreference.
arXiv Detail & Related papers (2022-10-16T04:35:58Z)
- Modelling Compositionality and Structure Dependence in Natural Language [0.12183405753834563]
Drawing on linguistics and set theory, a formalisation of these ideas is presented in the first half of this thesis.
We see how cognitive systems that process language need to have certain functional constraints.
Using the advances of word embedding techniques, a model of relational learning is simulated.
arXiv Detail & Related papers (2020-11-22T17:28:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.