Related papers: Why are LLMs' abilities emergent?

URL: http://arxiv.org/abs/2508.04401v1
Date: Wed, 06 Aug 2025 12:43:04 GMT
Title: Why are LLMs' abilities emergent?
Authors: Vladimír Havlík
Abstract summary: I argue that these systems exhibit genuine emergent properties analogous to those found in other complex natural phenomena. This perspective shifts the focus to understanding the internal dynamic transformations that enable these systems to acquire capabilities that transcend their individual components.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The remarkable success of Large Language Models (LLMs) in generative tasks has raised fundamental questions about the nature of their acquired capabilities, which often appear to emerge unexpectedly without explicit training. This paper examines the emergent properties of Deep Neural Networks (DNNs) through both theoretical analysis and empirical observation, addressing the epistemological challenge of "creation without understanding" that characterises contemporary AI development. We explore how the neural approach's reliance on nonlinear, stochastic processes fundamentally differs from symbolic computational paradigms, creating systems whose macro-level behaviours cannot be analytically derived from micro-level neuron activities. Through analysis of scaling laws, grokking phenomena, and phase transitions in model capabilities, I demonstrate that emergent abilities arise from the complex dynamics of highly sensitive nonlinear systems rather than from parameter scaling alone. My investigation reveals that current debates over metrics, pre-training loss thresholds, and in-context learning miss the fundamental ontological nature of emergence in DNNs. I argue that these systems exhibit genuine emergent properties analogous to those found in other complex natural phenomena, where systemic capabilities emerge from cooperative interactions among simple components without being reducible to their individual behaviours. The paper concludes that understanding LLM capabilities requires recognising DNNs as a new domain of complex dynamical systems governed by universal principles of emergence, similar to those operating in physics, chemistry, and biology. This perspective shifts the focus from purely phenomenological definitions of emergence to understanding the internal dynamic transformations that enable these systems to acquire capabilities that transcend their individual components.

Related papers

State Space Models Naturally Produce Traveling Waves, Time Cells, and Scale to Abstract Cognitive Functions
We propose a framework based on State-Space Models (SSMs), an emerging class of deep learning architectures. We demonstrate that the model spontaneously develops neural representations that strikingly mimic biological 'time cells'. Our findings position SSMs as a compelling framework that connects single-neuron dynamics to cognitive phenomena.
arXiv (2025-07-18T03:53:16Z)
Continuum-Interaction-Driven Intelligence: Human-Aligned Neural Architecture via Crystallized Reasoning and Fluid Generation
Current AI systems face challenges including hallucination, unpredictability, and misalignment with human decision-making. This study proposes a dual-channel intelligent architecture that integrates probabilistic generation (LLMs) with white-box procedural reasoning (chain-of-thought) to construct interpretable, continuously learnable, and human-aligned AI systems.
arXiv (2025-04-12T18:15:49Z)
Transformer Dynamics: A neuroscientific approach to interpretability of large language models
We focus on the residual stream (RS) in transformer models, conceptualizing it as a dynamical system evolving across layers. We find that activations of individual RS units exhibit strong continuity across layers, despite the RS being a non-privileged basis. In reduced-dimensional spaces, the RS follows a curved trajectory with attractor-like dynamics in the lower layers.
arXiv (2025-02-17T18:49:40Z)
Parameter Symmetry Potentially Unifies Deep Learning Theory
We advocate for parameter symmetries as a research direction for unifying AI theories. We argue that this direction of research could lead to a unified understanding of three distinct hierarchies in neural networks.
arXiv (2025-02-07T20:10:05Z)
Discovering Chunks in Neural Embeddings for Interpretability
We propose leveraging the principle of chunking to interpret artificial neural population activities. We first demonstrate this concept in recurrent neural networks (RNNs) trained on artificial sequences with imposed regularities. We identify similar recurring embedding states corresponding to concepts in the input, with perturbations to these states activating or inhibiting the associated concepts.
arXiv (2025-02-03T20:30:46Z)
Evolving Neural Networks Reveal Emergent Collective Behavior from Minimal Agent Interactions
We investigate how neural networks evolve to control agents' behavior in a dynamic environment. Simpler behaviors, such as lane formation and laminar flow, are characterized by more linear network operations. Specific environmental parameters, such as moderate noise, broader field of view, and lower agent density, promote the evolution of non-linear networks.
arXiv (2024-10-25T17:43:00Z)
Artificial Kuramoto Oscillatory Neurons
It has long been known in both neuroscience and AI that "binding" between neurons leads to a form of competitive learning where representations are compressed in order to represent more abstract concepts in deeper layers of the network. We introduce Artificial Kuramoto Oscillatory Neurons, which can be combined with arbitrary connectivity designs such as fully connected, convolutional, or attentive mechanisms. We show that this idea provides performance improvements across a wide spectrum of tasks such as unsupervised object discovery, adversarial robustness, uncertainty quantification, and reasoning.
arXiv (2024-10-17T17:47:54Z)
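As background for the Kuramoto entry, the classic Kuramoto model couples N phase oscillators through dtheta_i/dt = omega_i + (K/N) * sum_j sin(theta_j - theta_i) and shows how global synchrony can emerge from pairwise interactions once the coupling K is large enough. The sketch below illustrates only that textbook model, not the neuron formulation used in the paper (which this listing does not describe); the function names, the Gaussian frequency distribution, the Euler integration scheme, and all parameter values are assumptions chosen for illustration.

```python
import cmath
import math
import random


def mean_field(theta):
    """Return (r, psi) with r * exp(i * psi) = mean of exp(i * theta_j).

    r is the Kuramoto order parameter: 1 means full phase synchrony,
    values near 0 mean incoherent phases.
    """
    z = sum(cmath.exp(1j * t) for t in theta) / len(theta)
    return abs(z), cmath.phase(z)


def simulate_kuramoto(n=100, coupling=2.0, dt=0.05, steps=2000, seed=0):
    """Euler-integrate the classic Kuramoto model and return the final r.

    dtheta_i/dt = omega_i + (K / N) * sum_j sin(theta_j - theta_i),
    evaluated via the equivalent mean-field form K * r * sin(psi - theta_i).
    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    # Assumed: natural frequencies drawn from a Gaussian with std 0.5.
    omega = [rng.gauss(0.0, 0.5) for _ in range(n)]
    # Random initial phases in [0, 2*pi].
    theta = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    for _ in range(steps):
        r, psi = mean_field(theta)
        # One explicit Euler step for every oscillator at once.
        theta = [
            t + dt * (w + coupling * r * math.sin(psi - t))
            for t, w in zip(theta, omega)
        ]
    r_final, _ = mean_field(theta)
    return r_final


if __name__ == "__main__":
    for k in (0.0, 0.5, 2.0):
        print(f"K = {k}: r = {simulate_kuramoto(coupling=k):.2f}")
```

Under these assumed settings, strong coupling drives the order parameter r toward 1, while K = 0 leaves it on the order of 1/sqrt(N), the level expected for independent phases. The mean-field form is used because it reduces each step from O(N^2) to O(N) work while computing the same coupling term.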
Contrastive Learning in Memristor-based Neuromorphic Systems
Spiking neural networks have become an important family of neuron-based models that sidestep many of the key limitations facing modern-day backpropagation-trained deep networks. In this work, we design and investigate a proof-of-concept instantiation of contrastive-signal-dependent plasticity (CSDP), a neuromorphic form of forward-forward-based, backpropagation-free learning.
arXiv (2024-09-17T04:48:45Z)
Non-linear classification capability of quantum neural networks due to emergent quantum metastability
We show that effective non-linearities can be implemented in quantum neural networks. Using a quantum neural network whose architecture is inspired by dissipative many-body quantum spin models, we show that this mechanism indeed makes non-linear data classification possible.
arXiv (2024-08-20T12:01:07Z)
Brain-Inspired Machine Intelligence: A Survey of Neurobiologically-Plausible Credit Assignment
We examine algorithms for conducting credit assignment in artificial neural networks that are inspired or motivated by neurobiology. We organize the ever-growing set of brain-inspired learning schemes into six general families and consider these in the context of backpropagation of errors. The results of this review are meant to encourage future developments in neuro-mimetic systems and their constituent learning processes.
arXiv (2023-12-01T05:20:57Z)
A Neuro-mimetic Realization of the Common Model of Cognition via Hebbian Learning and Free Energy Minimization
Large neural generative models are capable of synthesizing semantically rich passages of text or producing complex images. We discuss the COGnitive Neural GENerative system, an architecture that casts the Common Model of Cognition in terms of Hebbian learning and free energy minimization.
arXiv (2023-10-14T23:28:48Z)
Discrete, compositional, and symbolic representations through attractor dynamics
We introduce a novel neural systems model that integrates attractor dynamics with symbolic representations to model cognitive processes akin to the probabilistic language of thought (PLoT). Our model segments the continuous representational space into discrete basins whose attractor states correspond to symbolic sequences; learned without supervision rather than from pre-defined primitives, these states reflect the semanticity and compositionality characteristic of symbolic systems. This approach establishes a unified framework that integrates both symbolic and sub-symbolic processing through neural dynamics, a neuroplausible substrate with proven expressivity in AI, offering a more comprehensive model that mirrors the complex duality of cognitive operations.
arXiv (2023-10-03T05:40:56Z)
Gradient Starvation: A Learning Proclivity in Neural Networks
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task. This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv (2020-11-18T18:52:08Z)