Understanding Learning Dynamics Through Structured Representations
- URL: http://arxiv.org/abs/2508.02126v1
- Date: Mon, 04 Aug 2025 07:15:57 GMT
- Title: Understanding Learning Dynamics Through Structured Representations
- Authors: Saleh Nikooroo, Thomas Engel
- Abstract summary: This paper investigates how internal structural choices shape the behavior of learning systems. We analyze how these structures influence gradient flow, spectral sensitivity, and fixed-point behavior. Rather than prescribing fixed templates, we emphasize principles of tractable design that can steer learning behavior in interpretable ways.
- Score: 1.2064681974642195
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: While modern deep networks have demonstrated remarkable versatility, their training dynamics remain poorly understood, often driven more by empirical tweaks than architectural insight. This paper investigates how internal structural choices shape the behavior of learning systems. Building on prior efforts that introduced simple architectural constraints, we explore the broader implications of structure for convergence, generalization, and adaptation. Our approach centers on a family of enriched transformation layers that incorporate constrained pathways and adaptive corrections. We analyze how these structures influence gradient flow, spectral sensitivity, and fixed-point behavior, uncovering mechanisms that contribute to training stability and representational regularity. Theoretical analysis is paired with empirical studies on synthetic and structured tasks, demonstrating improved robustness, smoother optimization, and scalable depth behavior. Rather than prescribing fixed templates, we emphasize principles of tractable design that can steer learning behavior in interpretable ways. Our findings support a growing view that architectural design is not merely a matter of performance tuning, but a critical axis for shaping learning dynamics in scalable and trustworthy neural systems.
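The abstract describes enriched transformation layers built from constrained pathways and adaptive corrections, but no reference code accompanies this listing. Below is a minimal sketch of what such a layer could look like, assuming a spectral-norm-style gain bound on the main pathway plus a small gated nonlinear correction; the class name, the specific constraint, and the gating scalar are illustrative assumptions, not the authors' definitions.

```python
import torch
import torch.nn as nn

class StructuredLayer(nn.Module):
    """Hypothetical enriched transformation layer (not the paper's code):
    a gain-bounded linear pathway plus a small adaptive correction."""

    def __init__(self, dim: int, max_gain: float = 1.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(dim, dim) / dim ** 0.5)
        self.bias = nn.Parameter(torch.zeros(dim))
        # Learned scalar gate controlling how much correction is applied.
        self.alpha = nn.Parameter(torch.tensor(0.1))
        self.correction = nn.Linear(dim, dim)
        self.max_gain = max_gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Constrain the top singular value of the main pathway so the
        # layer's gain stays bounded (one way to tame spectral sensitivity).
        sigma = torch.linalg.matrix_norm(self.weight, ord=2)
        w = self.weight * (self.max_gain / torch.clamp(sigma, min=self.max_gain))
        main = x @ w.T + self.bias
        # Adaptive correction: a small nonlinear residual term.
        return main + self.alpha * torch.tanh(self.correction(x))

x = torch.randn(8, 32)
layer = StructuredLayer(32)
print(layer(x).shape)  # torch.Size([8, 32])
```

Bounding the main pathway's top singular value is one direct handle on the spectral sensitivity the abstract mentions, while the tanh correction stays small because alpha starts near zero and is learned.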
Related papers
- Cross-Model Semantics in Representation Learning [1.2064681974642195]
We show that structural regularities induce representational geometry that is more stable under architectural variation. This suggests that certain forms of inductive bias not only support generalization within a model, but also improve the interoperability of learned features across models.
arXiv Detail & Related papers (2025-08-05T16:57:24Z)
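Stability of representational geometry across architectures is commonly quantified with linear centered kernel alignment (CKA). The summary does not say which metric this paper uses, so the sketch below is just one standard way to make such a claim concrete.

```python
import torch

def linear_cka(X: torch.Tensor, Y: torch.Tensor) -> torch.Tensor:
    """Linear CKA between two representation matrices (n_samples x dim)."""
    X = X - X.mean(dim=0)  # center each feature
    Y = Y - Y.mean(dim=0)
    hsic = (X.T @ Y).norm() ** 2  # ||X^T Y||_F^2
    return hsic / ((X.T @ X).norm() * (Y.T @ Y).norm())

# Same inputs through two hypothetical models -> compare geometry.
feats_a = torch.randn(256, 64)          # stand-in for model A activations
q, _ = torch.linalg.qr(torch.randn(64, 64))
feats_b = feats_a @ q                   # an orthogonal re-encoding of A
print(float(linear_cka(feats_a, feats_b)))  # ~1.0: geometry preserved
```

Linear CKA is invariant to orthogonal transformations, which is why the rotated copy scores near 1.0; genuinely different architectures would score lower.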
- Structured Transformations for Stable and Interpretable Neural Computation [1.2064681974642195]
We introduce a reformulation of layer-level transformations that departs from the standard unconstrained affine paradigm. Our formulation encourages internal consistency and supports stable information flow across depth. We show that models constructed with these structured transformations exhibit improved gradient conditioning, reduced sensitivity to perturbations, and layer-wise robustness.
arXiv Detail & Related papers (2025-07-31T19:26:45Z)
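"Reduced sensitivity to perturbations" can be probed directly by comparing the input-perturbation gain of an unconstrained linear layer against a spectrally normalized one. Spectral normalization is a standard stand-in here, not necessarily the paper's own formulation.

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import spectral_norm

torch.manual_seed(0)
dim = 64
free = nn.Linear(dim, dim)
constrained = spectral_norm(nn.Linear(dim, dim))  # weight spectral norm -> 1
for layer in (free, constrained):
    layer.eval()  # freeze spectral-norm power iteration between calls

x = torch.randn(16, dim)
delta = 1e-2 * torch.randn_like(x)  # small input perturbation

def sensitivity(layer: nn.Module) -> float:
    """How strongly a layer amplifies an input perturbation."""
    with torch.no_grad():
        return float((layer(x + delta) - layer(x)).norm() / delta.norm())

print("unconstrained gain:", sensitivity(free))
print("spectral-norm gain:", sensitivity(constrained))  # <= ~1 by construction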
- Toward Explainable Offline RL: Analyzing Representations in Intrinsically Motivated Decision Transformers [0.0]
Elastic Decision Transformers (EDTs) have proved to be particularly successful in offline reinforcement learning. Recent research has shown that incorporating intrinsic motivation mechanisms into EDTs improves performance across exploration tasks. We introduce a systematic post-hoc explainability framework to analyze how intrinsic motivation shapes learned embeddings in EDTs.
arXiv Detail & Related papers (2025-06-16T20:01:24Z)
- Model Hemorrhage and the Robustness Limits of Large Language Models [119.46442117681147]
Large language models (LLMs) demonstrate strong performance across natural language processing tasks, yet undergo significant performance degradation when modified for deployment. We define this phenomenon as model hemorrhage: performance decline caused by parameter alterations and architectural changes.
arXiv Detail & Related papers (2025-03-31T10:16:03Z)
- An Overview of Low-Rank Structures in the Training and Adaptation of Large Models [52.67110072923365]
Recent research has uncovered a widespread phenomenon in deep networks: the emergence of low-rank structures. These implicit low-dimensional patterns provide valuable insights for improving the efficiency of training and fine-tuning large-scale models. We present a comprehensive review of advances in exploiting low-rank structures for deep learning and shed light on their mathematical foundations.
arXiv Detail & Related papers (2025-03-25T17:26:09Z)
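A widely used instance of exploiting low-rank structure for adaptation is the LoRA-style update, where a frozen base weight is augmented by a trainable rank-r factorization. A minimal sketch follows; the rank and scale are illustrative choices, not values from the survey.

```python
import torch
import torch.nn as nn

class LowRankAdapter(nn.Module):
    """LoRA-style update: y = W0 x + (B A) x, with rank r << dim.
    Only A and B are trained; the frozen base weight W0 is untouched."""

    def __init__(self, base: nn.Linear, rank: int = 4, scale: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weight
        out_f, in_f = base.out_features, base.in_features
        self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, rank))  # start as a no-op
        self.scale = scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LowRankAdapter(nn.Linear(512, 512), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 8192 trainable parameters vs 262,656 for the full layer
```

Initializing B to zero makes the adapter an exact identity at the start of fine-tuning, so training begins from the pretrained model's behavior.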
- Network Dynamics-Based Framework for Understanding Deep Neural Networks [11.44947569206928]
We propose a theoretical framework to analyze learning dynamics through the lens of dynamical systems theory. We redefine the notions of linearity and nonlinearity in neural networks by introducing two fundamental transformation units at the neuron level. Different transformation modes lead to distinct collective behaviors in weight vector organization, different modes of information extraction, and the emergence of qualitatively different learning phases.
arXiv Detail & Related papers (2025-01-05T04:23:21Z)
- Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of deep learning's surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z)
- The Buffer Mechanism for Multi-Step Information Reasoning in Language Models [52.77133661679439]
Investigating the internal reasoning mechanisms of large language models can help us design better model architectures and training strategies.
In this study, we constructed a symbolic dataset to investigate the mechanisms by which Transformer models employ a vertical thinking strategy.
We proposed a random matrix-based algorithm to enhance the model's reasoning ability, resulting in a 75% reduction in the training time required for the GPT-2 model.
arXiv Detail & Related papers (2024-05-24T07:41:26Z)
- Latent Traversals in Generative Models as Potential Flows [113.4232528843775]
We propose to model latent structures with a learned dynamic potential landscape.
Inspired by physics, optimal transport, and neuroscience, these potential landscapes are learned as physically realistic partial differential equations.
Our method achieves trajectories that are both qualitatively and quantitatively more disentangled than those of state-of-the-art baselines.
arXiv Detail & Related papers (2023-04-25T15:53:45Z)
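The core mechanic of a potential-flow traversal can be shown in a few lines: latent codes are advected along the gradient of a learned scalar potential. The paper learns physically realistic PDE-constrained landscapes; the Euler-step sketch below only illustrates the gradient-flow idea, with arbitrary network sizes and step size.

```python
import torch
import torch.nn as nn

# A small scalar potential over latent space; traversals follow its gradient.
potential = nn.Sequential(nn.Linear(16, 64), nn.Tanh(), nn.Linear(64, 1))

def traverse(z: torch.Tensor, steps: int = 10, lr: float = 0.1) -> list:
    """Move a latent code along the negative gradient of the potential."""
    path = [z.detach().clone()]
    for _ in range(steps):
        z = z.detach().requires_grad_(True)
        u = potential(z).sum()
        (grad,) = torch.autograd.grad(u, z)
        z = z - lr * grad  # one Euler step of the gradient flow dz/dt = -∇U
        path.append(z.detach().clone())
    return path

path = traverse(torch.randn(1, 16))
print(len(path), path[-1].shape)  # 11 recorded points, torch.Size([1, 16])
```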
- The Neural Race Reduction: Dynamics of Abstraction in Gated Networks [12.130628846129973]
We introduce the Gated Deep Linear Network framework that schematizes how pathways of information flow impact learning dynamics.
We derive an exact reduction and, for certain cases, exact solutions to the dynamics of learning.
Our work gives rise to general hypotheses relating neural architecture to learning and provides a mathematical approach towards understanding the design of more complex architectures.
arXiv Detail & Related papers (2022-07-21T12:01:03Z)
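A gated deep linear network can be schematized in a few lines: pathways are products of weight matrices, and gates switch pathways in or out while the map stays linear for fixed gates, which is what makes exact reductions of the learning dynamics tractable. The two-pathway toy below is a schematic of the framework's idea, not the paper's model.

```python
import torch

d = 8
# Two candidate pathways: a deep (two-layer) route and a shallow route.
W1 = torch.randn(d, d) / d ** 0.5
W2 = torch.randn(d, d) / d ** 0.5
W3 = torch.randn(d, d) / d ** 0.5

def forward(x: torch.Tensor, g_deep: float, g_shallow: float) -> torch.Tensor:
    # For fixed gates the input-output map is linear, so learning reduces
    # to the dynamics of a product of matrices per active pathway.
    return g_deep * (W2 @ (W1 @ x)) + g_shallow * (W3 @ x)

x = torch.randn(d)
print(forward(x, 1.0, 0.0).shape)  # only the deep pathway contributes
print(forward(x, 0.0, 1.0).shape)  # only the shallow pathway contributes
```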
- Recent advances in deep learning theory [104.01582662336256]
This paper reviews and organizes the recent advances in deep learning theory.
The literature is categorized in six groups: (1) complexity and capacity-based approaches for analysing the generalizability of deep learning; (2) stochastic differential equations and their dynamic systems for modelling stochastic gradient descent and its variants; (3) the geometrical structures of the loss landscape that drive the trajectories of the dynamic systems; (4) the roles of over-parameterization of deep neural networks; (5) theoretical foundations of several special structures in network architectures; and (6) concerns about ethics and security and their relationship with generalizability.
arXiv Detail & Related papers (2020-12-20T14:16:41Z)
- Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z)
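The phenomenon is easy to reproduce in a toy setting: give a linear classifier one strongly separating feature and one weakly separating feature, and the strong feature saturates the logits, starving the weak feature's gradient. The demo below is illustrative only, not the paper's experimental setup.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n = 2000
y = torch.randint(0, 2, (n,)).float()
# Feature 1 separates the classes strongly, feature 2 only weakly.
x1 = (2 * y - 1) * 2.0 + 0.1 * torch.randn(n)
x2 = (2 * y - 1) * 0.5 + 1.0 * torch.randn(n)
X = torch.stack([x1, x2], dim=1)

w = torch.zeros(2, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.5)
for step in range(500):
    loss = F.binary_cross_entropy_with_logits(X @ w, y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# The strong feature's weight dominates; once it drives the loss down,
# the gradient flowing to the weak feature collapses ("starvation").
print(w.detach())  # w[0] >> w[1]
```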
This list is automatically generated from the titles and abstracts of the papers on this site.