Sequential Group Composition: A Window into the Mechanics of Deep Learning
- URL: http://arxiv.org/abs/2602.03655v1
- Date: Tue, 03 Feb 2026 15:36:25 GMT
- Title: Sequential Group Composition: A Window into the Mechanics of Deep Learning
- Authors: Giovanni Luca Marchetti, Daniel Kunin, Adele Myers, Francisco Acosta, Nina Miolane,
- Abstract summary: We introduce the sequential group composition task. Networks learn this task one irreducible representation of the group at a time. We show how deeper models exploit the associativity of the task to dramatically improve this scaling.
- Score: 15.349155287234012
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: How do neural networks trained over sequences acquire the ability to perform structured operations, such as arithmetic, geometric, and algorithmic computation? To gain insight into this question, we introduce the sequential group composition task. In this task, networks receive a sequence of elements from a finite group encoded in a real vector space and must predict their cumulative product. The task can be order-sensitive and requires a nonlinear architecture to be learned. Our analysis isolates the roles of the group structure, encoding statistics, and sequence length in shaping learning. We prove that two-layer networks learn this task one irreducible representation of the group at a time in an order determined by the Fourier statistics of the encoding. These networks can perfectly learn the task, but doing so requires a hidden width exponential in the sequence length $k$. In contrast, we show how deeper models exploit the associativity of the task to dramatically improve this scaling: recurrent neural networks compose elements sequentially in $k$ steps, while multilayer networks compose adjacent pairs in parallel in $\log k$ layers. Overall, the sequential group composition task offers a tractable window into the mechanics of deep learning.
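The task itself can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's code: it assumes the cyclic group $Z_n$ with a one-hot encoding (the paper treats general finite groups and arbitrary encodings into a real vector space), and the function names (`sample_sequence`, `compose_pairwise`) are ours for illustration. The `compose_pairwise` routine mirrors how associativity lets adjacent pairs be composed in parallel, so the full product of $k$ elements is reached in roughly $\log_2 k$ rounds, matching the depth scaling described in the abstract.

```python
import numpy as np

# Minimal sketch of the sequential group composition task for the cyclic
# group Z_n (hypothetical parameters; the paper studies general finite
# groups and arbitrary real-vector encodings).

def sample_sequence(n, k, rng):
    """Sample k elements of Z_n and their cumulative product (addition mod n)."""
    elems = rng.integers(0, n, size=k)
    target = int(elems.sum()) % n      # group composition in Z_n is addition mod n
    return elems, target

def one_hot(idx, n):
    """Encode a group element as a one-hot vector in R^n (one possible encoding)."""
    v = np.zeros(n)
    v[idx] = 1.0
    return v

def compose_pairwise(elems, n):
    """Compose adjacent pairs in parallel, halving the sequence each round.
    Associativity means a depth-log2(k) circuit suffices for the full product."""
    vals = list(int(e) for e in elems)
    while len(vals) > 1:
        nxt = []
        for i in range(0, len(vals) - 1, 2):
            nxt.append((vals[i] + vals[i + 1]) % n)   # compose one adjacent pair
        if len(vals) % 2 == 1:                        # carry an unpaired element
            nxt.append(vals[-1])
        vals = nxt
    return vals[0]

rng = np.random.default_rng(0)
elems, target = sample_sequence(n=5, k=8, rng=rng)
inputs = np.concatenate([one_hot(e, 5) for e in elems])  # network input in R^{nk}
assert compose_pairwise(elems, 5) == target
```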
Related papers
- Emergent Riemannian geometry over learning discrete computations on continuous manifolds [1.8665975431697432]
We show that signatures of discrete computations emerge in the representational geometry of neural networks as they learn. We demonstrate how different learning regimes (rich vs. lazy) have contrasting metric and curvature structures, affecting the ability of the networks to generalise to unseen inputs.
arXiv Detail & Related papers (2025-11-28T20:29:06Z) - Deep Lookup Network [76.66809324649154]
In many resource-limited edge devices, complicated operations can be calculated via lookup tables to reduce computational cost. We introduce a generic and efficient lookup operation which can be used as a basic operation for the construction of neural networks. By replacing computationally expensive multiplication operations with our lookup operations, we develop lookup networks for the image classification, image super-resolution, and point cloud classification tasks.
arXiv Detail & Related papers (2025-09-17T03:31:41Z) - Scaling can lead to compositional generalization [6.654461784178872]
We show that scaling data and model size leads to compositional generalization. We show that this holds across different task encodings as long as the training distribution sufficiently covers the task space. We uncover that if networks successfully generalize compositionally, the constituents of a task can be linearly decoded from their hidden activations.
arXiv Detail & Related papers (2025-07-09T18:30:50Z) - Grokking Group Multiplication with Cosets [10.255744802963926]
Algorithmic tasks have proven to be a fruitful test ground for interpreting a neural network end-to-end.
We completely reverse engineer fully connected one-hidden-layer networks that have "grokked" the arithmetic of the permutation groups $S_5$ and $S_6$.
We relate how we reverse engineered the model's mechanisms and confirm our theory was a faithful description of the circuit's functionality.
arXiv Detail & Related papers (2023-12-11T18:12:18Z) - Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks [69.38572074372392]
We present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks.
Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks.
arXiv Detail & Related papers (2023-07-13T16:39:08Z) - How Deep Neural Networks Learn Compositional Data: The Random Hierarchy Model [47.617093812158366]
We introduce the Random Hierarchy Model: a family of synthetic tasks inspired by the hierarchical structure of language and images.
We find that deep networks learn the task by developing internal representations invariant to exchanging equivalent groups.
Our results indicate how deep networks overcome the curse of dimensionality by building invariant representations.
arXiv Detail & Related papers (2023-07-05T09:11:09Z) - Fast Inference and Transfer of Compositional Task Structures for Few-shot Task Generalization [101.72755769194677]
We formulate it as a few-shot reinforcement learning problem where a task is characterized by a subtask graph.
Our multi-task subtask graph inferencer (MTSGI) first infers the common high-level task structure in terms of the subtask graph from the training tasks.
Our experiment results on 2D grid-world and complex web navigation domains show that the proposed method can learn and leverage the common underlying structure of the tasks for faster adaptation to the unseen tasks.
arXiv Detail & Related papers (2022-05-25T10:44:25Z) - Neural networks adapting to datasets: learning network size and topology [77.34726150561087]
We introduce a flexible setup allowing for a neural network to learn both its size and topology during the course of a gradient-based training.
The resulting network has the structure of a graph tailored to the particular learning task and dataset.
arXiv Detail & Related papers (2020-06-22T12:46:44Z) - Online Sequential Extreme Learning Machines: Features Combined From Hundreds of Midlayers [0.0]
In this paper, we develop a hierarchical online sequential learning algorithm (H-OS-ELM).
The algorithm can learn chunk by chunk with fixed or varying block size.
arXiv Detail & Related papers (2020-06-12T00:50:04Z) - Adversarial Continual Learning [99.56738010842301]
We propose a hybrid continual learning framework that learns a disjoint representation for task-invariant and task-specific features.
Our model combines architecture growth to prevent forgetting of task-specific skills and an experience replay approach to preserve shared skills.
arXiv Detail & Related papers (2020-03-21T02:08:17Z)