Related papers: Algorithm Development in Neural Networks: Insights from the Streaming Parity Task

Algorithm Development in Neural Networks: Insights from the Streaming Parity Task

URL: http://arxiv.org/abs/2507.09897v2
Date: Wed, 06 Aug 2025 13:21:56 GMT
Title: Algorithm Development in Neural Networks: Insights from the Streaming Parity Task
Authors: Loek van Rossem, Andrew M. Saxe,
Abstract summary: We study the learning dynamics of neural networks trained on a streaming parity task.<n>We show that, with sufficient finite training experience, RNNs exhibit a phase transition to perfect infinite generalization.<n>Our results disclose one mechanism by which neural networks can generalize infinitely from finite training experience.
Score: 8.188549368578704
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Even when massively overparameterized, deep neural networks show a remarkable ability to generalize. Research on this phenomenon has focused on generalization within distribution, via smooth interpolation. Yet in some settings neural networks also learn to extrapolate to data far beyond the bounds of the original training set, sometimes even allowing for infinite generalization, implying that an algorithm capable of solving the task has been learned. Here we undertake a case study of the learning dynamics of recurrent neural networks (RNNs) trained on the streaming parity task in order to develop an effective theory of algorithm development. The streaming parity task is a simple but nonlinear task defined on sequences up to arbitrary length. We show that, with sufficient finite training experience, RNNs exhibit a phase transition to perfect infinite generalization. Using an effective theory for the representational dynamics, we find an implicit representational merger effect which can be interpreted as the construction of a finite automaton that reproduces the task. Overall, our results disclose one mechanism by which neural networks can generalize infinitely from finite training experience.

Related papers

On the algorithmic construction of deep ReLU networks [0.0]
We take the perspective of a neural network as an algorithm.<n>In this analogy, a neural network is programmed constructively, rather than trained from data.<n>We construct and analyze several other examples, both existing and new.
arXiv Detail & Related papers (2025-06-23T20:35:52Z)
Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks. We show that the networks acquire strong, data-dependent features. Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z)
How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series. We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime. We show that the neural networks possess a different limiting kernel which we call textitbias-generalized NTK We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z)
Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics. We then exploit higher-order statistics only later during training. We discuss the relation of DSB to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
On the optimization and generalization of overparameterized implicit neural networks [25.237054775800164]
Implicit neural networks have become increasingly attractive in the machine learning community. We show that global convergence is guaranteed, even if only the implicit layer is trained. This paper investigates the generalization error for implicit neural networks.
arXiv Detail & Related papers (2022-09-30T16:19:46Z)
What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization. We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks. Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
arXiv Detail & Related papers (2021-06-12T13:05:11Z)
Learning for Integer-Constrained Optimization through Neural Networks with Limited Training [28.588195947764188]
We introduce a symmetric and decomposed neural network structure, which is fully interpretable in terms of the functionality of its constituent components. By taking advantage of the underlying pattern of the integer constraint, the introduced neural network offers superior generalization performance with limited training. We show that the introduced decomposed approach can be further extended to semi-decomposed frameworks.
arXiv Detail & Related papers (2020-11-10T21:17:07Z)
How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks [80.55378250013496]
We study how neural networks trained by gradient descent extrapolate what they learn outside the support of the training distribution. Graph Neural Networks (GNNs) have shown some success in more complex tasks.
arXiv Detail & Related papers (2020-09-24T17:48:59Z)
A Principle of Least Action for the Training of Neural Networks [10.342408668490975]
We show the presence of a low kinetic energy displacement bias in the transport map of the network, and link this bias with generalization performance. We propose a new learning algorithm, which automatically adapts to the complexity of the given task, and leads to networks with a high generalization ability even in low data regimes.
arXiv Detail & Related papers (2020-09-17T15:37:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.