Neural Networks as Universal Finite-State Machines: A Constructive Deterministic Finite Automaton Theory
- URL: http://arxiv.org/abs/2505.11694v2
- Date: Wed, 28 May 2025 21:39:10 GMT
- Title: Neural Networks as Universal Finite-State Machines: A Constructive Deterministic Finite Automaton Theory
- Authors: Sahil Rajesh Dhayalkar
- Abstract summary: We establish feedforward neural networks as universal finite-state machines (N-FSMs). Our results prove that finite-depth ReLU and threshold networks can exactly simulate deterministic finite automata (DFAs). We formalize the expressivity boundary: fixed-depth feedforward networks cannot recognize non-regular languages requiring unbounded memory.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a complete theoretical and empirical framework establishing feedforward neural networks as universal finite-state machines (N-FSMs). Our results prove that finite-depth ReLU and threshold networks can exactly simulate deterministic finite automata (DFAs) by unrolling state transitions into depth-wise neural layers, with formal characterizations of required depth, width, and state compression. We demonstrate that DFA transitions are linearly separable, binary threshold activations allow exponential compression, and Myhill-Nerode equivalence classes can be embedded into continuous latent spaces while preserving separability. We also formalize the expressivity boundary: fixed-depth feedforward networks cannot recognize non-regular languages requiring unbounded memory. Unlike prior heuristic or probing-based studies, we provide constructive proofs and design explicit DFA-unrolled neural architectures that empirically validate every claim. Our results bridge deep learning, automata theory, and neural-symbolic computation, offering a rigorous blueprint for how discrete symbolic processes can be realized in continuous neural systems.
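As a concrete illustration of the construction the abstract describes, the sketch below simulates a DFA with one-hot state vectors and one ReLU layer per input symbol. It is a minimal sketch of our own, not the paper's exact architecture: each hidden unit fires precisely when one (state, symbol) pair is active, which is the linear separability of DFA transitions the abstract refers to, and unrolling the layer once per symbol gives the depth-wise simulation.

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

def dfa_layer(delta, n_states, n_syms):
    """Weights mapping concat(state_onehot, sym_onehot) -> next-state onehot."""
    H = n_states * n_syms                 # one hidden unit per (state, symbol)
    W1 = np.zeros((H, n_states + n_syms))
    b1 = -np.ones(H)                      # relu(s_q + x_a - 1) == 1 iff both fire
    W2 = np.zeros((n_states, H))
    for q in range(n_states):
        for a in range(n_syms):
            h = q * n_syms + a
            W1[h, q] = 1.0                # detect current state q
            W1[h, n_states + a] = 1.0     # detect input symbol a
            W2[delta[(q, a)], h] = 1.0    # route activation to delta(q, a)
    return W1, b1, W2

def run_dfa(delta, n_states, n_syms, start, accept, string):
    W1, b1, W2 = dfa_layer(delta, n_states, n_syms)
    s = np.eye(n_states)[start]
    for a in string:                      # depth-wise unrolling: one layer per symbol
        x = np.eye(n_syms)[a]
        s = W2 @ relu(W1 @ np.concatenate([s, x]) + b1)
    return int(np.argmax(s)) in accept

# Toy DFA: accept bit strings containing an even number of 1s.
delta = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
print(run_dfa(delta, 2, 2, 0, {0}, [1, 1, 0, 1]))  # -> False (three 1s)
```

Because exactly one hidden unit fires per step, the second layer needs no nonlinearity and the state vector stays exactly one-hot, so the simulation is exact rather than approximate.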
Related papers
- Algorithm Development in Neural Networks: Insights from the Streaming Parity Task [8.188549368578704]
We study the learning dynamics of neural networks trained on a streaming parity task. We show that, with sufficient finite training experience, RNNs exhibit a phase transition to perfect infinite generalization. Our results disclose one mechanism by which neural networks can generalize infinitely from finite training experience.
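Streaming parity has an exact finite-state solution, which is why a recurrent network that finds it generalizes to unbounded input lengths. The hand-wired cell below is our own illustration of such a solution, not the paper's trained model; the weights and thresholds are set by hand.

```python
# A hand-wired recurrent cell computing streaming parity for inputs of any
# length. Weights and thresholds are chosen by hand, purely for illustration.
def parity_rnn(bits):
    h = 0.0  # hidden state: parity of the bits seen so far
    for x in bits:
        # XOR(h, x) written as two threshold conditions: fires iff h + x == 1
        h = 1.0 if (h + x > 0.5) and (h + x < 1.5) else 0.0
    return h

print(parity_rnn([1, 0, 1, 1]))  # -> 1.0 (odd number of ones)
```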
arXiv Detail & Related papers (2025-07-14T04:07:43Z)
- Why Neural Network Can Discover Symbolic Structures with Gradient-based Training: An Algebraic and Geometric Foundation for Neurosymbolic Reasoning [73.18052192964349]
We develop a theoretical framework that explains how discrete symbolic structures can emerge naturally from continuous neural network training dynamics. By lifting neural parameters to a measure space and modeling training as a Wasserstein gradient flow, we show that, under geometric constraints, the parameter measure $\mu_t$ undergoes two concurrent phenomena.
arXiv Detail & Related papers (2025-05-30T01:18:35Z)
- Neural Networks as Universal Finite-State Machines: A Constructive Feedforward Simulation Framework for NFAs [0.0]
This work establishes a new bridge between symbolic automata theory and modern neural architectures. We show that feedforward networks can perform precise, interpretable, and trainable symbolic computation.
arXiv Detail & Related papers (2025-06-26T22:40:30Z)
- Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $μ$P Parametrization [66.03821840425539]
In this paper, we investigate the training dynamics of $L$-layer neural networks trained with stochastic gradient descent (SGD), analyzed in the tensor program framework. We show that SGD enables these networks to learn linearly independent features that substantially deviate from their initial values. This rich feature space captures relevant data information and ensures that any convergent point of the training process is a global minimum.
arXiv Detail & Related papers (2025-03-12T17:33:13Z)
- Discovering Chunks in Neural Embeddings for Interpretability [53.80157905839065]
We propose leveraging the principle of chunking to interpret artificial neural population activities. We first demonstrate this concept in recurrent neural networks (RNNs) trained on artificial sequences with imposed regularities. We identify similar recurring embedding states corresponding to concepts in the input, with perturbations to these states activating or inhibiting the associated concepts.
arXiv Detail & Related papers (2025-02-03T20:30:46Z)
- Universal Scaling Laws of Absorbing Phase Transitions in Artificial Deep Neural Networks [0.8932296777085644]
Conventional artificial deep neural networks operating near the phase boundary of the signal propagation dynamics, also known as the edge of chaos, exhibit universal scaling laws of absorbing phase transitions. We exploit the fully deterministic nature of the propagation dynamics to elucidate an analogy between signal collapse in such networks and an absorbing state.
arXiv Detail & Related papers (2023-07-05T13:39:02Z)
- Universal Approximation and the Topological Neural Network [0.0]
A topological neural network (TNN) takes data from a Tychonoff topological space instead of the usual finite dimensional space.
A distributional neural network (DNN) that takes Borel measures as data is also introduced.
arXiv Detail & Related papers (2023-05-26T05:28:10Z)
- Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of a neural network measures the information flowing across its layers. It is an instance of a key structural condition that applies across broad domains of machine learning. For neural networks, however, the intrinsic mechanism that yields low-rank structures remains vague and unclear.
arXiv Detail & Related papers (2022-06-13T12:03:32Z)
- NN-EUCLID: deep-learning hyperelasticity without stress data [0.0]
We propose a new approach for unsupervised learning of hyperelastic laws with physics-consistent deep neural networks.
In contrast to supervised learning, which assumes the availability of stress-strain pairs, the approach uses only realistically measurable full-field displacement and global reaction-force data.
arXiv Detail & Related papers (2022-05-04T13:54:54Z)
- Machines of finite depth: towards a formalization of neural networks [0.0]
We provide a unifying framework where artificial neural networks and their architectures can be formally described as particular cases of a general mathematical construction--machines of finite depth.
We prove this statement theoretically and practically, via a unified implementation that generalizes several classical architectures--dense, convolutional, and recurrent neural networks with a rich shortcut structure--and their respective backpropagation rules.
arXiv Detail & Related papers (2022-04-27T09:17:15Z)
- Quasi-orthogonality and intrinsic dimensions as measures of learning and generalisation [55.80128181112308]
We show that the dimensionality and quasi-orthogonality of a neural network's feature space may jointly serve as discriminants of the network's performance. Our findings suggest important relationships between a network's final performance and the properties of its randomly initialised feature space.
arXiv Detail & Related papers (2022-03-30T21:47:32Z)
- Connecting Weighted Automata, Tensor Networks and Recurrent Neural Networks through Spectral Learning [58.14930566993063]
We present connections between three models used in different research fields: weighted finite automata (WFA) from formal languages and linguistics, recurrent neural networks (RNNs) from machine learning, and tensor networks. We introduce the first provable learning algorithm for linear second-order RNNs (2-RNNs) defined over sequences of continuous input vectors.
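A WFA computes a function of a sequence as a product of symbol-indexed matrices, f(w) = α⊤ A_{w_1}···A_{w_n} β, and a linear 2-RNN realizes the same computation with one linear map per input symbol. The toy example below (our own, not from the paper) counts occurrences of one symbol:

```python
import numpy as np

# Illustrative 2-state WFA: f(w) = alpha^T A_{w1} ... A_{wn} beta,
# here counting occurrences of symbol 0 in the input word.
alpha = np.array([1.0, 0.0])            # initial weight vector
beta  = np.array([0.0, 1.0])            # final weight vector
A = {0: np.array([[1.0, 1.0],
                  [0.0, 1.0]]),         # symbol 0: increment the counter
     1: np.eye(2)}                      # symbol 1: leave state unchanged

def wfa(word):
    v = alpha
    for sym in word:
        v = v @ A[sym]                  # one linear map per symbol
    return float(v @ beta)

print(wfa([0, 1, 0, 0]))  # -> 3.0
```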
arXiv Detail & Related papers (2020-10-19T15:28:00Z)
- A Chain Graph Interpretation of Real-World Neural Networks [58.78692706974121]
We propose an alternative interpretation that identifies NNs as chain graphs (CGs) and feed-forward as an approximate inference procedure.
The CG interpretation specifies the nature of each NN component within the rich theoretical framework of probabilistic graphical models.
We demonstrate with concrete examples that the CG interpretation can provide novel theoretical support and insights for various NN techniques.
arXiv Detail & Related papers (2020-06-30T14:46:08Z)
- Scalable Partial Explainability in Neural Networks via Flexible Activation Functions [13.71739091287644]
The high-dimensional features and decisions produced by deep neural networks (NNs) require new algorithms and methods to expose their mechanisms. Current state-of-the-art NN interpretation methods focus more on the direct relationship between NN outputs and inputs than on the network's structure and operations. In this paper, we achieve a partially explainable learning model by symbolically explaining the role of activation functions (AFs) under a scalable topology.
arXiv Detail & Related papers (2020-06-10T20:30:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.