Statistically Meaningful Approximation: a Case Study on Approximating
Turing Machines with Transformers
- URL: http://arxiv.org/abs/2107.13163v3
- Date: Thu, 30 Mar 2023 06:31:06 GMT
- Title: Statistically Meaningful Approximation: a Case Study on Approximating
Turing Machines with Transformers
- Authors: Colin Wei, Yining Chen, Tengyu Ma
- Abstract summary: This work proposes a formal definition of statistically meaningful (SM) approximation which requires the approximating network to exhibit good statistical learnability.
We study SM approximation for two function classes: circuits and Turing machines.
- Score: 50.85524803885483
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A common lens to theoretically study neural net architectures is to analyze
the functions they can approximate. However, constructions from approximation
theory may be unrealistic and therefore less meaningful. For example, a common
unrealistic trick is to encode target function values using infinite precision.
To address these issues, this work proposes a formal definition of
statistically meaningful (SM) approximation which requires the approximating
network to exhibit good statistical learnability. We study SM approximation for
two function classes: boolean circuits and Turing machines. We show that
overparameterized feedforward neural nets can SM approximate boolean circuits
with sample complexity depending only polynomially on the circuit size, not the
size of the network. In addition, we show that transformers can SM approximate
Turing machines with computation time bounded by $T$ with sample complexity
polynomial in the alphabet size, state space size, and $\log (T)$. We also
introduce new tools for analyzing generalization which provide much tighter
sample complexities than the typical VC-dimension or norm-based bounds, which
may be of independent interest.
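Read schematically, and writing $\Sigma$ for the alphabet, $Q$ for the state space, and $\epsilon$ for the target excess risk (notation ours, not the paper's), the transformer result says the learner needs roughly $n = \mathrm{poly}(|\Sigma|, |Q|, \log T, 1/\epsilon)$ samples, i.e., the dependence on the computation time $T$ is only logarithmic; the feedforward result similarly bounds the sample complexity by a polynomial in the circuit size alone.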
Related papers
- Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, within both the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) frameworks, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z)
- Universal Neural Functionals [67.80283995795985]
A challenging problem in many modern machine learning tasks is to process weight-space features.
Recent works have developed promising weight-space models that are equivariant to the permutation symmetries of simple feedforward networks.
This work proposes an algorithm that automatically constructs permutation equivariant models for any weight space.
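As a concrete illustration of the permutation symmetry these models respect, here is a minimal NumPy sketch (ours, not the paper's code): relabeling the hidden units of a two-layer ReLU network, by permuting the rows of the first weight matrix together with the matching columns of the second, leaves the computed function unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 8)), rng.normal(size=16)   # layer 1
W2 = rng.normal(size=(4, 16))                            # layer 2
x = rng.normal(size=8)

def mlp(W1, b1, W2, x):
    # Two-layer ReLU network: x -> W2 @ relu(W1 @ x + b1)
    return W2 @ np.maximum(W1 @ x + b1, 0.0)

P = rng.permutation(16)  # an arbitrary relabeling of the hidden units
# Permuting rows of (W1, b1) and the matching columns of W2 leaves
# the network's input-output behavior unchanged.
np.testing.assert_allclose(mlp(W1, b1, W2, x),
                           mlp(W1[P], b1[P], W2[:, P], x),
                           rtol=1e-6, atol=1e-8)
```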
arXiv Detail & Related papers (2024-02-07T20:12:27Z)
- Auto-Regressive Next-Token Predictors are Universal Learners [17.416520406390415]
We show that even simple models such as linear next-token predictors can approximate any function efficiently computed by a Turing machine.
We also show experimentally that simple next-token predictors, such as linear networks and shallow Multi-Layer Perceptrons (MLPs), display non-trivial performance on text generation and arithmetic tasks.
arXiv Detail & Related papers (2023-09-13T14:15:03Z)
- Neural approximation of Wasserstein distance via a universal architecture for symmetric and factorwise group invariant functions [6.994580267603235]
We first present a general neural network architecture for approximating symmetric and factorwise group invariant (SFGI) functions.
The main contribution of this paper is to combine this general architecture with a sketching idea to develop a specific and efficient neural network for approximating the Wasserstein distance.
Our work provides an interesting integration of sketching ideas for geometric problems with universal approximation of symmetric functions.
arXiv Detail & Related papers (2023-08-01T04:11:19Z)
- Transformers Learn Shortcuts to Automata [52.015990420075944]
We find that a low-depth Transformer can represent the computations of any finite-state automaton.
We show that a Transformer with $O(\log T)$ layers can exactly replicate the computation of an automaton on an input sequence of length $T$.
We further investigate the brittleness of these solutions and propose potential mitigations.
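The $O(\log T)$ depth comes from the fact that each input symbol induces a state-transition map, and such maps compose associatively, so the length-$T$ sequential scan collapses into a balanced tree of pairwise compositions of depth $\lceil\log_2 T\rceil$. Below is a minimal Python sketch of this prefix-composition idea (our illustration, not the paper's transformer construction):

```python
def run_logdepth(delta, states, x, q0):
    """Run a finite automaton on input x using O(log T) composition rounds.

    delta: dict mapping (state, symbol) -> state.
    """
    if not x:
        return q0, 0
    # Each input symbol induces a map over the whole state set.
    maps = [{q: delta[(q, s)] for q in states} for s in x]
    rounds = 0
    while len(maps) > 1:
        # One parallel round: compose adjacent pairs (g after f).
        nxt = [{q: g[f[q]] for q in states}
               for f, g in zip(maps[0::2], maps[1::2])]
        if len(maps) % 2:          # odd element rides along unchanged
            nxt.append(maps[-1])
        maps, rounds = nxt, rounds + 1
    return maps[0][q0], rounds

# Example: parity automaton over {0, 1} (state = parity of ones seen).
states = (0, 1)
delta = {(q, s): q ^ s for q in states for s in (0, 1)}
final, depth = run_logdepth(delta, states, [1, 0, 1, 1, 0, 1], q0=0)
print(final, depth)  # -> 0 (four ones: even parity), 3 == ceil(log2(6))
```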
arXiv Detail & Related papers (2022-10-19T17:45:48Z)
- A Simple and General Debiased Machine Learning Theorem with Finite Sample Guarantees [4.55274575362193]
We provide a nonasymptotic debiased machine learning theorem that encompasses any global or local functional of any machine learning algorithm.
Our results culminate in a simple set of conditions that an analyst can use to translate modern learning theory rates into traditional statistical inference.
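For orientation, the debiased estimators covered by such theorems typically take the doubly robust form, standard in this literature (notation ours, not necessarily the paper's): $\hat\theta = \frac{1}{n}\sum_{i=1}^{n}\big[\, m(W_i;\hat\gamma) + \hat\alpha(W_i)\,(Y_i - \hat\gamma(W_i)) \,\big]$, where $\hat\gamma$ is a learned regression, $\hat\alpha$ a learned Riesz representer, and cross-fitting removes own-observation bias.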
arXiv Detail & Related papers (2021-05-31T17:57:02Z)
- PAC-learning gains of Turing machines over circuits and neural networks [1.4502611532302039]
We study the potential gains in sample efficiency that the principle of minimum description length can bring.
We use Turing machines to represent universal models and circuits.
We highlight close relationships between classical open problems in Circuit Complexity and the tightness of these bounds.
arXiv Detail & Related papers (2021-03-23T17:03:10Z)
- On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the regret is characterized by the intrinsic complexity of the function class $\mathcal{F}$.
Our regret bounds are independent of the size of the state space.
arXiv Detail & Related papers (2020-11-09T18:32:22Z)
- Refined bounds for algorithm configuration: The knife-edge of dual class approximability [94.83809668933021]
We investigate how large a training set should be to ensure that a parameter's average metric performance over the training set is close to its expected future performance.
We show that if this approximation holds under the L-infinity norm, we can provide strong sample complexity bounds.
We empirically evaluate our bounds in the context of integer programming, one of the most powerful tools in computer science.
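Concretely, dual class approximability in the $L_\infty$ norm asks that for every problem instance $x$ there be a function $\tilde u_x$ from a simple class with $\sup_{\rho} |u_x(\rho) - \tilde u_x(\rho)| \le \epsilon$, where $u_x(\rho)$ denotes the metric's value at parameter $\rho$ on instance $x$ (notation ours, paraphrasing the title's "dual class approximability").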
arXiv Detail & Related papers (2020-06-21T15:32:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.