PAC-learning gains of Turing machines over circuits and neural networks
- URL: http://arxiv.org/abs/2103.12686v1
- Date: Tue, 23 Mar 2021 17:03:10 GMT
- Title: PAC-learning gains of Turing machines over circuits and neural networks
- Authors: Brieuc Pinon and Jean-Charles Delvenne and Rapha\"el Jungers
- Abstract summary: We study the potential gains in sample efficiency that can bring in the principle of minimum description length.
We use Turing machines to represent universal models and circuits.
We highlight close relationships between classical open problems in Circuit Complexity and the tightness of these.
- Score: 1.4502611532302039
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A caveat to many applications of the current Deep Learning approach is the
need for large-scale data. One improvement suggested by Kolmogorov Complexity
results is to apply the minimum description length principle with
computationally universal models. We study the potential gains in sample
efficiency that this approach can bring in principle. We use polynomial-time
Turing machines to represent computationally universal models and Boolean
circuits to represent Artificial Neural Networks (ANNs) acting on
finite-precision digits.
Our analysis unravels direct links between our question and Computational
Complexity results. We provide lower and upper bounds on the potential gains in
sample efficiency between the MDL applied with Turing machines instead of ANNs.
Our bounds depend on the bit-size of the input of the Boolean function to be
learned. Furthermore, we highlight close relationships between classical open
problems in Circuit Complexity and the tightness of these.
Related papers
- Learning Linear Attention in Polynomial Time [115.68795790532289]
We provide the first results on learnability of single-layer Transformers with linear attention.
We show that linear attention may be viewed as a linear predictor in a suitably defined RKHS.
We show how to efficiently identify training datasets for which every empirical riskr is equivalent to the linear Transformer.
arXiv Detail & Related papers (2024-10-14T02:41:01Z) - Symbolic Regression on FPGAs for Fast Machine Learning Inference [2.0920303420933273]
High-energy physics community is investigating the potential of deploying machine-learning-based solutions on Field-Programmable Gate Arrays (FPGAs)
We introduce a novel end-to-end procedure that utilizes a machine learning technique called symbolic regression (SR)
We show that our approach can approximate a 3-layer neural network using an inference model that achieves up to a 13-fold decrease in execution time, down to 5 ns, while still preserving more than 90% approximation accuracy.
arXiv Detail & Related papers (2023-05-06T17:04:02Z) - Pathfinding Neural Cellular Automata [23.831530224401575]
Pathfinding is an important sub-component of a broad range of complex AI tasks, such as robot path planning, transport routing, and game playing.
We hand-code and learn models for Breadth-First Search (BFS), i.e. shortest path finding.
We present a neural implementation of Depth-First Search (DFS), and outline how it can be combined with neural BFS to produce an NCA for computing diameter of a graph.
We experiment with architectural modifications inspired by these hand-coded NCAs, training networks from scratch to solve the diameter problem on grid mazes while exhibiting strong ability generalization
arXiv Detail & Related papers (2023-01-17T11:45:51Z) - Power and limitations of single-qubit native quantum neural networks [5.526775342940154]
Quantum neural networks (QNNs) have emerged as a leading strategy to establish applications in machine learning, chemistry, and optimization.
We formulate a theoretical framework for the expressive ability of data re-uploading quantum neural networks.
arXiv Detail & Related papers (2022-05-16T17:58:27Z) - Deep Architecture Connectivity Matters for Its Convergence: A
Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z) - Statistically Meaningful Approximation: a Case Study on Approximating
Turing Machines with Transformers [50.85524803885483]
This work proposes a formal definition of statistically meaningful (SM) approximation which requires the approximating network to exhibit good statistical learnability.
We study SM approximation for two function classes: circuits and Turing machines.
arXiv Detail & Related papers (2021-07-28T04:28:55Z) - On Function Approximation in Reinforcement Learning: Optimism in the
Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the complexity of the function class $mathcalF$ characterizes the complexity of the function.
Our regret bounds are independent of the number of episodes.
arXiv Detail & Related papers (2020-11-09T18:32:22Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study a distributed variable for large-scale AUC for a neural network as with a deep neural network.
Our model requires a much less number of communication rounds and still a number of communication rounds in theory.
Our experiments on several datasets show the effectiveness of our theory and also confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - Exposing Hardware Building Blocks to Machine Learning Frameworks [4.56877715768796]
We focus on how to design topologies that complement such a view of neurons as unique functions.
We develop a library that supports training a neural network with custom sparsity and quantization.
arXiv Detail & Related papers (2020-04-10T14:26:00Z) - Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, a truncated max-product Belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs)
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.