Learning Parities with Neural Networks
- URL: http://arxiv.org/abs/2002.07400v2
- Date: Fri, 3 Jul 2020 11:38:47 GMT
- Title: Learning Parities with Neural Networks
- Authors: Amit Daniely, Eran Malach
- Abstract summary: We take a step towards showing learnability of models that are inherently non-linear.
We show that under certain distributions, sparse parities are learnable via gradient descent on a depth-two network.
- Score: 45.6877715768796
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years we have seen a rapidly growing line of research which shows
learnability of various models via common neural network algorithms. Yet,
besides a very few outliers, these results show learnability of models that can
be learned using linear methods. Namely, such results show that learning
neural-networks with gradient-descent is competitive with learning a linear
classifier on top of a data-independent representation of the examples. This
leaves much to be desired, as neural networks are far more successful than
linear methods. Furthermore, on the more conceptual level, linear models don't
seem to capture the "deepness" of deep networks. In this paper we take a step
towards showing learnability of models that are inherently non-linear. We show
that under certain distributions, sparse parities are learnable via gradient
descent on a depth-two network. On the other hand, under the same
distributions, these parities cannot be learned efficiently by linear methods.
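To make the setting concrete, here is a minimal sketch of the learning problem and architecture the abstract refers to: a k-sparse parity over n input bits, fit by mini-batch gradient descent on a depth-two (one-hidden-layer) ReLU network. The uniform input distribution, width, loss, and hyperparameters below are illustrative assumptions; the paper's positive result concerns particular structured distributions and a specific analysis, which this sketch does not reproduce.

```python
# Sketch: learning a k-sparse parity with gradient descent on a depth-two
# (one-hidden-layer) ReLU network. Distribution and hyperparameters are
# illustrative assumptions, not the paper's construction.
import torch

torch.manual_seed(0)
n_bits, k, width, n_samples = 30, 3, 512, 20000
parity_coords = torch.arange(k)                     # the (unknown) relevant coordinates

# Inputs in {-1, +1}^n_bits; the label is the product of the k relevant coordinates.
X = (torch.randint(0, 2, (n_samples, n_bits)) * 2 - 1).float()
y = X[:, parity_coords].prod(dim=1)

# Depth-two network: one hidden ReLU layer followed by a linear output.
model = torch.nn.Sequential(
    torch.nn.Linear(n_bits, width),
    torch.nn.ReLU(),
    torch.nn.Linear(width, 1),
)
opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = torch.nn.MSELoss()                        # squared loss; a hinge loss is another common choice

for step in range(2000):                            # plain mini-batch gradient descent
    idx = torch.randint(0, n_samples, (256,))
    pred = model(X[idx]).squeeze(1)
    loss = loss_fn(pred, y[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    acc = (model(X).squeeze(1).sign() == y).float().mean()
print(f"sign agreement with the parity: {acc.item():.3f}")
```

Any separation from linear methods observed with such a script is only suggestive; the formal hardness statement in the paper concerns linear methods over data-independent representations under the paper's distributions.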
Related papers
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z)
- Scalable Bayesian Inference in the Era of Deep Learning: From Gaussian Processes to Deep Neural Networks [0.5827521884806072]
Large neural networks trained on large datasets have become the dominant paradigm in machine learning.
This thesis develops scalable methods to equip neural networks with model uncertainty.
arXiv Detail & Related papers (2024-04-29T23:38:58Z)
- The Contextual Lasso: Sparse Linear Models via Deep Neural Networks [5.607237982617641]
We develop a new statistical estimator that fits a sparse linear model to the explanatory features such that the sparsity pattern and coefficients vary as a function of the contextual features.
An extensive suite of experiments on real and synthetic data suggests that the learned models, which remain highly transparent, can be sparser than the regular lasso.
arXiv Detail & Related papers (2023-02-02T05:00:29Z)
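As a rough illustration of the contextual sparse-linear idea summarized in the entry above, the sketch below lets a small network map contextual features to the coefficients of a per-example linear model over the explanatory features, with an l1 penalty encouraging sparsity. The synthetic data, network shape, and penalty form are assumptions for illustration only, not the authors' estimator (which enforces sparsity differently).

```python
# Simplified sketch of a "contextual" sparse linear model: a small network
# maps context z to the coefficients of a linear model over features x,
# with an l1 penalty encouraging sparse, context-dependent coefficients.
import torch

torch.manual_seed(0)
n, d_x, d_z = 1000, 10, 4
x = torch.randn(n, d_x)                 # explanatory features
z = torch.randn(n, d_z)                 # contextual features
# Synthetic target: only the first explanatory feature matters,
# with a coefficient that depends on the context.
y = (1.0 + z[:, 0]) * x[:, 0]

# Small network producing a per-example coefficient vector beta(z).
coef_net = torch.nn.Sequential(
    torch.nn.Linear(d_z, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, d_x),
)
opt = torch.optim.Adam(coef_net.parameters(), lr=1e-2)
lam = 1e-2                              # l1 penalty strength (sparsity of beta(z))

for step in range(500):
    beta = coef_net(z)                  # shape (n, d_x)
    pred = (beta * x).sum(dim=1)        # <beta(z_i), x_i> for each example
    loss = ((pred - y) ** 2).mean() + lam * beta.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```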
- Slimmable Networks for Contrastive Self-supervised Learning [69.9454691873866]
Self-supervised learning makes significant progress in pre-training large models, but struggles with small models.
We introduce another one-stage solution to obtain pre-trained small models without the need for extra teachers.
A slimmable network consists of a full network and several weight-sharing sub-networks, which can be pre-trained once to obtain various networks.
arXiv Detail & Related papers (2022-09-30T15:15:05Z)
- Transition to Linearity of Wide Neural Networks is an Emerging Property of Assembling Weak Models [20.44438519046223]
Wide neural networks with a linear output layer have been shown to be near-linear and to have a near-constant neural tangent kernel (NTK).
We show that the linearity of wide neural networks is, in fact, an emerging property of assembling a large number of diverse "weak" sub-models, none of which dominate the assembly.
arXiv Detail & Related papers (2022-03-10T01:27:01Z)
- Leveraging Sparse Linear Layers for Debuggable Deep Networks [86.94586860037049]
We show how fitting sparse linear models over learned deep feature representations can lead to more debuggable neural networks.
The resulting sparse explanations can help to identify spurious correlations, explain misclassifications, and diagnose model biases in vision and language tasks.
arXiv Detail & Related papers (2021-05-11T08:15:25Z)
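The recipe summarized in the entry above can be sketched as: freeze a trained network, treat its penultimate-layer activations as features, and fit an l1-regularized linear classifier on top of them. The random backbone and synthetic labels below are placeholder assumptions standing in for a real pretrained model and dataset.

```python
# Sketch: sparse (l1-regularized) linear classifier fit on top of frozen
# deep features. The backbone and data are placeholders for illustration.
import torch

torch.manual_seed(0)
d_in, d_feat, n_classes, n = 64, 128, 5, 2000
X = torch.randn(n, d_in)
y = torch.randint(0, n_classes, (n,))

# Stand-in for a pretrained, frozen feature extractor (in practice: a trained
# vision or language backbone with its final classification layer removed).
backbone = torch.nn.Sequential(
    torch.nn.Linear(d_in, d_feat),
    torch.nn.ReLU(),
).requires_grad_(False)

with torch.no_grad():
    feats = backbone(X)                 # frozen deep feature representations

linear = torch.nn.Linear(d_feat, n_classes)
opt = torch.optim.Adam(linear.parameters(), lr=1e-2)
lam = 1e-3                              # larger lam -> sparser, more inspectable weights

for step in range(500):
    loss = torch.nn.functional.cross_entropy(linear(feats), y)
    loss = loss + lam * linear.weight.abs().sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

# The nonzero entries of linear.weight indicate which deep features each class relies on.
```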
- A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
arXiv Detail & Related papers (2020-10-27T17:56:14Z)
- How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks [80.55378250013496]
We study how neural networks trained by gradient descent extrapolate what they learn outside the support of the training distribution.
Graph Neural Networks (GNNs) have shown some success in more complex tasks.
arXiv Detail & Related papers (2020-09-24T17:48:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.