Provable Advantage of Curriculum Learning on Parity Targets with Mixed
Inputs
- URL: http://arxiv.org/abs/2306.16921v1
- Date: Thu, 29 Jun 2023 13:14:42 GMT
- Title: Provable Advantage of Curriculum Learning on Parity Targets with Mixed
Inputs
- Authors: Emmanuel Abbe, Elisabetta Cornacchia, Aryo Lotfi
- Abstract summary: We show a separation result in the number of training steps with standard (bounded) learning rates on a common sample distribution.
We also provide experimental results supporting the qualitative separation beyond the specific regime of the theoretical results.
- Score: 21.528321119061694
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Experimental results have shown that curriculum learning, i.e., presenting
simpler examples before more complex ones, can improve the efficiency of
learning. Some recent theoretical results also showed that changing the
sampling distribution can help neural networks learn parities, with formal
results only for large learning rates and one-step arguments. Here we show a
separation result in the number of training steps with standard (bounded)
learning rates on a common sample distribution: if the data distribution is a
mixture of sparse and dense inputs, there exists a regime in which a 2-layer
ReLU neural network, trained by a curriculum noisy-GD (or SGD) algorithm that
uses sparse examples first, can learn parities of sufficiently large degree,
while any fully connected neural network of possibly larger width or depth
trained by noisy-GD on the unordered samples cannot learn without additional
steps. We also provide experimental results supporting the qualitative
separation beyond the specific regime of the theoretical results.
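A minimal sketch of this setup follows, with assumed sizes and hyperparameters (input dimension, parity degree, hidden width, learning rate, noise level, and step counts are illustrative choices, not the paper's): parity labels over a fixed coordinate subset, inputs drawn from a mixture of sparse (few -1 coordinates) and dense (uniform) vectors, and a 2-layer ReLU network trained by noisy SGD on sparse examples first and then on the full mixture.
```python
import torch

torch.manual_seed(0)
d, k, width = 50, 5, 512      # input dimension, parity degree, hidden width (all illustrative)
S = torch.arange(k)           # support of the target parity: first k coordinates (assumed)

def sample(n, sparse):
    """Sparse inputs flip few coordinates to -1; dense inputs are uniform on {-1,+1}^d."""
    p = 0.05 if sparse else 0.5
    x = 1.0 - 2.0 * (torch.rand(n, d) < p).float()
    y = x[:, S].prod(dim=1)   # parity (product) of the coordinates in S
    return x, y

model = torch.nn.Sequential(torch.nn.Linear(d, width), torch.nn.ReLU(),
                            torch.nn.Linear(width, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.01)    # standard (bounded) learning rate
loss_fn = torch.nn.SoftMarginLoss()                   # logistic loss for +/-1 labels (assumed)

def train(steps, sparse_only, noise_std=1e-3):
    for _ in range(steps):
        if sparse_only:                      # curriculum phase: sparse examples only
            x, y = sample(64, sparse=True)
        else:                                # second phase: the full mixed distribution
            xs, ys = sample(32, sparse=True)
            xd, yd = sample(32, sparse=False)
            x, y = torch.cat([xs, xd]), torch.cat([ys, yd])
        opt.zero_grad()
        loss_fn(model(x).squeeze(-1), y).backward()
        for param in model.parameters():     # "noisy" GD/SGD: perturb gradients with Gaussian noise
            param.grad += noise_std * torch.randn_like(param.grad)
        opt.step()

train(2000, sparse_only=True)                # curriculum: sparse examples first
train(2000, sparse_only=False)               # then train on the mixture
x_te, y_te = sample(2000, sparse=False)
acc = ((model(x_te).squeeze(-1) > 0) == (y_te > 0)).float().mean().item()
print(f"accuracy on dense inputs: {acc:.3f}")
```
This is only meant to make the curriculum concrete; the theoretical separation concerns the number of noisy-GD steps and holds in a specific regime of degree, dimension, and sparsity that the sketch does not try to reproduce.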
Related papers
- Provably Neural Active Learning Succeeds via Prioritizing Perplexing Samples [53.95282502030541]
Neural Network-based active learning (NAL) is a cost-effective data selection technique that utilizes neural networks to select and train on a small subset of samples.
We take a step forward by offering a unified explanation, from a feature-learning view, for the success of NAL under both commonly used query criteria.
arXiv Detail & Related papers (2024-06-06T10:38:01Z)
- Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples.
Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
arXiv Detail & Related papers (2024-03-11T13:44:49Z)
- Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck [35.6883212537938]
We consider offline sparse parity learning, a supervised classification problem which admits a statistical query lower bound for gradient-based training of a multilayer perceptron.
We show, theoretically and experimentally, that sparse initialization and increasing network width yield significant improvements in sample efficiency in this setting (see the sketch after this entry).
We also show that the synthetic sparse parity task can be useful as a proxy for real problems requiring axis-aligned feature learning.
arXiv Detail & Related papers (2023-09-07T15:52:48Z)
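The sparse-parity entry above can be made concrete with a small, assumed sketch: offline sparse parity data and a wide 2-layer network whose first-layer weights are sparsified at initialization (each hidden unit starts with only a few nonzero input weights). The sizes, the masking scheme, and whether the mask is maintained during training are illustrative choices, not the paper's.
```python
import torch

torch.manual_seed(0)
n_dim, k, width, n_train = 40, 4, 1024, 2000   # illustrative sizes, not the paper's settings

# Offline sparse parity: a fixed random k-subset of coordinates determines the label.
subset = torch.randperm(n_dim)[:k]
x_train = 1.0 - 2.0 * torch.randint(0, 2, (n_train, n_dim)).float()
y_train = x_train[:, subset].prod(dim=1)

# One (assumed) form of sparse initialization: each hidden unit keeps only a few
# nonzero input weights at init; the rest of the first-layer weights are zeroed.
model = torch.nn.Sequential(torch.nn.Linear(n_dim, width), torch.nn.ReLU(),
                            torch.nn.Linear(width, 1))
with torch.no_grad():
    mask = (torch.rand(width, n_dim) < 4.0 / n_dim).float()   # ~4 active inputs per unit
    model[0].weight.mul_(mask)

opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = torch.nn.SoftMarginLoss()
for _ in range(500):
    idx = torch.randint(0, n_train, (128,))
    opt.zero_grad()
    loss_fn(model(x_train[idx]).squeeze(-1), y_train[idx]).backward()
    opt.step()

acc = ((model(x_train).squeeze(-1) > 0) == (y_train > 0)).float().mean().item()
print(f"train accuracy with sparse init: {acc:.3f}")
```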
- Computational Complexity of Learning Neural Networks: Smoothness and Degeneracy [52.40331776572531]
We show that learning depth-$3$ ReLU networks under the Gaussian input distribution is hard even in the smoothed-analysis framework.
Our results hold under a well-studied assumption on the existence of local pseudorandom generators.
arXiv Detail & Related papers (2023-02-15T02:00:26Z)
- Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural Networks [89.28881869440433]
This paper provides the first theoretical characterization of joint edge-model sparse learning for graph neural networks (GNNs).
It proves analytically that both sampling important nodes and pruning the lowest-magnitude neurons can reduce the sample complexity and improve convergence without compromising the test accuracy.
arXiv Detail & Related papers (2023-02-06T16:54:20Z)
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics; higher-order statistics are exploited only later in training (see the sketch after this entry).
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
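One way to probe the finding in the entry above (not the paper's methodology; the data construction and the measurement here are assumptions) is to track, over training, how often a network's predictions agree with a classifier that uses only first-order statistics (nearest class mean):
```python
import torch

torch.manual_seed(0)
# Two classes whose means differ slightly and whose variances differ substantially.
n, d = 2000, 20
mean0, mean1 = torch.zeros(d), 0.3 * torch.ones(d)
x0 = mean0 + torch.randn(n, d) * 0.5          # class 0: small variance
x1 = mean1 + torch.randn(n, d) * 1.5          # class 1: large variance
x = torch.cat([x0, x1])
y = torch.cat([torch.zeros(n), torch.ones(n)]).long()

def mean_classifier(z):
    # Reference rule that uses first-order statistics only: nearest class mean.
    return ((z - mean1).norm(dim=1) < (z - mean0).norm(dim=1)).long()

model = torch.nn.Sequential(torch.nn.Linear(d, 128), torch.nn.ReLU(), torch.nn.Linear(128, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.05)
for step in range(1, 1001):
    idx = torch.randint(0, 2 * n, (128,))
    opt.zero_grad()
    torch.nn.functional.cross_entropy(model(x[idx]), y[idx]).backward()
    opt.step()
    if step % 200 == 0:
        pred = model(x).argmax(dim=1)
        agree = (pred == mean_classifier(x)).float().mean()
        acc = (pred == y).float().mean()
        print(f"step {step}: accuracy {acc:.3f}, agreement with mean-only classifier {agree:.3f}")
```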
- BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning [93.38239238988719]
We propose to equip deep neural networks with the ability to learn sample relationships from each mini-batch.
BatchFormer is applied to the batch dimension of each mini-batch to implicitly explore sample relationships during training (see the sketch after this entry).
We perform extensive experiments on over ten datasets and the proposed method achieves significant improvements on different data scarcity applications.
arXiv Detail & Related papers (2022-03-03T05:31:33Z)
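A minimal sketch of the batch-dimension idea in the BatchFormer entry above, assuming a transformer encoder layer applied across the mini-batch of features; module sizes and placement are illustrative, and the paper's additional trick of sharing the classifier so the module can be removed at inference is omitted here.
```python
import torch

class BatchRelationModule(torch.nn.Module):
    """Sketch of the idea: treat the mini-batch as a sequence so that a transformer
    encoder layer can mix information across samples (the batch dimension)."""
    def __init__(self, feat_dim: int, n_heads: int = 4):
        super().__init__()
        self.encoder = torch.nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=n_heads, batch_first=False)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, feat_dim) -> view the batch as a length-`batch` sequence
        return self.encoder(feats.unsqueeze(1)).squeeze(1)

# Usage sketch: insert the module between a backbone and the classifier during training.
backbone = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU())
relation = BatchRelationModule(feat_dim=64)
classifier = torch.nn.Linear(64, 10)

x = torch.randn(16, 32)                 # a mini-batch of 16 samples
feats = backbone(x)
logits = classifier(relation(feats))    # sample relationships are mixed along the batch axis
print(logits.shape)                     # torch.Size([16, 10])
```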
- Multi-Sample Online Learning for Spiking Neural Networks based on Generalized Expectation Maximization [42.125394498649015]
Spiking Neural Networks (SNNs) capture some of the efficiency of biological brains by processing information through binary neural dynamic activations.
This paper proposes to leverage multiple compartments that sample independent spiking signals while sharing synaptic weights.
The key idea is to use these signals to obtain more accurate statistical estimates of the log-likelihood training criterion, as well as of its gradient.
arXiv Detail & Related papers (2021-02-05T16:39:42Z)