An Analytical Theory of Curriculum Learning in Teacher-Student Networks
- URL: http://arxiv.org/abs/2106.08068v1
- Date: Tue, 15 Jun 2021 11:48:52 GMT
- Title: An Analytical Theory of Curriculum Learning in Teacher-Student Networks
- Authors: Luca Saglietti, Stefano Sarao Mannelli, and Andrew Saxe
- Abstract summary: In humans and animals, curriculum learning is critical to rapid learning and effective pedagogy.
In machine learning, curricula are not widely used and empirically often yield only moderate benefits.
- Score: 10.303947049948107
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In humans and animals, curriculum learning -- presenting data in a curated
order -- is critical to rapid learning and effective pedagogy. Yet in machine
learning, curricula are not widely used and empirically often yield only
moderate benefits. This stark difference in the importance of curriculum raises
a fundamental theoretical question: when and why does curriculum learning help?
In this work, we analyse a prototypical neural network model of curriculum
learning in the high-dimensional limit, employing statistical physics methods.
Curricula could in principle change both the learning speed and asymptotic
performance of a model. To study the former, we provide an exact description of
the online learning setting, confirming the long-standing experimental
observation that curricula can modestly speed up learning. To study the latter,
we derive performance in a batch learning setting, in which a network trains to
convergence in successive phases of learning on dataset slices of varying
difficulty. With standard training losses, curriculum does not provide
generalisation benefit, in line with empirical observations. However, we show
that by connecting different learning phases through simple Gaussian priors,
curriculum can yield a large improvement in test performance. Taken together,
our reduced analytical descriptions help reconcile apparently conflicting
empirical results and trace regimes where curriculum learning yields the
largest gains. More broadly, our results suggest that fully exploiting a
curriculum may require explicit changes to the loss function at curriculum
boundaries.
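The setup described in the abstract can be illustrated with a minimal sketch: a teacher-student perceptron trained online, first on an "easy" (clean) data slice and then on a "hard" (label-noisy) slice, with the two phases coupled by a Gaussian prior centred on the easy-phase solution. This is an illustrative toy, not the authors' exact statistical-physics model; the dimensions, noise levels, and the `lam` coupling strength are assumptions chosen for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 100                                   # input dimension (illustrative)
w_teacher = rng.standard_normal(d)
w_teacher /= np.linalg.norm(w_teacher)    # unit-norm teacher vector

def make_batch(n, flip_prob):
    """Gaussian inputs with teacher labels; 'difficulty' = label-flip probability."""
    X = rng.standard_normal((n, d))
    y = np.sign(X @ w_teacher)
    flips = rng.random(n) < flip_prob
    y[flips] *= -1
    return X, y

def online_sgd(w, X, y, lr=0.05, anchor=None, lam=0.0):
    """One online pass of perceptron-style updates.

    When `anchor` is given, a Gaussian prior centred on it pulls the
    weights toward the previous phase's solution at rate lam.
    """
    for x_i, y_i in zip(X, y):
        if y_i * (w @ x_i) <= 0:          # update only on mistakes
            w = w + lr * y_i * x_i
        if anchor is not None:
            w = w - lr * lam * (w - anchor)
    return w

def overlap(w):
    """Cosine alignment between student and teacher (generalisation proxy)."""
    return (w @ w_teacher) / np.linalg.norm(w)

w = rng.standard_normal(d) * 0.01         # weakly initialised student

# Phase 1: train on the easy (noise-free) slice.
X_easy, y_easy = make_batch(2000, flip_prob=0.0)
w = online_sgd(w, X_easy, y_easy)
w_easy = w.copy()

# Phase 2: train on the hard (noisy) slice, coupled to phase 1
# through a Gaussian prior on the weights.
X_hard, y_hard = make_batch(2000, flip_prob=0.3)
w = online_sgd(w, X_hard, y_hard, anchor=w_easy, lam=0.1)

print(f"teacher-student overlap: {overlap(w):.3f}")
```

Dropping the `anchor` argument in the second phase recovers the standard-loss setting, in which the paper reports no generalisation benefit from the curriculum ordering.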
Related papers
- Normalization and effective learning rates in reinforcement learning [52.59508428613934]
Normalization layers have recently experienced a renaissance in the deep reinforcement learning and continual learning literature.
We show that normalization brings with it a subtle but important side effect: an equivalence between growth in the norm of the network parameters and decay in the effective learning rate.
We propose to make the learning rate schedule explicit with a simple re-parameterization which we call Normalize-and-Project.
arXiv Detail & Related papers (2024-07-01T20:58:01Z) - Efficient Mitigation of Bus Bunching through Setter-Based Curriculum Learning [0.47518865271427785]
We propose a novel approach to curriculum learning that uses a Setter Model to automatically generate an action space, adversary strength, and bunching strength.
Our method for automated curriculum learning involves a curriculum that is dynamically chosen and learned by an adversary network.
arXiv Detail & Related papers (2024-05-23T18:26:55Z) - When Do Curricula Work in Federated Learning? [56.88941905240137]
We find that curriculum learning largely alleviates non-IIDness.
The more disparate the data distributions across clients, the more they benefit from curriculum learning.
We propose a novel client selection technique that benefits from the real-world disparity in the clients.
arXiv Detail & Related papers (2022-12-24T11:02:35Z) - EfficientTrain: Exploring Generalized Curriculum Learning for Training
Visual Backbones [80.662250618795]
This paper presents a new curriculum learning approach for the efficient training of visual backbones (e.g., vision Transformers).
As an off-the-shelf method, it reduces the wall-time training cost of a wide variety of popular models by >1.5x on ImageNet-1K/22K without sacrificing accuracy.
arXiv Detail & Related papers (2022-11-17T17:38:55Z) - Comparison and Analysis of New Curriculum Criteria for End-to-End ASR [10.698093106994804]
Curriculum Learning is built on the observation that organized and structured assimilation of knowledge can enable faster training and better comprehension.
We employ Curriculum Learning in the context of Automatic Speech Recognition.
To impose structure on the training set, we explored multiple scoring functions that either use feedback from an external neural network or incorporate feedback from the model itself.
arXiv Detail & Related papers (2022-08-10T06:56:58Z) - Online Continual Learning with Natural Distribution Shifts: An Empirical
Study with Visual Data [101.6195176510611]
"Online" continual learning enables evaluating both information retention and online learning efficacy.
In online continual learning, each incoming small batch of data is first used for testing and then added to the training set, making the problem truly online.
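The test-then-train protocol described above can be sketched with a toy incremental classifier on a synthetic drifting stream: each incoming batch is first scored, then used to update the model. The nearest-class-mean model, stream parameters, and drift rate are all illustrative assumptions, not details from the benchmark itself.

```python
import numpy as np

rng = np.random.default_rng(1)

class NearestMean:
    """Toy incremental classifier: predicts the nearest running class mean."""
    def __init__(self, d, n_classes):
        self.means = np.zeros((n_classes, d))
        self.counts = np.zeros(n_classes)

    def predict(self, X):
        dists = ((X[:, None, :] - self.means[None]) ** 2).sum(-1)
        return dists.argmin(1)

    def update(self, X, y):
        for x_i, y_i in zip(X, y):            # running mean per class
            self.counts[y_i] += 1
            self.means[y_i] += (x_i - self.means[y_i]) / self.counts[y_i]

# Synthetic stream: two Gaussian classes whose centres drift over time.
d, n_classes = 5, 2
model = NearestMean(d, n_classes)
centres = rng.standard_normal((n_classes, d))
accs = []
for t in range(50):
    y = rng.integers(n_classes, size=32)
    X = centres[y] + 0.3 * rng.standard_normal((32, d))
    accs.append((model.predict(X) == y).mean())            # test first ...
    model.update(X, y)                                     # ... then train
    centres += 0.02 * rng.standard_normal((n_classes, d))  # natural drift

print(f"mean online accuracy: {np.mean(accs):.2f}")
```

Because every batch is scored before it is learned, the accuracy trace measures both online learning efficacy (how fast the model adapts) and retention under the drifting distribution.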
We introduce a new benchmark for online continual visual learning that exhibits large scale and natural distribution shifts.
arXiv Detail & Related papers (2021-08-20T06:17:20Z) - Analyzing Curriculum Learning for Sentiment Analysis along Task
Difficulty, Pacing and Visualization Axes [7.817598216459955]
We analyze curriculum learning in sentiment analysis along multiple axes.
We find that curriculum learning works best for difficult tasks and may even lead to a decrement in performance for tasks that have higher performance without curriculum learning.
arXiv Detail & Related papers (2021-02-19T15:42:14Z) - Curriculum Learning: A Survey [65.31516318260759]
Curriculum learning strategies have been successfully employed in all areas of machine learning.
We construct a taxonomy of curriculum learning approaches by hand, considering various classification criteria.
We build a hierarchical tree of curriculum learning methods using an agglomerative clustering algorithm.
arXiv Detail & Related papers (2021-01-25T20:08:32Z) - When Do Curricula Work? [26.072472732516335]
Ordered learning has been suggested as an improvement over standard i.i.d. training.
We conduct experiments over thousands of orderings spanning three kinds of learning: curriculum, anti-curriculum, and random-curriculum.
We find that curricula have only marginal benefits, and that randomly ordered samples perform as well or better than curricula and anti-curricula.
arXiv Detail & Related papers (2020-12-05T19:41:30Z) - The large learning rate phase of deep learning: the catapult mechanism [50.23041928811575]
We present a class of neural networks with solvable training dynamics.
We find good agreement between our model's predictions and training dynamics in realistic deep learning settings.
We believe our results shed light on characteristics of models trained at different learning rates.
arXiv Detail & Related papers (2020-03-04T17:52:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.