When Do Curricula Work?
- URL: http://arxiv.org/abs/2012.03107v3
- Date: Tue, 9 Feb 2021 17:38:58 GMT
- Title: When Do Curricula Work?
- Authors: Xiaoxia Wu and Ethan Dyer and Behnam Neyshabur
- Abstract summary: Ordered learning has been suggested as an improvement over standard i.i.d. training.
We conduct experiments over thousands of orderings spanning three kinds of learning: curriculum, anti-curriculum, and random-curriculum.
We find that curricula have only marginal benefits, and that randomly ordered samples perform as well or better than curricula and anti-curricula.
- Score: 26.072472732516335
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inspired by human learning, researchers have proposed ordering examples
during training based on their difficulty. Both curriculum learning, exposing a
network to easier examples early in training, and anti-curriculum learning,
showing the most difficult examples first, have been suggested as improvements
to the standard i.i.d. training. In this work, we set out to investigate the
relative benefits of ordered learning. We first investigate the \emph{implicit
curricula} resulting from architectural and optimization bias and find that
samples are learned in a highly consistent order. Next, to quantify the benefit
of \emph{explicit curricula}, we conduct extensive experiments over thousands
of orderings spanning three kinds of learning: curriculum, anti-curriculum, and
random-curriculum -- in which the size of the training dataset is dynamically
increased over time, but the examples are randomly ordered. We find that for
standard benchmark datasets, curricula have only marginal benefits, and that
randomly ordered samples perform as well or better than curricula and
anti-curricula, suggesting that any benefit is entirely due to the dynamic
training set size. Inspired by common use cases of curriculum learning in
practice, we investigate the role of limited training time budget and noisy
data in the success of curriculum learning. Our experiments demonstrate that
curriculum learning, but not anti-curriculum learning, can indeed improve
performance, either with a limited training time budget or in the presence of
noisy data.
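The three orderings compared in the paper can be illustrated with a short sketch. This is a hypothetical minimal implementation, not the authors' code: `difficulty` stands in for any per-example difficulty score (e.g. a proxy loss), and the linear pacing function growing the visible pool from 20% to 100% is an assumption chosen for illustration.

```python
import random

def ordered_pool(examples, difficulty, mode):
    """Return the training examples in the chosen order.

    mode: 'curriculum'  -> easiest first
          'anti'        -> hardest first
          'random'      -> shuffled (random-curriculum)
    """
    if mode == "curriculum":
        return sorted(examples, key=difficulty)
    if mode == "anti":
        return sorted(examples, key=difficulty, reverse=True)
    pool = list(examples)
    random.shuffle(pool)
    return pool

def dynamic_batches(pool, total_steps, batch_size, pacing):
    """Yield minibatches drawn only from the first pacing(...) examples,
    so the effective training-set size grows over time. This dynamic
    growth is shared by all three orderings above."""
    for step in range(total_steps):
        visible = pool[: max(batch_size, pacing(step, total_steps, len(pool)))]
        yield random.sample(visible, batch_size)

# Hypothetical linear pacing: visible fraction grows from 20% to 100%.
def linear_pacing(step, total_steps, n):
    frac = 0.2 + 0.8 * step / max(1, total_steps - 1)
    return int(n * frac)
```

Under a random-curriculum, the ordering step is a shuffle but the pacing (growing training-set size) is unchanged, which is exactly the variable the paper isolates when attributing the benefit to dynamic set size rather than ordering.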
Related papers
- EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training [79.96741042766524]
We reformulate the training curriculum as a soft-selection function.
We show that exposing the contents of natural images can be readily achieved by the intensity of data augmentation.
The resulting method, EfficientTrain++, is simple, general, yet surprisingly effective.
arXiv Detail & Related papers (2024-05-14T17:00:43Z) - Curriculum Learning and Imitation Learning for Model-free Control on Financial Time-series [0.6562256987706128]
Curriculum learning and imitation learning have been leveraged extensively in the robotics domain.
We theoretically and empirically explore these approaches in a representative control task over complex time-series data.
Our findings reveal that curriculum learning should be considered a novel direction in improving control-task performance.
arXiv Detail & Related papers (2023-11-22T11:42:50Z) - Reinforcement Learning with Success Induced Task Prioritization [68.8204255655161]
We introduce Success Induced Task Prioritization (SITP), a framework for automatic curriculum learning.
The algorithm selects the order of tasks that provide the fastest learning for agents.
We demonstrate that SITP matches or surpasses the results of other curriculum design methods.
arXiv Detail & Related papers (2022-12-30T12:32:43Z) - When Do Curricula Work in Federated Learning? [56.88941905240137]
We find that curriculum learning largely alleviates non-IIDness.
The more disparate the data distributions across clients, the more the clients benefit from curriculum learning.
We propose a novel client selection technique that benefits from the real-world disparity in the clients.
arXiv Detail & Related papers (2022-12-24T11:02:35Z) - An Analytical Theory of Curriculum Learning in Teacher-Student Networks [10.303947049948107]
In humans and animals, curriculum learning is critical to rapid learning and effective pedagogy.
In machine learning, curricula are not widely used and empirically often yield only moderate benefits.
arXiv Detail & Related papers (2021-06-15T11:48:52Z) - Token-wise Curriculum Learning for Neural Machine Translation [94.93133801641707]
Existing curriculum learning approaches to Neural Machine Translation (NMT) require sampling sufficient amounts of "easy" samples from the training data at the early training stage.
We propose a novel token-wise curriculum learning approach that creates sufficient amounts of easy samples.
Our approach can consistently outperform baselines on 5 language pairs, especially for low-resource languages.
arXiv Detail & Related papers (2021-03-20T03:57:59Z) - Statistical Measures For Defining Curriculum Scoring Function [5.328970912536596]
We show improvements in performance with convolutional and fully-connected neural networks on real image datasets.
Motivated by our insights from implicit curriculum ordering, we introduce a simple curriculum learning strategy.
We also propose and study the performance of a dynamic curriculum learning algorithm.
arXiv Detail & Related papers (2021-02-27T07:25:49Z) - Curriculum Learning: A Survey [65.31516318260759]
Curriculum learning strategies have been successfully employed in all areas of machine learning.
We construct a taxonomy of curriculum learning approaches by hand, considering various classification criteria.
We build a hierarchical tree of curriculum learning methods using an agglomerative clustering algorithm.
arXiv Detail & Related papers (2021-01-25T20:08:32Z) - Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping [62.78338049381917]
Fine-tuning pretrained contextual word embedding models to supervised downstream tasks has become commonplace in natural language processing.
We experiment with four datasets from the GLUE benchmark, fine-tuning BERT hundreds of times on each while varying only the random seeds.
We find substantial performance increases compared to previously reported results, and we quantify how the performance of the best-found model varies as a function of the number of fine-tuning trials.
arXiv Detail & Related papers (2020-02-15T02:40:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.