Cup Curriculum: Curriculum Learning on Model Capacity
- URL: http://arxiv.org/abs/2311.03956v1
- Date: Tue, 7 Nov 2023 12:55:31 GMT
- Title: Cup Curriculum: Curriculum Learning on Model Capacity
- Authors: Luca Scharr and Vanessa Toborek
- Abstract summary: Curriculum learning aims to increase the performance of a learner on a given task by applying a specialized learning strategy.
This strategy focuses on either the dataset, the task, or the model, yet there is little to no work analysing how CL can be applied to the model capacity in natural language processing.
To close this gap, we propose the cup curriculum.
We empirically evaluate different strategies of the cup curriculum and show that it outperforms early stopping reliably while exhibiting a high resilience to overfitting.
- Score: 1.0878040851638
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Curriculum learning (CL) aims to increase the performance of a learner on a
given task by applying a specialized learning strategy. This strategy focuses
on either the dataset, the task, or the model. There is little to no work
analysing the possibilities to apply CL on the model capacity in natural
language processing. To close this gap, we propose the cup curriculum. In a
first phase of training, we use a variation of iterative magnitude pruning to
reduce model capacity. These weights are reintroduced in a second phase, so
that the model capacity follows a cup-shaped curve over the training
iterations. We empirically evaluate different strategies of the cup curriculum
and show that it outperforms early stopping reliably while exhibiting a high
resilience to overfitting.
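As a rough illustration of the training schedule described above, the following PyTorch sketch makes the capacity follow a cup shape: sparsity from magnitude pruning grows in the first half of training and is relaxed in the second half so that pruned weights re-enter training. The schedule shape, the `max_sparsity` default, and the once-per-epoch masking are illustrative assumptions, not the authors' exact procedure.

```python
import torch
import torch.nn as nn

def cup_sparsity(epoch: int, total_epochs: int, max_sparsity: float = 0.8) -> float:
    """Cup-shaped sparsity schedule: 0 -> max_sparsity -> 0 over training."""
    half = total_epochs / 2
    if epoch <= half:
        return max_sparsity * (epoch / half)                  # prune more and more
    return max_sparsity * ((total_epochs - epoch) / half)     # reintroduce weights

@torch.no_grad()
def apply_magnitude_mask(model: nn.Module, sparsity: float) -> None:
    """Zero out the smallest-magnitude fraction of each Linear layer's weights."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            w = module.weight
            k = int(sparsity * w.numel())
            if k == 0:
                continue
            threshold = w.abs().flatten().kthvalue(k).values
            w.mul_((w.abs() > threshold).float())

# Usage sketch: assumes `model`, `loss_fn`, and `train_loader` already exist.
# total_epochs = 20
# opt = torch.optim.Adam(model.parameters())
# for epoch in range(total_epochs):
#     apply_magnitude_mask(model, cup_sparsity(epoch, total_epochs))
#     for x, y in train_loader:
#         opt.zero_grad()
#         loss_fn(model(x), y).backward()
#         opt.step()
```

In a faithful iterative-magnitude-pruning setup one would also rewind or freeze the surviving weights between pruning rounds; the sketch omits this for brevity.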
Related papers
- LLAVADI: What Matters For Multimodal Large Language Models Distillation [77.73964744238519]
In this work, we do not propose a new efficient model structure or train small-scale MLLMs from scratch.
Our studies involve training strategies, model choices, and distillation algorithms in the knowledge distillation process.
Evaluated across different benchmarks with a proper strategy, even a 2.7B small-scale model can perform on par with larger models with 7B or 13B parameters.
arXiv Detail & Related papers (2024-07-28T06:10:47Z)
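For context on the distillation process mentioned in the entry above, below is a minimal, generic knowledge-distillation loss in PyTorch: a temperature-scaled KL term between teacher and student logits plus the usual cross-entropy. The temperature and weighting defaults are illustrative; LLAVADI's exact objective may differ.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Standard KD objective: soft targets from the teacher + hard labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```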
- CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning [17.614980614656407]
We propose Continual Generative training for Incremental prompt-Learning.
We exploit Variational Autoencoders to learn class-conditioned distributions.
We show that such a generative replay approach can adapt to new tasks while improving zero-shot capabilities.
arXiv Detail & Related papers (2024-07-22T16:51:28Z)
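To make the generative replay idea above concrete, here is a toy PyTorch sketch in which a class-conditioned decoder samples pseudo-features for previously seen classes so they can be mixed into training on the current task. The layer sizes, the plain Gaussian prior, and the feature-level replay are assumptions for illustration; the paper's actual VAE and prompt-learning setup is more involved.

```python
import torch
import torch.nn as nn

class CondDecoder(nn.Module):
    """Toy class-conditioned VAE decoder: maps (latent, class one-hot) -> feature."""
    def __init__(self, latent_dim=16, num_classes=10, feat_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + num_classes, 256), nn.ReLU(),
            nn.Linear(256, feat_dim),
        )

    def forward(self, z, y_onehot):
        return self.net(torch.cat([z, y_onehot], dim=-1))

@torch.no_grad()
def generative_replay(decoder, old_classes, n_per_class=32,
                      latent_dim=16, num_classes=10):
    """Sample pseudo-features for old classes to mix into the current batch."""
    feats, labels = [], []
    for c in old_classes:
        z = torch.randn(n_per_class, latent_dim)
        y = torch.zeros(n_per_class, num_classes)
        y[:, c] = 1.0
        feats.append(decoder(z, y))
        labels.append(torch.full((n_per_class,), c, dtype=torch.long))
    return torch.cat(feats), torch.cat(labels)
```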
- Examining Changes in Internal Representations of Continual Learning Models Through Tensor Decomposition [5.01338577379149]
Continual learning (CL) has spurred the development of several methods aimed at consolidating previous knowledge across sequential learning.
We propose a novel representation-based evaluation framework for CL models.
arXiv Detail & Related papers (2024-05-06T07:52:44Z)
- Curriculum for Crowd Counting -- Is it Worthy? [2.462045767312954]
A notably intuitive technique called Curriculum Learning (CL) has been introduced recently for training deep learning models.
In this work, we investigate the impact of curriculum learning in crowd counting using the density estimation method.
Our experiments show that curriculum learning improves the model learning performance and shortens the convergence time.
arXiv Detail & Related papers (2024-01-15T10:46:01Z)
- RanPAC: Random Projections and Pre-trained Models for Continual Learning [59.07316955610658]
Continual learning (CL) aims to learn different tasks (such as classification) in a non-stationary data stream without forgetting old ones.
We propose a concise and effective approach for CL with pre-trained models.
arXiv Detail & Related papers (2023-07-05T12:49:02Z)
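As a rough sketch of the random-projection idea in RanPAC above: frozen pre-trained features are passed through a fixed (never-trained) random matrix with a nonlinearity, and classes are represented by prototypes in the projected space. The dimensions and the nearest-prototype rule are illustrative assumptions; the paper's classifier on the projected features is more refined.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
feat_dim, proj_dim, num_classes = 768, 2048, 10

# Fixed random projection, frozen after initialization.
W = torch.randn(feat_dim, proj_dim) / feat_dim ** 0.5

def project(features: torch.Tensor) -> torch.Tensor:
    """Nonlinear random projection of frozen backbone features."""
    return torch.relu(features @ W)

# One prototype per class, accumulated from projected training features.
prototypes = torch.zeros(num_classes, proj_dim)
counts = torch.zeros(num_classes)

def update_prototypes(features: torch.Tensor, labels: torch.Tensor) -> None:
    phi = project(features)
    for c in labels.unique():
        mask = labels == c
        prototypes[c] += phi[mask].sum(dim=0)
        counts[c] += mask.sum()

def predict(features: torch.Tensor) -> torch.Tensor:
    phi = project(features)
    means = prototypes / counts.clamp(min=1).unsqueeze(1)
    # Cosine similarity to each class prototype.
    sims = F.normalize(phi, dim=1) @ F.normalize(means, dim=1).T
    return sims.argmax(dim=1)
```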
- Continual Learners are Incremental Model Generalizers [70.34479702177988]
This paper extensively studies the impact of Continual Learning (CL) models as pre-trainers.
We find that the transfer quality of the representation often increases gradually without noticeable degradation in fine-tuning performance.
We propose a new fine-tuning scheme, GLobal Attention Discretization (GLAD), that preserves rich task-generic representations while solving downstream tasks.
arXiv Detail & Related papers (2023-06-21T05:26:28Z)
- Retrieval-Enhanced Contrastive Vision-Text Models [61.783728119255365]
We propose to equip vision-text models with the ability to refine their embedding with cross-modal retrieved information from a memory at inference time.
Remarkably, we show that this can be done with a light-weight, single-layer, fusion transformer on top of a frozen CLIP.
Our experiments validate that our retrieval-enhanced contrastive (RECO) training improves CLIP performance substantially on several challenging fine-grained tasks.
arXiv Detail & Related papers (2023-06-12T15:52:02Z)
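A minimal sketch of the retrieval-and-refine step described above, assuming a frozen encoder and a pre-built memory bank of embeddings: retrieve the top-k nearest entries by cosine similarity and blend them into the query embedding. The weighted-mean fusion here is a stand-in for the paper's light-weight single-layer fusion transformer.

```python
import torch
import torch.nn.functional as F

def retrieve_and_refine(query: torch.Tensor, memory: torch.Tensor,
                        k: int = 8, mix: float = 0.5) -> torch.Tensor:
    """Refine query embeddings with their top-k nearest memory embeddings.

    query:  (B, D) embeddings from a frozen encoder (e.g. CLIP).
    memory: (N, D) bank of cross-modal embeddings built offline.
    """
    q = F.normalize(query, dim=-1)
    m = F.normalize(memory, dim=-1)
    sims = q @ m.T                               # (B, N) cosine similarities
    topk = sims.topk(k, dim=-1)
    weights = topk.values.softmax(dim=-1)        # (B, k) retrieval weights
    retrieved = (weights.unsqueeze(-1) * m[topk.indices]).sum(dim=1)  # (B, D)
    return F.normalize(mix * q + (1 - mix) * retrieved, dim=-1)
```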
- FOSTER: Feature Boosting and Compression for Class-Incremental Learning [52.603520403933985]
Deep neural networks suffer from catastrophic forgetting when learning new categories.
We propose a novel two-stage learning paradigm FOSTER, empowering the model to learn new categories adaptively.
arXiv Detail & Related papers (2022-04-10T11:38:33Z)
- Feeding What You Need by Understanding What You Learned [54.400455868448695]
Machine Reading Comprehension (MRC) is the ability to understand a given text passage and answer questions based on it.
Existing research in MRC relies heavily on large models and corpora to improve performance, evaluated by metrics such as Exact Match.
We argue that a deep understanding of model capabilities and data properties can help us feed a model with appropriate training data.
arXiv Detail & Related papers (2022-03-05T14:15:59Z)
- Curriculum Meta-Learning for Few-shot Classification [1.5039745292757671]
We propose an adaptation of the curriculum training framework, applicable to state-of-the-art meta learning techniques for few-shot classification.
Our experiments with the MAML algorithm on two few-shot image classification tasks show significant gains with the curriculum training framework.
arXiv Detail & Related papers (2021-12-06T10:29:23Z)
- Dynamic Data Selection for Curriculum Learning via Ability Estimation [6.255759848576057]
We propose replacing difficulty heuristics with learned difficulty parameters.
We also propose Dynamic Data Selection for Curriculum Learning via Ability Estimation.
We show that models using learned difficulty and/or ability outperform data-based curriculum learning models on the GLUE classification tasks.
arXiv Detail & Related papers (2020-10-30T20:01:56Z)
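To illustrate the ability-based selection idea from the last entry, the sketch below keeps only the training examples whose estimated difficulty does not exceed the model's current ability, with both quantities on a shared latent (e.g. item-response-theory) scale. The thresholding rule and the helper names in the usage comment are hypothetical simplifications, not the paper's exact procedure.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def select_by_ability(examples: Sequence[T],
                      difficulties: Sequence[float],
                      ability: float,
                      margin: float = 0.0) -> List[T]:
    """Keep examples whose latent difficulty is within the model's reach.

    `difficulties` and `ability` are assumed to live on the same latent scale
    (e.g. estimated with an item-response-theory model); `margin` optionally
    admits slightly harder items.
    """
    return [x for x, b in zip(examples, difficulties) if b <= ability + margin]

# Usage sketch: re-select the training pool each epoch as ability grows.
# for epoch in range(num_epochs):
#     ability = estimate_ability(model, probe_set)    # hypothetical helper
#     pool = select_by_ability(train_data, item_difficulty, ability)
#     train_one_epoch(model, pool)                    # hypothetical helper
```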
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.