Training Dynamics for Curriculum Learning: A Study on Monolingual and
Cross-lingual NLU
- URL: http://arxiv.org/abs/2210.12499v1
- Date: Sat, 22 Oct 2022 17:10:04 GMT
- Title: Training Dynamics for Curriculum Learning: A Study on Monolingual and
Cross-lingual NLU
- Authors: Fenia Christopoulou, Gerasimos Lampouras, Ignacio Iacobacci
- Abstract summary: Curriculum Learning (CL) is a technique of training models via ranking examples in a typically increasing difficulty trend.
In this work, we employ CL for Natural Language Understanding (NLU) tasks by taking advantage of training dynamics as difficulty metrics.
Experiments indicate that training dynamics can lead to better performing models with smoother training compared to other difficulty metrics.
- Score: 19.42920238320109
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Curriculum Learning (CL) is a technique of training models via ranking
examples in a typically increasing difficulty trend with the aim of
accelerating convergence and improving generalisability. Current approaches for
Natural Language Understanding (NLU) tasks use CL to improve in-distribution
data performance often via heuristic-oriented or task-agnostic difficulties. In
this work, instead, we employ CL for NLU by taking advantage of training
dynamics as difficulty metrics, i.e., statistics that measure the behavior of
the model at hand on specific task-data instances during training and propose
modifications of existing CL schedulers based on these statistics. Differently
from existing works, we focus on evaluating models on in-distribution (ID),
out-of-distribution (OOD) as well as zero-shot (ZS) cross-lingual transfer
datasets. We show across several NLU tasks that CL with training dynamics can
result in better performance mostly on zero-shot cross-lingual transfer and OOD
settings with improvements up by 8.5% in certain cases. Overall, experiments
indicate that training dynamics can lead to better performing models with
smoother training compared to other difficulty metrics while being 20% faster
on average. In addition, through analysis we shed light on the correlations of
task-specific versus task-agnostic metrics.
Related papers
- Investigating the Pre-Training Dynamics of In-Context Learning: Task Recognition vs. Task Learning [99.05401042153214]
In-context learning (ICL) is potentially attributed to two major abilities: task recognition (TR) and task learning (TL)
We take the first step by examining the pre-training dynamics of the emergence of ICL.
We propose a simple yet effective method to better integrate these two abilities for ICL at inference time.
arXiv Detail & Related papers (2024-06-20T06:37:47Z) - Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve the model alignment of different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z) - How does Multi-Task Training Affect Transformer In-Context Capabilities? Investigations with Function Classes [6.652837942112205]
Large language models (LLM) have recently shown the extraordinary ability to perform unseen tasks based on few-shot examples provided as text.
We propose several effective curriculum learning strategies that allow ICL models to achieve higher data efficiency and more stable convergence.
Our experiments reveal that ICL models can effectively learn difficult tasks by training on progressively harder tasks while mixing in prior tasks, denoted as mixed curriculum in this work.
arXiv Detail & Related papers (2024-04-04T16:15:23Z) - On Task Performance and Model Calibration with Supervised and
Self-Ensembled In-Context Learning [71.44986275228747]
In-context learning (ICL) has become an efficient approach propelled by the recent advancements in large language models (LLMs)
However, both paradigms are prone to suffer from the critical problem of overconfidence (i.e., miscalibration)
arXiv Detail & Related papers (2023-12-21T11:55:10Z) - TRACE: A Comprehensive Benchmark for Continual Learning in Large
Language Models [52.734140807634624]
Aligned large language models (LLMs) demonstrate exceptional capabilities in task-solving, following instructions, and ensuring safety.
Existing continual learning benchmarks lack sufficient challenge for leading aligned LLMs.
We introduce TRACE, a novel benchmark designed to evaluate continual learning in LLMs.
arXiv Detail & Related papers (2023-10-10T16:38:49Z) - ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP)
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z) - Concept-aware Training Improves In-context Learning Ability of Language
Models [0.0]
Many recent language models (LMs) of Transformers family exhibit so-called in-context learning (ICL) ability.
We propose a method to create LMs able to better utilize the in-context information.
We measure that data sampling of Concept-aware Training consistently improves models' reasoning ability.
arXiv Detail & Related papers (2023-05-23T07:44:52Z) - Data Curation Alone Can Stabilize In-context Learning [20.874674130060388]
In-context learning (ICL) enables large language models to perform new tasks by prompting them with a sequence of training examples.
randomly sampling examples from a training set leads to high variance in performance.
We show that carefully curating a subset of training data greatly stabilizes ICL performance without any other changes to the ICL algorithm.
arXiv Detail & Related papers (2022-12-20T15:58:54Z) - MetaICL: Learning to Learn In Context [87.23056864536613]
We introduce MetaICL, a new meta-training framework for few-shot learning where a pretrained language model is tuned to do in-context learn-ing on a large set of training tasks.
We show that MetaICL approaches (and sometimes beats) the performance of models fully finetuned on the target task training data, and outperforms much bigger models with nearly 8x parameters.
arXiv Detail & Related papers (2021-10-29T17:42:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.