CLIMB: Curriculum Learning for Infant-inspired Model Building
- URL: http://arxiv.org/abs/2311.08886v1
- Date: Wed, 15 Nov 2023 11:48:16 GMT
- Title: CLIMB: Curriculum Learning for Infant-inspired Model Building
- Authors: Richard Diehl Martinez, Zebulon Goriely, Hope McGovern, Christopher
Davis, Andrew Caines, Paula Buttery, Lisa Beinborn
- Abstract summary: We describe our team's contribution to the STRICT-SMALL track of the BabyLM Challenge.
The challenge requires training a language model from scratch using only a relatively small training dataset of ten million words.
We experiment with three variants of cognitively-motivated curriculum learning and analyze their effect on the performance of the model.
- Score: 6.4766496232839685
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We describe our team's contribution to the STRICT-SMALL track of the BabyLM
Challenge. The challenge requires training a language model from scratch using
only a relatively small training dataset of ten million words. We experiment
with three variants of cognitively-motivated curriculum learning and analyze
their effect on the performance of the model on linguistic evaluation tasks. In
the vocabulary curriculum, we analyze methods for constraining the vocabulary
in the early stages of training to simulate cognitively more plausible learning
curves. In the data curriculum experiments, we vary the order of the training
instances based on i) infant-inspired expectations and ii) the learning
behavior of the model. In the objective curriculum, we explore different
variations of combining the conventional masked language modeling task with a
more coarse-grained word class prediction task to reinforce linguistic
generalization capabilities. Our results did not yield consistent improvements
over our own non-curriculum learning baseline across a range of linguistic
benchmarks; however, we do find marginal gains on select tasks. Our analysis
highlights key takeaways for specific combinations of tasks and settings which
benefit from our proposed curricula. We moreover determine that careful
selection of model architecture and training hyper-parameters yields
substantial improvements over the default baselines provided by the BabyLM
challenge.
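To make the vocabulary curriculum concrete, here is a minimal sketch of one plausible implementation: token ids outside the currently allowed vocabulary are replaced by a placeholder, and the allowed vocabulary grows linearly over training. The linear schedule, the frequency-ordered token ids, and all names (apply_vocab_curriculum, UNK_ID, FULL_VOCAB) are illustrative assumptions, not the paper's actual code.

```python
import torch

# Hypothetical constants for illustration only.
UNK_ID = 0          # placeholder id for out-of-curriculum tokens
FULL_VOCAB = 8_192  # assumed tokenizer vocabulary size


def allowed_vocab_size(step: int, total_steps: int, start: int = 500) -> int:
    """Linearly grow the usable vocabulary from `start` ids to the full vocabulary."""
    frac = min(step / max(total_steps, 1), 1.0)
    return int(start + frac * (FULL_VOCAB - start))


def apply_vocab_curriculum(input_ids: torch.Tensor, step: int, total_steps: int) -> torch.Tensor:
    """Replace token ids outside the current curriculum with UNK_ID.

    Assumes ids are ordered by corpus frequency (small id = frequent token),
    so the curriculum simply admits the `cutoff` most frequent tokens.
    """
    cutoff = allowed_vocab_size(step, total_steps)
    return torch.where(input_ids < cutoff, input_ids, torch.full_like(input_ids, UNK_ID))


# The same batch early vs. late in training: early batches see far fewer distinct tokens.
batch = torch.randint(1, FULL_VOCAB, (2, 16))
early_ids = apply_vocab_curriculum(batch, step=100, total_steps=10_000)
late_ids = apply_vocab_curriculum(batch, step=9_500, total_steps=10_000)
```

The data and objective curricula described above would hook into the same training loop in a similar way (re-ordering the batch sampler, or adding a coarse word-class prediction head alongside masked language modeling), but their exact configurations are specific to the paper.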
Related papers
- Less is More: Pre-Training Cross-Lingual Small-Scale Language Models with Cognitively-Plausible Curriculum Learning Strategies [2.6684726101845]
We assess whether linguistic acquisition theories can be used to specify more fine-grained curriculum learning strategies.
We create age-ordered corpora of Child-Directed Speech for four typologically distant language families to implement SSLMs and acquisition-inspired curricula cross-lingually.
arXiv Detail & Related papers (2024-10-30T10:31:54Z)
- KidLM: Advancing Language Models for Children -- Early Insights and Future Directions [7.839083566878183]
We introduce a novel user-centric data collection pipeline that involves gathering and validating a corpus specifically written for and sometimes by children.
We propose a new training objective, Stratified Masking, which dynamically adjusts masking probabilities based on our domain-specific child language data (an illustrative sketch follows this list).
Experimental evaluations demonstrate that our model excels in understanding lower grade-level text, maintains safety by avoiding stereotypes, and captures children's unique preferences.
arXiv Detail & Related papers (2024-10-04T19:35:44Z)
- Scalable Language Model with Generalized Continual Learning [58.700439919096155]
The Joint Adaptive Re-Parameterization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks.
Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
arXiv Detail & Related papers (2024-04-11T04:22:15Z)
- Pre-Training to Learn in Context [138.0745138788142]
The ability of in-context learning is not fully exploited because language models are not explicitly trained to learn in context.
We propose PICL (Pre-training for In-Context Learning), a framework to enhance the language models' in-context learning ability.
Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming larger language models with nearly 4x parameters.
arXiv Detail & Related papers (2023-05-16T03:38:06Z)
- LERT: A Linguistically-motivated Pre-trained Language Model [67.65651497173998]
We propose LERT, a pre-trained language model that is trained on three types of linguistic features along with the original pre-training task.
We carried out extensive experiments on ten Chinese NLU tasks, and the experimental results show that LERT could bring significant improvements.
arXiv Detail & Related papers (2022-11-10T05:09:16Z)
- Forging Multiple Training Objectives for Pre-trained Language Models via Meta-Learning [97.28779163988833]
Multiple pre-training objectives compensate for the gaps in understanding capability left by single-objective language modeling.
We propose MOMETAS, a novel adaptive sampler based on meta-learning, which learns the latent sampling pattern on arbitrary pre-training objectives.
arXiv Detail & Related papers (2022-10-19T04:38:26Z)
- Analyzing the Limits of Self-Supervision in Handling Bias in Language [52.26068057260399]
We evaluate how well language models capture the semantics of four tasks for bias: diagnosis, identification, extraction and rephrasing.
Our analyses indicate that language models are capable of performing these tasks to widely varying degrees across different bias dimensions, such as gender and political affiliation.
arXiv Detail & Related papers (2021-12-16T05:36:08Z)
- Curriculum learning for language modeling [2.2475845406292714]
Language models have proven transformational for the natural language processing community.
However, these models are expensive, energy-intensive, and challenging to train.
Curriculum learning is a method that employs a structured training regime instead.
arXiv Detail & Related papers (2021-08-04T16:53:43Z)
- Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm which directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)
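The KidLM entry above mentions Stratified Masking, which adjusts masking probabilities per token. Below is a minimal illustrative sketch of that general idea; the strata, probabilities, and helper names (stratum_of, stratified_mask, MASK_PROBS) are assumptions for illustration and are not taken from the KidLM paper.

```python
import random

# Illustrative strata and masking probabilities (assumed values, not KidLM's).
MASK_PROBS = {
    "stopword": 0.05,        # rarely mask function words
    "general": 0.15,         # standard MLM rate for ordinary tokens
    "child_specific": 0.30,  # mask domain-specific child-language tokens more often
}
MASK_TOKEN = "[MASK]"


def stratum_of(token: str, child_vocab: set, stopwords: set) -> str:
    """Assign a token to a stratum (hypothetical rule for illustration)."""
    if token.lower() in stopwords:
        return "stopword"
    if token.lower() in child_vocab:
        return "child_specific"
    return "general"


def stratified_mask(tokens: list, child_vocab: set, stopwords: set) -> list:
    """Mask each token with a probability determined by its stratum."""
    return [
        MASK_TOKEN if random.random() < MASK_PROBS[stratum_of(t, child_vocab, stopwords)] else t
        for t in tokens
    ]


# Toy usage: domain-specific words are masked more aggressively on average.
print(stratified_mask(
    ["the", "puppy", "plays", "peekaboo"],
    child_vocab={"puppy", "peekaboo"},
    stopwords={"the"},
))
```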