Related papers: Less is More: Pre-Training Cross-Lingual Small-Scale Language Models with Cognitively-Plausible Curriculum Learning Strategies

Less is More: Pre-Training Cross-Lingual Small-Scale Language Models with Cognitively-Plausible Curriculum Learning Strategies

URL: http://arxiv.org/abs/2410.22886v1
Date: Wed, 30 Oct 2024 10:31:54 GMT
Title: Less is More: Pre-Training Cross-Lingual Small-Scale Language Models with Cognitively-Plausible Curriculum Learning Strategies
Authors: Suchir Salhan, Richard Diehl Martinez, Zébulon Goriely, Paula Buttery,
Abstract summary: We assess whether linguistic acquisition theories can be used to specify more fine-grained curriculum learning strategies. We create age-ordered corpora of Child-Directed Speech for four typologically distant language families to implement SSLMs and acquisition-inspired curricula cross-lingually.
Score: 2.6684726101845
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Curriculum Learning has been a popular strategy to improve the cognitive plausibility of Small-Scale Language Models (SSLMs) in the BabyLM Challenge. However, it has not led to considerable improvements over non-curriculum models. We assess whether theoretical linguistic acquisition theories can be used to specify more fine-grained curriculum learning strategies, creating age-ordered corpora of Child-Directed Speech for four typologically distant language families to implement SSLMs and acquisition-inspired curricula cross-lingually. Comparing the success of three objective curricula (Growing, Inwards and MMM) that precisely replicate the predictions of acquisition theories on a standard SSLM architecture, we find fine-grained acquisition-inspired curricula can outperform non-curriculum baselines and performance benefits of curricula strategies in SSLMs can be derived by specifying fine-grained language-specific curricula that precisely replicate language acquisition theories.

Related papers

Improving Multilingual Math Reasoning for African Languages [49.27985213689457]
We conduct experiments to evaluate different combinations of data types (translated versus synthetically generated), training stages (pre-training versus post-training), and other model adaptation configurations.<n>Our experiments focuses on mathematical reasoning tasks, using the Llama 3.1 model family as our base model.
arXiv Detail & Related papers (2025-05-26T11:35:01Z)
When Less Language is More: Language-Reasoning Disentanglement Makes LLMs Better Multilingual Reasoners [111.50503126693444]
We show that language-specific ablation consistently boosts multilingual reasoning performance.<n>Compared to post-training, our training-free ablation achieves comparable or superior results with minimal computational overhead.
arXiv Detail & Related papers (2025-05-21T08:35:05Z)
LLMCL-GEC: Advancing Grammatical Error Correction with LLM-Driven Curriculum Learning [44.010834543396165]
Large-scale language models (LLMs) have demonstrated remarkable capabilities in specific natural language processing (NLP) tasks. However, they may still lack proficiency compared to specialized models in certain domains, such as grammatical error correction (GEC)
arXiv Detail & Related papers (2024-12-17T05:09:07Z)
From Babbling to Fluency: Evaluating the Evolution of Language Models in Terms of Human Language Acquisition [6.617999710257379]
We propose a three-stage framework to assess the abilities of LMs. We evaluate the generative capacities of LMs using methods from linguistic research.
arXiv Detail & Related papers (2024-10-17T06:31:49Z)
MoE-CT: A Novel Approach For Large Language Models Training With Resistance To Catastrophic Forgetting [53.77590764277568]
We introduce a novel MoE-CT architecture that separates the base model's learning from the multilingual expansion process. Our design freezes the original LLM parameters, thus safeguarding its performance in high-resource languages, while an appended MoE module, trained on diverse language datasets, augments low-resource language proficiency.
arXiv Detail & Related papers (2024-06-25T11:03:45Z)
Exploring and Benchmarking the Planning Capabilities of Large Language Models [57.23454975238014]
This work lays the foundations for improving planning capabilities of large language models (LLMs) We construct a comprehensive benchmark suite encompassing both classical planning benchmarks and natural language scenarios. We investigate the use of many-shot in-context learning to enhance LLM planning, exploring the relationship between increased context length and improved planning performance.
arXiv Detail & Related papers (2024-06-18T22:57:06Z)
Bridging the Bosphorus: Advancing Turkish Large Language Models through Strategies for Low-Resource Language Adaptation and Benchmarking [1.3716808114696444]
Large Language Models (LLMs) are becoming crucial across various fields, emphasizing the urgency for high-quality models in underrepresented languages. This study explores the unique challenges faced by low-resource languages, such as data scarcity, model selection, evaluation, and computational limitations.
arXiv Detail & Related papers (2024-05-07T21:58:45Z)
Scalable Language Model with Generalized Continual Learning [58.700439919096155]
The Joint Adaptive Re-ization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks. Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
arXiv Detail & Related papers (2024-04-11T04:22:15Z)
Analyzing and Adapting Large Language Models for Few-Shot Multilingual NLU: Are We There Yet? [82.02076369811402]
Supervised fine-tuning (SFT), supervised instruction tuning (SIT) and in-context learning (ICL) are three alternative, de facto standard approaches to few-shot learning. We present an extensive and systematic comparison of the three approaches, testing them on 6 high- and low-resource languages, three different NLU tasks, and a myriad of language and domain setups. Our observations show that supervised instruction tuning has the best trade-off between performance and resource requirements.
arXiv Detail & Related papers (2024-03-04T10:48:13Z)
FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition [56.76951887823882]
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks. We present FAC$2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation.
arXiv Detail & Related papers (2024-02-29T21:05:37Z)
CLIMB: Curriculum Learning for Infant-inspired Model Building [6.4766496232839685]
We describe our team's contribution to the STRICT-SMALL track of the BabyLM Challenge. The challenge requires training a language model from scratch using only a relatively small training dataset of ten million words. We experiment with three variants of cognitively-motivated curriculum learning and analyze their effect on the performance of the model.
arXiv Detail & Related papers (2023-11-15T11:48:16Z)
Evaluating Neural Language Models as Cognitive Models of Language Acquisition [4.779196219827507]
We argue that some of the most prominent benchmarks for evaluating the syntactic capacities of neural language models may not be sufficiently rigorous. When trained on small-scale data modeling child language acquisition, the LMs can be readily matched by simple baseline models. We conclude with suggestions for better connecting LMs with the empirical study of child language acquisition.
arXiv Detail & Related papers (2023-10-31T00:16:17Z)
Large Language Models can Contrastively Refine their Generation for Better Sentence Representation Learning [57.74233319453229]
Large language models (LLMs) have emerged as a groundbreaking technology and their unparalleled text generation capabilities have sparked interest in their application to the fundamental sentence representation learning task. We propose MultiCSR, a multi-level contrastive sentence representation learning framework that decomposes the process of prompting LLMs to generate a corpus. Our experiments reveal that MultiCSR enables a less advanced LLM to surpass the performance of ChatGPT, while applying it to ChatGPT achieves better state-of-the-art results.
arXiv Detail & Related papers (2023-10-17T03:21:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.