BAMBINO-LM: (Bilingual-)Human-Inspired Continual Pretraining of BabyLM
- URL: http://arxiv.org/abs/2406.11418v2
- Date: Tue, 9 Jul 2024 06:49:38 GMT
- Title: BAMBINO-LM: (Bilingual-)Human-Inspired Continual Pretraining of BabyLM
- Authors: Zhewen Shen, Aditya Joshi, Ruey-Cheng Chen
- Abstract summary: We introduce BAMBINO-LM, a continual pre-training strategy for small-scale language models.
We show that BAMBINO-LM improves the Italian language capability of a BabyLM baseline.
We also show that, as a side effect, the proposed method leads to a degradation in L1 (English) effectiveness similar to what human children would experience in an equivalent learning scenario.
- Score: 3.329407751651262
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Children from bilingual backgrounds benefit from interactions with parents and teachers to re-acquire their heritage language. In this paper, we investigate how this insight from behavioral studies can be incorporated into the learning of small-scale language models. We introduce BAMBINO-LM, a continual pre-training strategy for BabyLM that uses a novel combination of alternation and PPO-based perplexity reward induced from a parent Italian model. Upon evaluation on zero-shot classification tasks for English and Italian, BAMBINO-LM improves the Italian language capability of a BabyLM baseline. Our ablation analysis demonstrates that employing both the alternation strategy and PPO-based modeling is key to this effectiveness gain. We also show that, as a side effect, the proposed method leads to a degradation in L1 effectiveness similar to what human children would have experienced in an equivalent learning scenario. Through its modeling and findings, BAMBINO-LM makes a focused contribution to the pre-training of small-scale language models by first developing a human-inspired strategy for pre-training and then showing that it results in behaviours similar to those of humans.
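To make the two ingredients of this strategy concrete, here is a minimal, self-contained Python sketch (not the authors' released code) of (i) an alternation schedule that switches training between English and Italian phases and (ii) a reward derived from the parent Italian model's perplexity on the child model's generations, of the kind a PPO-style update would maximize. The function names, block size, and reward squashing below are illustrative assumptions, and the PPO optimization loop itself is omitted.

```python
"""Illustrative sketch of (1) an English/Italian alternation schedule and
(2) a perplexity-derived reward from a parent Italian model.
All names here (alternation_schedule, perplexity_reward, block_size)
are hypothetical placeholders, not the paper's implementation."""
import math
from typing import Iterator, Sequence


def alternation_schedule(num_steps: int, block_size: int = 100) -> Iterator[str]:
    """Yield the language to train on at each step, switching every
    `block_size` steps -- a simple stand-in for the alternation strategy."""
    for step in range(num_steps):
        yield "english" if (step // block_size) % 2 == 0 else "italian"


def perplexity(log_probs: Sequence[float]) -> float:
    """Perplexity of a sequence from its per-token log-probabilities (natural log)."""
    return math.exp(-sum(log_probs) / max(len(log_probs), 1))


def perplexity_reward(parent_log_probs: Sequence[float]) -> float:
    """Map the parent (Italian) model's perplexity on the child model's output
    to a bounded reward: fluent Italian (low perplexity) -> reward near 1."""
    ppl = perplexity(parent_log_probs)
    return 1.0 / (1.0 + math.log(ppl))  # squash to (0, 1]; purely illustrative


if __name__ == "__main__":
    # Toy demonstration: a continuation the parent model finds likely earns
    # a larger reward than one it finds unlikely.
    fluent = [math.log(0.4)] * 10
    disfluent = [math.log(0.01)] * 10
    print("schedule head:", list(alternation_schedule(6, block_size=2)))
    print("reward (fluent):   ", round(perplexity_reward(fluent), 3))
    print("reward (disfluent):", round(perplexity_reward(disfluent), 3))
```

In the paper's setting, low perplexity under the parent Italian model signals fluent Italian output, so mapping perplexity to a bounded reward gives the child model a learning signal analogous to caregiver feedback.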
Related papers
- BAMBI: Developing Baby Language Models for Italian [45.36413940519089]
This paper presents BAMBI (BAby language Models Bootstrapped for Italian), a series of Baby Language Models (BabyLMs) trained on data that mimic the linguistic input received by a five-year-old Italian-speaking child.
The BAMBI models are tested using a benchmark specifically designed to evaluate language models, which takes into account the amount of training input the models received.
arXiv Detail & Related papers (2025-03-12T15:36:50Z)
- Benchmarking LLMs for Mimicking Child-Caregiver Language in Interaction [4.109949110722246]
LLMs can generate human-like dialogues, yet their ability to simulate early child-adult interactions remains largely unexplored.
We found that state-of-the-art LLMs can approximate child-caregiver dialogues at the word and utterance level, but they struggle to reproduce the child's and caregiver's discursive patterns, exaggerate alignment, and fail to reach the level of diversity shown by humans.
arXiv Detail & Related papers (2024-12-12T14:43:03Z)
- The Rise and Down of Babel Tower: Investigating the Evolution Process of Multilingual Code Large Language Model [59.357993924917]
We study the evolution of multilingual capabilities in large language models (LLMs) during the pre-training process.
We propose the Babel Tower Hypothesis, which describes the entire process of LLMs acquiring new language capabilities.
We propose a novel method to construct an optimized pre-training corpus for multilingual code LLMs.
arXiv Detail & Related papers (2024-12-10T08:28:57Z)
- Improving Bilingual Capabilities of Language Models to Support Diverse Linguistic Practices in Education [3.799331337558008]
Large language models (LLMs) offer promise in generating educational content, providing instructor feedback, and reducing teacher workload on assessments.
We study the effectiveness of multilingual large language models (MLLMs) across monolingual (English-only, Spanish-only) and bilingual (Spanglish) student writing.
arXiv Detail & Related papers (2024-11-06T23:16:25Z)
- Less is More: Pre-Training Cross-Lingual Small-Scale Language Models with Cognitively-Plausible Curriculum Learning Strategies [2.6684726101845]
We assess whether linguistic acquisition theories can be used to specify more fine-grained curriculum learning strategies.
We create age-ordered corpora of Child-Directed Speech for four typologically distant language families to implement SSLMs and acquisition-inspired curricula cross-lingually.
arXiv Detail & Related papers (2024-10-30T10:31:54Z)
- Exploring Natural Language-Based Strategies for Efficient Number Learning in Children through Reinforcement Learning [0.0]
This paper investigates how children learn numbers using the framework of reinforcement learning (RL).
Using state-of-the-art deep reinforcement learning models, we simulate and analyze the effects of various forms of language instructions on number acquisition.
arXiv Detail & Related papers (2024-10-10T19:49:13Z)
- MoE-CT: A Novel Approach For Large Language Models Training With Resistance To Catastrophic Forgetting [53.77590764277568]
We introduce a novel MoE-CT architecture that separates the base model's learning from the multilingual expansion process.
Our design freezes the original LLM parameters, thus safeguarding its performance in high-resource languages, while an appended MoE module, trained on diverse language datasets, augments low-resource language proficiency (a generic sketch of this freeze-and-extend pattern follows the related-papers list).
arXiv Detail & Related papers (2024-06-25T11:03:45Z)
- CLIMB: Curriculum Learning for Infant-inspired Model Building [6.4766496232839685]
We describe our team's contribution to the STRICT-SMALL track of the BabyLM Challenge.
The challenge requires training a language model from scratch using only a relatively small training dataset of ten million words.
We experiment with three variants of cognitively-motivated curriculum learning and analyze their effect on the performance of the model.
arXiv Detail & Related papers (2023-11-15T11:48:16Z)
- Computational Language Acquisition with Theory of Mind [84.2267302901888]
We build language-learning agents equipped with Theory of Mind (ToM) and measure its effects on the learning process.
We find that training speakers with a highly weighted ToM listener component leads to performance gains in our image referential game setting.
arXiv Detail & Related papers (2023-03-02T18:59:46Z)
- LERT: A Linguistically-motivated Pre-trained Language Model [67.65651497173998]
We propose LERT, a pre-trained language model that is trained on three types of linguistic features along with the original pre-training task.
We carried out extensive experiments on ten Chinese NLU tasks, and the experimental results show that LERT could bring significant improvements.
arXiv Detail & Related papers (2022-11-10T05:09:16Z)
- Towards Lifelong Learning of Multilingual Text-To-Speech Synthesis [87.75833205560406]
This work presents a lifelong learning approach to train a multilingual Text-To-Speech (TTS) system.
It does not require pooled data from all languages altogether, and thus alleviates the storage and computation burden.
arXiv Detail & Related papers (2021-10-09T07:00:38Z)
- Word Acquisition in Neural Language Models [0.38073142980733]
We investigate how neural language models acquire individual words during training, extracting learning curves and ages of acquisition for over 600 words.
We find that the effects of concreteness, word length, and lexical class are pointedly different in children and language models.
arXiv Detail & Related papers (2021-10-05T23:26:16Z)
- Mixed-Lingual Pre-training for Cross-lingual Summarization [54.4823498438831]
Cross-lingual Summarization aims at producing a summary in the target language for an article in the source language.
We propose a solution based on mixed-lingual pre-training that leverages both cross-lingual tasks like translation and monolingual tasks like masked language models.
Our model achieves an improvement of 2.82 (English to Chinese) and 1.15 (Chinese to English) ROUGE-1 scores over state-of-the-art results.
arXiv Detail & Related papers (2020-10-18T00:21:53Z)
- Dynamic Data Selection and Weighting for Iterative Back-Translation [116.14378571769045]
We propose a curriculum learning strategy for iterative back-translation models.
We evaluate our models on domain adaptation, low-resource, and high-resource MT settings.
Experimental results demonstrate that our methods achieve improvements of up to 1.8 BLEU points over competitive baselines.
arXiv Detail & Related papers (2020-04-07T19:49:58Z)
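As referenced in the MoE-CT entry above, the freeze-the-base-model-and-train-an-appended-expert-module pattern can be sketched generically. The PyTorch sketch below is not the MoE-CT implementation; the module name MoEAdapter, the expert count, and the softmax gating are illustrative assumptions.

```python
# Generic sketch of "freeze the base model, train only an appended
# mixture-of-experts adapter". Names and sizes are illustrative.
import torch
import torch.nn as nn


class MoEAdapter(nn.Module):
    """Tiny mixture-of-experts layer with softmax gating over expert outputs."""

    def __init__(self, d_model: int, num_experts: int = 4, d_hidden: int = 64):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            [
                nn.Sequential(
                    nn.Linear(d_model, d_hidden),
                    nn.GELU(),
                    nn.Linear(d_hidden, d_model),
                )
                for _ in range(num_experts)
            ]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.gate(x), dim=-1)                    # (..., num_experts)
        expert_out = torch.stack([e(x) for e in self.experts], dim=-1)   # (..., d_model, num_experts)
        mixed = (expert_out * weights.unsqueeze(-2)).sum(dim=-1)         # weighted mixture of experts
        return x + mixed  # residual keeps the frozen base representation in the output


def freeze_base_and_attach(base: nn.Module, d_model: int) -> MoEAdapter:
    """Freeze every base parameter and return a trainable adapter to stack on top."""
    for p in base.parameters():
        p.requires_grad = False
    return MoEAdapter(d_model)


if __name__ == "__main__":
    base = nn.Linear(16, 16)                      # stand-in for a pretrained LLM block
    adapter = freeze_base_and_attach(base, d_model=16)
    x = torch.randn(2, 5, 16)
    out = adapter(base(x))                        # only adapter parameters receive gradients
    print(out.shape, sum(p.requires_grad for p in base.parameters()))
```

Because every base parameter has requires_grad set to False, gradient updates touch only the adapter, which is the mechanism the entry credits with protecting high-resource-language performance while new languages are learned.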