Are BabyLMs Second Language Learners?
- URL: http://arxiv.org/abs/2410.21254v1
- Date: Mon, 28 Oct 2024 17:52:15 GMT
- Title: Are BabyLMs Second Language Learners?
- Authors: Lukas Edman, Lisa Bylinina, Faeze Ghorbanpour, Alexander Fraser
- Abstract summary: This paper describes a linguistically-motivated approach to the 2024 edition of the BabyLM Challenge.
Rather than pursuing a first language learning (L1) paradigm, we approach the challenge from a second language (L2) learning perspective.
- Score: 48.85680614529188
- Abstract: This paper describes a linguistically-motivated approach to the 2024 edition of the BabyLM Challenge (Warstadt et al. 2023). Rather than pursuing a first language learning (L1) paradigm, we approach the challenge from a second language (L2) learning perspective. In L2 learning, there is a stronger focus on learning explicit linguistic information, such as grammatical notions, definitions of words or different ways of expressing a meaning. This makes L2 learning potentially more efficient and concise. We approximate this using data from Wiktionary, grammar examples either generated by an LLM or sourced from grammar books, and paraphrase data. We find that explicit information about word meaning (in our case, Wiktionary) does not boost model performance, while grammatical information can give a small improvement. The most impactful data ingredient is sentence paraphrases, with our two best models being trained on 1) a mix of paraphrase data and data from the BabyLM pretraining dataset, and 2) exclusively paraphrase data.
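The abstract reports that the best models were trained on paraphrase data, alone or mixed with the BabyLM pretraining corpus. As an illustration only, here is a minimal sketch of such a data mix; the pairing template, the 50/50 ratio, and the function names are assumptions, not the paper's actual recipe.

```python
# Hypothetical sketch: interleaving paraphrase pairs with BabyLM pretraining
# sentences into one training stream. The "That is," joining template and
# the default mixing ratio are illustrative assumptions.
import random

def mix_training_data(babylm_sents, paraphrase_pairs, paraphrase_ratio=0.5, seed=0):
    """Build a shuffled training corpus from two sources.

    Each paraphrase pair is rendered as a single line so the model sees
    two ways of expressing the same meaning side by side.
    """
    rng = random.Random(seed)
    paraphrase_lines = [f"{a} That is, {b}" for a, b in paraphrase_pairs]
    # Number of paraphrase lines needed to reach the requested ratio.
    n_para = int(len(babylm_sents) * paraphrase_ratio / (1 - paraphrase_ratio))
    corpus = list(babylm_sents) + rng.sample(
        paraphrase_lines, min(n_para, len(paraphrase_lines))
    )
    rng.shuffle(corpus)
    return corpus
```

Setting `paraphrase_ratio=1.0` would correspond to the paper's second configuration, training on paraphrase data exclusively (that case would bypass the ratio arithmetic above).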
Related papers
- Can LLMs Really Learn to Translate a Low-Resource Language from One Grammar Book? [6.905647501099997]
Extremely low-resource (XLR) languages lack substantial corpora for training NLP models.
Machine Translation from One Book suggests prompting long-context LLMs with one grammar book enables English-Kalamang translation.
We investigate whether the book's grammatical explanations or its parallel examples are most effective for learning XLR translation.
arXiv Detail & Related papers (2024-09-27T21:27:32Z)
- The Ups and Downs of Large Language Model Inference with Vocabulary Trimming by Language Heuristics [74.99898531299148]
This work examines vocabulary trimming (VT), which restricts embedding entries to the language of interest to improve time and memory efficiency.
We apply two heuristics to trim the full vocabulary, Unicode-based script filtering and corpus-based selection, across different language families and model sizes.
It is found that VT reduces the memory usage of small models by nearly 50% and speeds up generation by up to 25%.
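The Unicode-based script filtering mentioned above can be sketched as follows. This is a toy approximation for illustration: the script check via `unicodedata` character names and the subword-marker handling are assumptions, not the paper's implementation.

```python
# Minimal sketch of Unicode-based script filtering for vocabulary trimming
# (VT): keep vocabulary entries whose letters belong to the target script,
# while retaining digits, punctuation, and symbols.
import unicodedata

def in_script(token, script="LATIN"):
    """True if every alphabetic character in the token matches the script."""
    for ch in token.lstrip("\u2581\u0120#"):  # strip common subword markers
        if ch.isalpha():
            try:
                if not unicodedata.name(ch).startswith(script):
                    return False
            except ValueError:  # character has no Unicode name
                return False
    return True

def trim_vocab(vocab, script="LATIN"):
    """Return the subset of an embedding vocabulary matching one script."""
    return {tok: idx for tok, idx in vocab.items() if in_script(tok, script)}
```

In practice the kept entries' embedding rows would then be copied into a smaller embedding matrix, which is where the reported memory savings come from.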
arXiv Detail & Related papers (2023-11-16T09:35:50Z)
- Morphosyntactic probing of multilingual BERT models [41.83131308999425]
We introduce an extensive dataset for multilingual probing of morphological information in language models.
We find that pre-trained Transformer models (mBERT and XLM-RoBERTa) learn features that attain strong performance across these tasks.
arXiv Detail & Related papers (2023-06-09T19:15:20Z)
- SLABERT Talk Pretty One Day: Modeling Second Language Acquisition with BERT [0.0]
Cross-linguistic transfer is the influence of the linguistic structure of a speaker's native language on their acquisition of a foreign language.
We find that NLP literature has not given enough attention to the phenomenon of negative transfer.
Our findings call for further research using our novel Transformer-based SLA models.
arXiv Detail & Related papers (2023-05-31T06:22:07Z)
- CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models [77.45934004406283]
We systematically study decompounding, the task of splitting compound words into their constituents.
We introduce a dataset of 255k compound and non-compound words across 56 diverse languages obtained from Wiktionary.
We introduce a novel methodology to train dedicated models for decompounding.
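To make the decompounding task concrete, here is a toy dictionary-based baseline. The CompoundPiece paper trains dedicated neural models; this greedy/recursive lexicon lookup only illustrates what a correct split looks like.

```python
# Illustrative decompounding: split a compound word into constituents drawn
# from a small lexicon, or return the word unchanged if no full split exists.
def decompound(word, lexicon, min_len=3):
    """Return the list of constituents of `word`, or [word] if unsplittable."""
    if word in lexicon:
        return [word]
    # Try the longest possible head first, then recurse on the tail.
    for i in range(len(word) - min_len, min_len - 1, -1):
        head, tail = word[:i], word[i:]
        if head in lexicon:
            rest = decompound(tail, lexicon, min_len)
            if all(part in lexicon for part in rest):
                return [head] + rest
    return [word]
```

A trained model generalizes past the lexicon (e.g. to unseen constituents and languages with linking elements), which is exactly where this baseline fails.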
arXiv Detail & Related papers (2023-05-23T16:32:27Z)
- Translate to Disambiguate: Zero-shot Multilingual Word Sense Disambiguation with Pretrained Language Models [67.19567060894563]
Pretrained Language Models (PLMs) learn rich cross-lingual knowledge and can be finetuned to perform well on diverse tasks.
We present a new study investigating how well PLMs capture cross-lingual word sense with Contextual Word-Level Translation (C-WLT)
We find that as the model size increases, PLMs encode more cross-lingual word sense knowledge and better use context to improve WLT performance.
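A prompt in the spirit of Contextual Word-Level Translation (C-WLT) might look like the sketch below: the LM is asked to translate one word given its sentence context, so the chosen translation reveals the word's sense. The exact wording the paper uses may differ.

```python
# Hypothetical C-WLT-style prompt template: translating a single word in
# context so that the translation disambiguates its sense.
def cwlt_prompt(sentence, word, target_lang):
    return (
        f'In the sentence "{sentence}", '
        f'the word "{word}" translates into {target_lang} as "'
    )

prompt = cwlt_prompt("He sat on the bank of the river.", "bank", "German")
# A sense-aware model should continue with "Ufer" (riverbank),
# not "Bank" (financial institution).
```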
arXiv Detail & Related papers (2023-04-26T19:55:52Z)
- Adapters for Enhanced Modeling of Multilingual Knowledge and Text [54.02078328453149]
Language models have been extended to multilingual language models (MLLMs).
Knowledge graphs contain facts in an explicit triple format, which require careful curation and are only available in a few high-resource languages.
We propose to enhance MLLMs with knowledge from multilingual knowledge graphs (MLKGs) so as to tackle language and knowledge graph tasks across many languages.
arXiv Detail & Related papers (2022-10-24T21:33:42Z)
- Always Keep your Target in Mind: Studying Semantics and Improving Performance of Neural Lexical Substitution [124.99894592871385]
We present a large-scale comparative study of lexical substitution methods employing both old and most recent language models.
We show that already competitive results achieved by SOTA LMs/MLMs can be further substantially improved if information about the target word is injected properly.
arXiv Detail & Related papers (2022-06-07T16:16:19Z)
- Pedagogical Word Recommendation: A novel task and dataset on personalized vocabulary acquisition for L2 learners [4.507860128918788]
We propose and release data for a novel task called Pedagogical Word Recommendation.
The main goal of PWR is to predict whether a given learner knows a given word based on other words the learner has already seen.
As a feature of this ITS, students can directly indicate words they do not know from the questions they solved to create wordbooks.
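A toy baseline for the PWR prediction task described above could use the wordbooks directly: infer whether a learner knows a word from the known-rates of learners with similar wordbooks. The function names and the peer-voting rule are illustrative assumptions, not the paper's model.

```python
# Toy PWR baseline: predict whether a learner knows a word by majority vote
# among "similar" learners, where similarity means sharing unknown words.
def predict_knows(learner, word, wordbooks, k=1):
    """wordbooks maps learner -> set of words they marked as UNKNOWN.

    Predict "knows" if, among learners sharing at least `k` unknown words
    with this learner, at most half also marked the word unknown.
    """
    me = wordbooks[learner]
    peers = [
        other for other, unk in wordbooks.items()
        if other != learner and len(me & unk) >= k
    ]
    if not peers:  # no comparable learners: fall back to the own wordbook
        return word not in me
    unknown_votes = sum(word in wordbooks[p] for p in peers)
    return unknown_votes <= len(peers) / 2
```

Collaborative-filtering or knowledge-tracing models would replace this hard vote with learned learner and word representations.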
arXiv Detail & Related papers (2021-12-27T17:52:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.