Mini Minds: Exploring Bebeshka and Zlata Baby Models
- URL: http://arxiv.org/abs/2311.03216v1
- Date: Mon, 6 Nov 2023 16:01:10 GMT
- Title: Mini Minds: Exploring Bebeshka and Zlata Baby Models
- Authors: Irina Proskurina, Guillaume Metzler, Julien Velcin
- Abstract summary: We describe the University of Lyon 2 submission to the Strict-Small track of the BabyLM competition.
We introduce two small-size language models (LMs) that were submitted for evaluation.
Despite being half the scale of the baseline LMs, our proposed models achieve comparable performance.
- Score: 3.558894829990311
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we describe the University of Lyon 2 submission to the
Strict-Small track of the BabyLM competition. The shared task is created with
an emphasis on small-scale language modelling from scratch on limited-size data
and human language acquisition. The dataset released for the Strict-Small track has
10M words, which is comparable to children's vocabulary size. We approach the
task with an architecture search, minimizing masked language modelling loss on
the data of the shared task. Having found an optimal configuration, we
introduce two small-size language models (LMs) that were submitted for
evaluation, a 4-layer encoder with 8 attention heads and a 6-layer decoder
model with 12 heads, which we term Bebeshka and Zlata, respectively. Despite
being half the scale of the baseline LMs, our proposed models achieve
comparable performance. We further explore the applicability of small-scale
language models in tasks involving moral judgment, aligning their predictions
with human values. These findings highlight the potential of compact LMs in
addressing practical language understanding tasks.
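The abstract fixes only the depth and attention-head counts of the two submitted models. As a rough illustration (not the authors' code), configurations of that shape could be instantiated with the Hugging Face Transformers library as sketched below; the hidden sizes, vocabulary size, and every other hyperparameter here are assumptions rather than values reported in the paper.

```python
# Minimal sketch: an encoder and a decoder config matching the reported
# layer/head counts. All other hyperparameters are illustrative assumptions.
from transformers import (
    RobertaConfig, RobertaForMaskedLM,
    GPT2Config, GPT2LMHeadModel,
)

# Bebeshka: 4-layer encoder with 8 attention heads, trained with masked LM loss.
bebeshka = RobertaForMaskedLM(RobertaConfig(
    num_hidden_layers=4,
    num_attention_heads=8,
    hidden_size=512,         # assumed; must be divisible by the head count
    intermediate_size=2048,  # assumed
    vocab_size=30_000,       # assumed
))

# Zlata: 6-layer decoder with 12 attention heads, trained as a causal LM.
zlata = GPT2LMHeadModel(GPT2Config(
    n_layer=6,
    n_head=12,
    n_embd=768,              # assumed; must be divisible by the head count
    vocab_size=30_000,       # assumed
))

for name, model in [("Bebeshka", bebeshka), ("Zlata", zlata)]:
    print(name, sum(p.numel() for p in model.parameters()), "parameters")
```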
Related papers
- Is Child-Directed Speech Effective Training Data for Language Models? [34.46268640655943]
We train GPT-2 and RoBERTa models on 29M words of English child-directed speech.
We test whether the global developmental ordering or the local discourse ordering of children's training data supports high performance relative to other datasets.
These findings support the hypothesis that, rather than proceeding from better data, the child's learning algorithm is substantially more data-efficient than current language modeling techniques.
arXiv Detail & Related papers (2024-08-07T08:18:51Z)
- YAYI 2: Multilingual Open-Source Large Language Models [53.92832054643197]
We propose YAYI 2, including both base and chat models, with 30 billion parameters.
YAYI 2 is pre-trained from scratch on a multilingual corpus which contains 2.65 trillion tokens filtered by our pre-training data processing pipeline.
The base model is aligned with human values through supervised fine-tuning with millions of instructions and reinforcement learning from human feedback.
arXiv Detail & Related papers (2023-12-22T17:34:47Z)
- The Ups and Downs of Large Language Model Inference with Vocabulary Trimming by Language Heuristics [74.99898531299148]
This research examines vocabulary trimming (VT), inspired by restricting embedding entries to the language of interest, to bolster time and memory efficiency.
We apply two language heuristics to trim the full vocabulary, Unicode-based script filtering and corpus-based selection, to different language families and sizes (a minimal script-filtering sketch appears after this list).
It is found that VT reduces the memory usage of small models by nearly 50% and has an upper bound of 25% improvement in generation speed.
arXiv Detail & Related papers (2023-11-16T09:35:50Z)
- Too Much Information: Keeping Training Simple for BabyLMs [2.900810893770134]
This paper details the work of the University of Groningen for the BabyLM Challenge.
We follow the idea that, like babies, language models should be introduced to simpler concepts first and build off of that knowledge to understand more complex concepts.
We examine this strategy of simple-then-complex through a variety of lenses, namely context size, vocabulary, and overall linguistic complexity of the data.
arXiv Detail & Related papers (2023-11-03T14:50:00Z)
- Evaluating Neural Language Models as Cognitive Models of Language Acquisition [4.779196219827507]
We argue that some of the most prominent benchmarks for evaluating the syntactic capacities of neural language models may not be sufficiently rigorous.
When trained on small-scale data modeling child language acquisition, the LMs can be readily matched by simple baseline models.
We conclude with suggestions for better connecting LMs with the empirical study of child language acquisition.
arXiv Detail & Related papers (2023-10-31T00:16:17Z)
- Evaluating Large Language Models on Controlled Generation Tasks [92.64781370921486]
We present an extensive analysis of various benchmarks including a sentence planning benchmark with different granularities.
After comparing large language models against state-of-the-art finetuned smaller models, we present a spectrum showing where large language models fall behind, are comparable to, or exceed the abilities of smaller models.
arXiv Detail & Related papers (2023-10-23T03:48:24Z)
- Baby's CoThought: Leveraging Large Language Models for Enhanced Reasoning in Compact Models [3.1244568065126863]
We propose a "CoThought" pipeline, which efficiently trains smaller "baby" language models (BabyLMs)
Our pipeline restructures a dataset of less than 100M in size using GPT-3.5-turbo, transforming it into task-oriented, human-readable texts.
Our BabyLM outperforms the vanilla RoBERTa in 10 linguistic, NLU, and question-answering tasks by more than 3 points.
arXiv Detail & Related papers (2023-08-03T10:52:52Z)
- Call for Papers -- The BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus [32.51325830633226]
We present the call for papers for the BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus.
This shared task is intended for participants with an interest in small-scale language modeling, human language acquisition, low-resource NLP, and cognitive modeling.
arXiv Detail & Related papers (2023-01-27T15:52:50Z)
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
- Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict.
This work shows a comparison of a neural model and character language models with varying amounts of target language data.
Our usage scenario is interactive correction with nearly zero amounts of training examples, improving models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
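The vocabulary-trimming entry above mentions Unicode-based script filtering as one of its two heuristics. The sketch below illustrates the general idea on a toy vocabulary; the helper name and the keep/drop rule are assumptions for illustration, not that paper's implementation.

```python
# Illustrative Unicode-script vocabulary trimming: keep only tokens whose
# alphabetic characters belong to a target script, so embedding rows for
# other scripts can be dropped to save memory.
import unicodedata

def token_in_script(token: str, script: str = "LATIN") -> bool:
    """True if every alphabetic character's Unicode name mentions `script`."""
    return all(
        script in unicodedata.name(ch, "")
        for ch in token
        if ch.isalpha()
    )

# Toy vocabulary standing in for a real subword tokenizer's vocab.
vocab = {"hello": 0, "##ing": 1, "привет": 2, "世界": 3, "[CLS]": 4}
trimmed = {tok: idx for tok, idx in vocab.items() if token_in_script(tok)}
print(trimmed)  # {'hello': 0, '##ing': 1, '[CLS]': 4}
```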