ToddlerBERTa: Exploiting BabyBERTa for Grammar Learning and Language Understanding
- URL: http://arxiv.org/abs/2308.16336v2
- Date: Wed, 8 Nov 2023 12:31:35 GMT
- Title: ToddlerBERTa: Exploiting BabyBERTa for Grammar Learning and Language Understanding
- Authors: Omer Veysel Cagatan
- Abstract summary: We present ToddlerBERTa, a BabyBERTa-like language model, exploring its capabilities through five different models.
We find that smaller models can excel in specific tasks, while larger models perform well with substantial data.
ToddlerBERTa demonstrates commendable performance, rivalling the state-of-the-art RoBERTa-base.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present ToddlerBERTa, a BabyBERTa-like language model, exploring its
capabilities through five different models with varied hyperparameters.
Evaluating on BLiMP, SuperGLUE, MSGS, and a Supplement benchmark from the
BabyLM challenge, we find that smaller models can excel in specific tasks,
while larger models perform well with substantial data. Despite training on a
smaller dataset, ToddlerBERTa demonstrates commendable performance, rivalling
the state-of-the-art RoBERTa-base. The model showcases robust language
understanding, even with single-sentence pretraining, and competes with
baselines that leverage broader contextual information. Our work provides
insights into hyperparameter choices and data utilization, contributing to the
advancement of language models.
Related papers
- Mini Minds: Exploring Bebeshka and Zlata Baby Models [3.558894829990311]
We describe the University of Lyon 2 submission to the Strict-Small track of the BabyLM competition.
We introduce two small-size language models (LMs) that were submitted for evaluation.
Despite being half the scale of the baseline LMs, our proposed models achieve comparable performance.
arXiv Detail & Related papers (2023-11-06T16:01:10Z)
- Evaluating Large Language Models on Controlled Generation Tasks [92.64781370921486]
We present an extensive analysis of various benchmarks, including a sentence planning benchmark with different granularities.
After comparing large language models against state-of-the-art finetuned smaller models, we present a spectrum showing where large language models fall behind, are comparable to, or exceed the ability of smaller models.
arXiv Detail & Related papers (2023-10-23T03:48:24Z)
- Baby's CoThought: Leveraging Large Language Models for Enhanced Reasoning in Compact Models [3.1244568065126863]
We propose a "CoThought" pipeline, which efficiently trains smaller "baby" language models (BabyLMs).
Our pipeline restructures a dataset of less than 100M words using GPT-3.5-turbo, transforming it into task-oriented, human-readable texts (a minimal sketch of this kind of restructuring appears after this list).
Our BabyLM outperforms the vanilla RoBERTa on 10 linguistic, NLU, and question-answering tasks by more than 3 points.
arXiv Detail & Related papers (2023-08-03T10:52:52Z)
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
- PaLM: Scaling Language Modeling with Pathways [180.69584031908113]
We trained a 540-billion-parameter, densely activated Transformer language model, which we call the Pathways Language Model (PaLM).
We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods.
We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks.
arXiv Detail & Related papers (2022-04-05T16:11:45Z)
- conSultantBERT: Fine-tuned Siamese Sentence-BERT for Matching Jobs and Job Seekers [2.208694022993555]
We describe our task, in which noisy data from parsed resumes, the heterogeneous nature of the different data sources, and cross-linguality and multilinguality present domain-specific challenges.
We address these challenges by fine-tuning a Siamese Sentence-BERT (SBERT) model, which we call conSultantBERT, on a large-scale, real-world, high-quality dataset of over 270,000 resume-vacancy pairs labeled by our staffing consultants.
We show that our fine-tuned model significantly outperforms unsupervised and supervised baselines that rely on TF-IDF-weighted feature vectors and BERT embeddings (a minimal fine-tuning sketch appears after this list).
arXiv Detail & Related papers (2021-09-14T07:57:05Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict.
This work presents a comparison of a neural model and character language models trained with varying amounts of target-language data.
Our usage scenario is interactive correction starting from nearly zero training examples, improving the models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
- InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective [84.78604733927887]
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Recent studies show that such BERT-based models are vulnerable to textual adversarial attacks.
We propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models.
arXiv Detail & Related papers (2020-10-05T20:49:26Z)
- TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data [113.29476656550342]
We present TaBERT, a pretrained LM that jointly learns representations for NL sentences and tables.
TaBERT is trained on a large corpus of 26 million tables and their English contexts.
Implementation of the model will be available at http://fburl.com/TaBERT.
arXiv Detail & Related papers (2020-05-17T17:26:40Z)
- lamBERT: Language and Action Learning Using Multimodal BERT [0.1942428068361014]
This study proposes lamBERT, a language and action learning model based on multimodal BERT.
Experiments are conducted in a grid environment that requires language understanding for the agent to act properly.
The lamBERT model obtained higher rewards in multitask and transfer settings compared to other models.
arXiv Detail & Related papers (2020-04-15T13:54:55Z)
- RobBERT: a Dutch RoBERTa-based Language Model [9.797319790710711]
We use RoBERTa to train a Dutch language model called RobBERT.
We measure its performance on various tasks as well as the importance of the fine-tuning dataset size.
RobBERT improves state-of-the-art results on various tasks and, in particular, significantly outperforms other models when dealing with smaller datasets.
arXiv Detail & Related papers (2020-01-17T13:25:44Z)