Increasing The Performance of Cognitively Inspired Data-Efficient
Language Models via Implicit Structure Building
- URL: http://arxiv.org/abs/2310.20589v1
- Date: Tue, 31 Oct 2023 16:26:36 GMT
- Title: Increasing The Performance of Cognitively Inspired Data-Efficient
Language Models via Implicit Structure Building
- Authors: Omar Momen, David Arps, Laura Kallmeyer
- Abstract summary: We train language models that incorporate unsupervised predictions about hierarchical sentence structure into the model architecture.
StructFormer models have been shown to perform well on unsupervised syntactic induction based on limited pretraining data.
- Evaluation of our models on 39 tasks provided by the BabyLM challenge shows promising improvements for models that integrate a hierarchical bias into the architecture.
- Score: 6.445605125467575
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we describe our submission to the BabyLM Challenge 2023 shared
task on data-efficient language model (LM) pretraining (Warstadt et al., 2023).
We train transformer-based masked language models that incorporate unsupervised
predictions about hierarchical sentence structure into the model architecture.
Concretely, we use the StructFormer architecture (Shen et al., 2021) and
variants thereof. StructFormer models have been shown to perform well on
unsupervised syntactic induction based on limited pretraining data, and to
yield performance improvements over a vanilla transformer architecture (Shen et
al., 2021). Evaluation of our models on 39 tasks provided by the BabyLM
challenge shows promising improvements on some particular tasks for models that
integrate a hierarchical bias into the architecture, even though these models
do not consistently outperform the RoBERTa baseline model provided by the
shared task organizers across all tasks.
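To make the approach described in the abstract more concrete, the sketch below is a heavily simplified PyTorch illustration of a masked language model with an induced-structure bias: a small convolutional head predicts a per-position "syntactic distance", the distances are turned into a soft pairwise bias on the self-attention logits, and the whole model is trained with a plain MLM loss. It is not the actual StructFormer parser of Shen et al. (2021); all module names, the bias formula, and the hyperparameters are illustrative assumptions.
```python
# Simplified sketch of a masked LM with an induced-structure attention bias.
# This is NOT the StructFormer implementation; it only illustrates the idea of
# a jointly trained parser head modulating self-attention.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DistanceParser(nn.Module):
    """Predicts a scalar 'syntactic distance' for each token position."""

    def __init__(self, d_model: int, kernel: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel, padding=kernel // 2)
        self.out = nn.Linear(d_model, 1)

    def forward(self, x):                     # x: (batch, seq, d_model)
        h = F.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
        return self.out(h).squeeze(-1)        # (batch, seq)


class BiasedSelfAttention(nn.Module):
    """Standard multi-head attention whose logits receive a structural bias."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x, bias):               # bias: (batch, seq, seq)
        n_heads = self.attn.num_heads
        # Float attn_mask of shape (batch * n_heads, seq, seq) is added to logits.
        mask = bias.repeat_interleave(n_heads, dim=0)
        out, _ = self.attn(x, x, x, attn_mask=mask)
        return out


class TinyStructuredMLM(nn.Module):
    def __init__(self, vocab_size: int, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_model)
        self.parser = DistanceParser(d_model)
        self.attn = BiasedSelfAttention(d_model, n_heads)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, ids):
        x = self.emb(ids)
        dist = self.parser(x)                             # (batch, seq)
        # Tokens whose predicted distances differ strongly attend to each other
        # less: the bias is the negative gap between positionwise distances.
        bias = -torch.abs(dist.unsqueeze(2) - dist.unsqueeze(1))
        x = self.norm1(x + self.attn(x, bias))
        x = self.norm2(x + self.ff(x))
        return self.lm_head(x)                            # (batch, seq, vocab)


if __name__ == "__main__":
    vocab, mask_id = 100, 1
    model = TinyStructuredMLM(vocab)
    ids = torch.randint(2, vocab, (8, 16))
    masked = ids.clone()
    positions = torch.rand(ids.shape) < 0.15              # 15% masking rate
    masked[positions] = mask_id
    logits = model(masked)
    loss = F.cross_entropy(logits[positions], ids[positions])
    loss.backward()
    print("MLM loss:", float(loss))
```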
Related papers
- StructLM: Towards Building Generalist Models for Structured Knowledge Grounding [49.10029030628653]
Large language models' (LLMs) ability to process structured data lags behind state-of-the-art (SoTA) models by an average of 35%.
We train a series of models, referred to as StructLM, based on the Mistral and CodeLlama model families, ranging from 7B to 34B parameters.
Our StructLM series surpasses task-specific models on 16 out of 18 evaluated datasets and establishes new SoTA performance on 8 SKG tasks.
arXiv Detail & Related papers (2024-02-26T15:47:01Z)
- LLM Augmented LLMs: Expanding Capabilities through Composition [56.40953749310957]
CALM -- Composition to Augment Language Models -- introduces cross-attention between models to compose their representations and enable new capabilities.
We illustrate that augmenting PaLM2-S with a smaller model trained on low-resource languages results in an absolute improvement of up to 13% on tasks like translation into English.
When PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40% over the base model for code generation and explanation tasks.
arXiv Detail & Related papers (2024-01-04T18:53:01Z)
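A minimal sketch of the cross-attention composition described in the CALM entry above: two frozen toy encoders stand in for the anchor and augmenting models, and a trainable cross-attention bridge lets the anchor's states attend to the augmenting model's states, so only the new parameters are learned. The encoder classes, hidden sizes, and projection layer are assumptions for illustration; this is not the CALM implementation.
```python
# Toy illustration of composing two frozen models with trainable cross-attention.
# Stand-ins for the anchor / augmenting models; not the actual CALM architecture.
import torch
import torch.nn as nn


class ToyEncoder(nn.Module):
    """Stand-in for a pretrained model that exposes hidden states."""

    def __init__(self, vocab: int, d_model: int):
        super().__init__()
        self.emb = nn.Embedding(vocab, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, ids):
        return self.encoder(self.emb(ids))        # (batch, seq, d_model)


class CrossAttentionBridge(nn.Module):
    """Trainable block letting anchor states attend to augmenting states."""

    def __init__(self, d_anchor: int, d_aug: int, n_heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(d_aug, d_anchor)    # align hidden sizes
        self.cross = nn.MultiheadAttention(d_anchor, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_anchor)

    def forward(self, anchor_h, aug_h):
        kv = self.proj(aug_h)
        attended, _ = self.cross(anchor_h, kv, kv)
        return self.norm(anchor_h + attended)     # residual composition


anchor = ToyEncoder(vocab=1000, d_model=128)
augmenting = ToyEncoder(vocab=1000, d_model=64)
for p in list(anchor.parameters()) + list(augmenting.parameters()):
    p.requires_grad_(False)                       # both base models stay frozen

bridge = CrossAttentionBridge(d_anchor=128, d_aug=64)
ids = torch.randint(0, 1000, (2, 10))
composed = bridge(anchor(ids), augmenting(ids))   # (2, 10, 128)
print(composed.shape, sum(p.numel() for p in bridge.parameters()), "trainable params")
```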
- The LLM Surgeon [33.90611088414982]
We explore data-driven compression of existing pretrained models as an alternative to training smaller models from scratch.
We provide a general framework for unstructured, semi-structured and structured pruning and improve upon weight updates to capture more correlations between weights.
Our method can prune rows and columns from a range of OPT models and Llamav2-7B by 20%-30%, with a negligible loss in performance.
arXiv Detail & Related papers (2023-12-28T18:59:09Z)
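The LLM Surgeon entry above is about removing whole rows and columns from weight matrices. The toy sketch below prunes the output rows of a linear layer using a simple squared-weight saliency score; the paper's curvature-aware scoring and correlated weight updates are deliberately omitted, and the function name and keep ratio are assumptions, so treat this only as an illustration of structured (row) pruning.
```python
# Toy structured pruning: drop the output rows of a linear layer with the
# smallest squared-weight saliency. Illustration only; it does not implement
# the curvature-based scores or compensating weight updates of the paper.
import torch
import torch.nn as nn


def prune_rows(layer: nn.Linear, keep_ratio: float = 0.75) -> nn.Linear:
    with torch.no_grad():
        saliency = layer.weight.pow(2).sum(dim=1)          # one score per output row
        n_keep = max(1, int(keep_ratio * layer.out_features))
        keep = torch.topk(saliency, n_keep).indices.sort().values
        pruned = nn.Linear(layer.in_features, n_keep, bias=layer.bias is not None)
        pruned.weight.copy_(layer.weight[keep])
        if layer.bias is not None:
            pruned.bias.copy_(layer.bias[keep])
    return pruned


original = nn.Linear(512, 512)
smaller = prune_rows(original, keep_ratio=0.75)
x = torch.randn(4, 512)
print(original(x).shape, "->", smaller(x).shape)            # (4, 512) -> (4, 384)
```
In a real pipeline the matching input columns of the following layer would be removed as well; that pairing of pruned rows and columns is what turns structured pruning into actual memory and compute savings.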
- Autoregressive Structured Prediction with Language Models [73.11519625765301]
We describe an approach to model structures as sequences of actions in an autoregressive manner with pretrained language models (PLMs).
Our approach achieves the new state-of-the-art on all the structured prediction tasks we looked at.
arXiv Detail & Related papers (2022-10-26T13:27:26Z)
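To make the "structures as action sequences" idea in the entry above concrete, the snippet below linearizes labeled spans (a toy named-entity structure) into a bracket-style action string that an autoregressive LM could be trained to generate left to right, and parses it back. The tag set and serialization format are invented for this example; the paper defines its own action inventories per task.
```python
# Toy linearization of a span-labeled sentence into an "action sequence" that an
# autoregressive LM could generate token by token. The format is invented for
# illustration; the paper defines task-specific action sets.
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def to_actions(tokens: List[str], spans: List[Span]) -> str:
    opens = {s: label for s, _, label in spans}
    closes = {e for _, e, _ in spans}
    out = []
    for i, tok in enumerate(tokens):
        if i in opens:
            out.append(f"[{opens[i]}")
        out.append(tok)
        if i + 1 in closes:
            out.append("]")
    return " ".join(out)


def from_actions(actions: str) -> Tuple[List[str], List[Span]]:
    tokens, spans, stack = [], [], []
    for piece in actions.split():
        if piece.startswith("["):
            stack.append((len(tokens), piece[1:]))
        elif piece == "]":
            start, label = stack.pop()
            spans.append((start, len(tokens), label))
        else:
            tokens.append(piece)
    return tokens, spans


sentence = ["Ada", "Lovelace", "worked", "in", "London"]
structure = [(0, 2, "PER"), (4, 5, "LOC")]
seq = to_actions(sentence, structure)
print(seq)                       # [PER Ada Lovelace ] worked in [LOC London ]
assert from_actions(seq) == (sentence, structure)
```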
- Compositional Generalisation with Structured Reordering and Fertility Layers [121.37328648951993]
Seq2seq models have been shown to struggle with compositional generalisation.
We present a flexible end-to-end differentiable neural model that composes two structural operations.
arXiv Detail & Related papers (2022-10-06T19:51:31Z)
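As a rough intuition for the two structural operations composed in the entry above, the snippet below applies a hard fertility step (each token is copied a specified number of times) followed by a reordering step (a permutation of the expanded sequence). In the paper both operations are differentiable layers learned end to end; the hard, hand-specified counts and permutation here are assumptions meant only to show what the operations compute.
```python
# Hard, non-differentiable illustration of fertility + reordering. The paper's
# layers are differentiable and learned; the counts and permutation below are
# hand-specified purely to show what the two operations do to a sequence.
from typing import List


def fertility(tokens: List[str], counts: List[int]) -> List[str]:
    """Copy each token `counts[i]` times (0 deletes it)."""
    return [tok for tok, c in zip(tokens, counts) for _ in range(c)]


def reorder(tokens: List[str], permutation: List[int]) -> List[str]:
    """Rearrange the expanded sequence according to a permutation."""
    return [tokens[i] for i in permutation]


source = ["a", "b", "c"]
expanded = fertility(source, [2, 1, 0])    # 'a' doubled, 'c' deleted
print(expanded)                            # ['a', 'a', 'b']
target = reorder(expanded, [2, 0, 1])      # 'b' moved to the front
print(target)                              # ['b', 'a', 'a']
```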
- DeepStruct: Pretraining of Language Models for Structure Prediction [64.84144849119554]
We pretrain language models on a collection of task-agnostic corpora to generate structures from text.
Our structure pretraining enables zero-shot transfer of the learned knowledge that models have about the structure tasks.
We show that a 10B parameter language model transfers non-trivially to most tasks and obtains state-of-the-art performance on 21 of 28 datasets.
arXiv Detail & Related papers (2022-05-21T00:58:22Z)
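The DeepStruct entry above describes pretraining a model to emit structures (for example, relation triples) directly from text. The helper below shows one plausible way to serialize (text, triples) pairs into sequence-to-sequence training examples for such structure pretraining; the prompt wording and the separators are assumptions, not the format used in the paper.
```python
# One plausible serialization of (text, triples) pairs into seq2seq training
# examples for structure pretraining. The prompt wording and separators are
# assumptions for illustration, not the exact format used by DeepStruct.
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def to_training_pair(text: str, triples: List[Triple]) -> Tuple[str, str]:
    source = f"generate triples: {text}"
    target = " ; ".join(f"( {h} | {r} | {t} )" for h, r, t in triples)
    return source, target


src, tgt = to_training_pair(
    "The Rhine flows through Düsseldorf.",
    [("Rhine", "flows through", "Düsseldorf")],
)
print(src)   # generate triples: The Rhine flows through Düsseldorf.
print(tgt)   # ( Rhine | flows through | Düsseldorf )
```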
- GroupBERT: Enhanced Transformer Architecture with Efficient Grouped Structures [57.46093180685175]
We demonstrate a set of modifications to the structure of a Transformer layer, producing a more efficient architecture.
We add a convolutional module to complement the self-attention module, decoupling the learning of local and global interactions.
We apply the resulting architecture to language representation learning and demonstrate its superior performance compared to BERT models of different scales.
arXiv Detail & Related papers (2021-06-10T15:41:53Z)
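The GroupBERT summary above describes adding a convolutional module next to self-attention so that local and global interactions are learned by different components. The sketch below is an assumed minimal PyTorch layer in that spirit, with a grouped 1D convolution block for local mixing followed by self-attention and a feed-forward block; it is not the published GroupBERT layer, whose block ordering and grouped matrices differ.
```python
# Minimal sketch of a Transformer layer with an extra convolutional module for
# local interactions, in the spirit of the summary above. Not the published
# GroupBERT layer; block ordering and grouping here are simplifying assumptions.
import torch
import torch.nn as nn


class ConvModule(nn.Module):
    """Grouped 1D convolution over the sequence dimension for local mixing."""

    def __init__(self, d_model: int, kernel: int = 7, groups: int = 4):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel,
                              padding=kernel // 2, groups=groups)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):                          # (batch, seq, d_model)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)
        return self.norm(x + h)                    # residual + norm


class ConvAugmentedLayer(nn.Module):
    """Convolution for local patterns, self-attention for global ones."""

    def __init__(self, d_model: int = 128, n_heads: int = 4):
        super().__init__()
        self.local = ConvModule(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        x = self.local(x)                          # local interactions
        attn_out, _ = self.attn(x, x, x)           # global interactions
        x = self.norm1(x + attn_out)
        return self.norm2(x + self.ff(x))


layer = ConvAugmentedLayer()
x = torch.randn(2, 32, 128)
print(layer(x).shape)                              # torch.Size([2, 32, 128])
```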