Related papers: A surprisal oracle for when every layer counts

URL: http://arxiv.org/abs/2412.03098v1
Date: Wed, 04 Dec 2024 07:53:45 GMT
Title: A surprisal oracle for when every layer counts
Authors: Xudong Hong, Sharid Loáiciga, Asad Sayeed
Abstract summary: Active Curriculum Language Modeling (ACLM) is a learner-directed approach to training a language model. We propose an updated ACLM process for the BabyLM 2024 task.
Score: 2.5716627278119444
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Active Curriculum Language Modeling (ACLM; Hong et al., 2023) is a learner-directed approach to training a language model. We proposed the original version of this process in our submission to the BabyLM 2023 task, and now we propose an updated ACLM process for the BabyLM 2024 task. ACLM involves an iteratively and dynamically constructed curriculum informed over the training process by a model of uncertainty; other training items that are similarly uncertain to a least certain candidate item are prioritized. Our new process improves the similarity model so that it is more dynamic, and we run ACLM over the most successful model from the BabyLM 2023 task: ELC-BERT (Charpentier and Samuel, 2023). We find that while our models underperform on fine-grained grammatical inferences, they outperform the BabyLM 2024 official baselines on common-sense and world-knowledge tasks. We make our code available at https://github.com/asayeed/ActiveBaby.
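The selection step described in the abstract can be illustrated with a toy sketch. It assumes each candidate item already has a probability under the current model and uses surprisal (-log2 p) as the uncertainty measure. As a stand-in for ACLM's own, more dynamic similarity model, it treats the items whose surprisal is closest to that of the least certain item as "similarly uncertain". The helper names `surprisal` and `select_batch` are hypothetical and are not taken from the ActiveBaby repository.

```python
import math
from typing import Dict, List


def surprisal(prob: float) -> float:
    """Surprisal in bits, -log2 p; higher means the model is less certain.

    Assumes 0 < prob <= 1 (math.log2 raises ValueError for prob <= 0).
    """
    return -math.log2(prob)


def select_batch(item_probs: Dict[str, float], k: int) -> List[str]:
    """Toy ACLM-style selection step (hypothetical helper, not the paper's code).

    1. Score every candidate by surprisal under the current model.
    2. Take the least certain (highest-surprisal) item as the anchor.
    3. Prioritize the k items most similar to the anchor, using absolute
       surprisal difference as an assumed stand-in similarity measure.
    """
    scores = {item: surprisal(p) for item, p in item_probs.items()}
    anchor = max(scores, key=lambda item: scores[item])
    # The anchor itself has distance 0, so it always comes first.
    ranked = sorted(scores, key=lambda item: abs(scores[item] - scores[anchor]))
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical per-item probabilities from the current model.
    probs = {"a": 0.5, "b": 0.01, "c": 0.02, "d": 0.4}
    print(select_batch(probs, 2))  # → ['b', 'c']
```

In an iterative curriculum, a step like this would be rerun after each training round, since the model's probabilities (and therefore the anchor and its neighbours) change as training proceeds.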

Related papers

BabyLM Turns 4 and Goes Multilingual: Call for Papers for the 2026 BabyLM Workshop [73.0356575273869]
The goal of BabyLM is to stimulate new research connections between cognitive modeling and language model pretraining. This year, we move beyond our previous English-only pretraining datasets with a new track, focusing on English, Dutch, and Chinese. For the workshop, we call for papers related to the overall theme of BabyLM, which includes training efficiency, small-scale training datasets, cognitive modeling, model evaluation, and architecture innovation.
arXiv Detail & Related papers (2026-02-23T18:02:23Z)
Capability Instruction Tuning: A New Paradigm for Dynamic LLM Routing [64.38277118982698]
Large Language Models (LLMs) have demonstrated human-like instruction-following abilities. In this work, we explore how to route the best-performing LLM for each instruction to achieve better overall performance. We develop a new paradigm, constructing capability instructions with model capability representation, user instruction, and performance inquiry prompts to assess the performance.
arXiv Detail & Related papers (2025-02-24T16:10:53Z)
Instruction Pre-Training: Language Models are Supervised Multitask Learners [115.95022434390181]
In this paper, we propose a framework that augments massive raw corpora with instruction-response pairs to pre-train language models (LMs). In our experiments, we synthesize 200M instruction-response pairs covering 40+ task categories to verify the effectiveness of Instruction Pre-Training.
arXiv Detail & Related papers (2024-06-20T16:55:33Z)
Increasing The Performance of Cognitively Inspired Data-Efficient Language Models via Implicit Structure Building [6.445605125467575]
We train language models that incorporate unsupervised predictions about hierarchical sentence structure into the model architecture. StructFormer models have been shown to perform well on unsupervised syntactic induction based on limited pretraining data. Evaluation of our models on 39 tasks provided by the BabyLM challenge shows promising improvements of models that integrate a hierarchical bias into the architecture.
arXiv Detail & Related papers (2023-10-31T16:26:36Z)
Large Language Models as Generalizable Policies for Embodied Tasks [50.870491905776305]
We show that large language models (LLMs) can be adapted to be generalizable policies for embodied visual tasks. Our approach, called Large LAnguage model Reinforcement Learning Policy (LLaRP), adapts a pre-trained frozen LLM to take as input text instructions and visual egocentric observations and output actions directly in the environment.
arXiv Detail & Related papers (2023-10-26T18:32:05Z)
Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models [91.02730155418699]
Large language models (LLMs) can perform a wide range of tasks by following natural language instructions. We introduce Auto-Instruct, a novel method to automatically improve the quality of instructions provided to LLMs. In experiments on 118 out-of-domain tasks, Auto-Instruct surpasses both human-written instructions and existing baselines of LLM-generated instructions.
arXiv Detail & Related papers (2023-10-19T19:52:55Z)
Improving Language Plasticity via Pretraining with Active Forgetting [63.36484652568976]
We propose to use an active forgetting mechanism during pretraining, as a simple way of creating PLMs that can quickly adapt to new languages. Experiments with RoBERTa show that models pretrained with our forgetting mechanism demonstrate faster convergence during language adaptation.
arXiv Detail & Related papers (2023-07-03T17:12:44Z)
Guiding Large Language Models via Directional Stimulus Prompting [114.84930073977672]
We introduce Directional Stimulus Prompting, a novel framework for guiding black-box large language models (LLMs) toward specific desired outputs. Instead of directly adjusting LLMs, our method employs a small tunable policy model to generate an auxiliary directional stimulus prompt for each input instance.
arXiv Detail & Related papers (2023-02-22T17:44:15Z)
ZhichunRoad at Amazon KDD Cup 2022: MultiTask Pre-Training for E-Commerce Product Search [4.220439000486713]
We propose a robust multilingual model to improve the quality of search results. In the pre-training stage, we adopt an MLM task, a classification task, and a contrastive learning task. In the fine-tuning stage, we use confident learning, an exponential moving average (EMA), adversarial training (FGM), and a regularized dropout strategy (R-Drop).
arXiv Detail & Related papers (2023-01-31T07:31:34Z)
BLCU-ICALL at SemEval-2022 Task 1: Cross-Attention Multitasking Framework for Definition Modeling [16.794041736487323]
This paper describes the BLCU-ICALL system used in the SemEval-2022 Task 1 Comparing Dictionaries and Word Embeddings. We propose a transformer-based multitasking framework to explore the task.
arXiv Detail & Related papers (2022-04-16T02:33:28Z)
MetaICL: Learning to Learn In Context [87.23056864536613]
We introduce MetaICL, a new meta-training framework for few-shot learning where a pretrained language model is tuned to do in-context learning on a large set of training tasks. We show that MetaICL approaches (and sometimes beats) the performance of models fully finetuned on the target task training data, and outperforms much bigger models with nearly 8x more parameters.
arXiv Detail & Related papers (2021-10-29T17:42:08Z)

This list is automatically generated from the titles and abstracts of the papers on this site.