Baby's CoThought: Leveraging Large Language Models for Enhanced
Reasoning in Compact Models
- URL: http://arxiv.org/abs/2308.01684v2
- Date: Mon, 23 Oct 2023 12:05:52 GMT
- Title: Baby's CoThought: Leveraging Large Language Models for Enhanced
Reasoning in Compact Models
- Authors: Zheyu Zhang, Han Yang, Bolei Ma, David Rügamer, Ercong Nie
- Abstract summary: We propose a "CoThought" pipeline that efficiently trains smaller "baby" language models (BabyLMs).
Our pipeline restructures a dataset of less than 100M words using GPT-3.5-turbo, transforming it into task-oriented, human-readable texts.
Our BabyLM outperforms vanilla RoBERTa on 10 linguistic, NLU, and question-answering tasks by more than 3 points.
- Score: 3.1244568065126863
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) demonstrate remarkable performance on a variety
of natural language understanding (NLU) tasks, primarily due to their
in-context learning ability. This ability could be applied to building baby-like models, i.e., models at small scales, thereby improving training efficiency. In this
paper, we propose a "CoThought" pipeline, which efficiently trains smaller
"baby" language models (BabyLMs) by leveraging the Chain of Thought prompting
of LLMs. Our pipeline restructures a dataset of less than 100M words using GPT-3.5-turbo, transforming it into task-oriented, human-readable texts that are comparable to school texts for language learners. The BabyLM is then
pretrained on this restructured dataset in a RoBERTa fashion. In evaluations
across 4 benchmarks, our BabyLM outperforms vanilla RoBERTa on 10 linguistic, NLU, and question-answering tasks by more than 3 points, showing a
superior ability to extract contextual information. These results suggest that
compact LMs pretrained on small, LLM-restructured data can better understand
tasks and achieve improved performance.
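The abstract outlines two stages: GPT-3.5-turbo first restructures the sub-100M-word corpus into task-oriented, human-readable texts via Chain of Thought prompting, and a BabyLM is then pretrained on the result in a RoBERTa fashion (masked language modeling). The sketch below illustrates both stages under stated assumptions: it uses the OpenAI Python client and Hugging Face transformers/datasets, and the prompt wording, model choices, and hyperparameters are placeholders rather than the paper's actual configuration.

```python
# Minimal sketch of the two CoThought stages described above. Assumptions: the
# OpenAI Python client (>=1.0) and Hugging Face transformers/datasets are
# available; the prompt and hyperparameters are illustrative placeholders.
from openai import OpenAI
from datasets import Dataset
from transformers import (
    DataCollatorForLanguageModeling,
    RobertaConfig,
    RobertaForMaskedLM,
    RobertaTokenizerFast,
    Trainer,
    TrainingArguments,
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

RESTRUCTURE_PROMPT = (
    "Think step by step, then rewrite the following raw text as a "
    "task-oriented, human-readable passage, comparable to a school text "
    "for language learners:\n\n{doc}"
)

def restructure(doc: str) -> str:
    """Stage 1: have GPT-3.5-turbo restructure one raw document."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": RESTRUCTURE_PROMPT.format(doc=doc)}],
    )
    return resp.choices[0].message.content

def pretrain_babylm(restructured_texts, out_dir="babylm-cothought"):
    """Stage 2: RoBERTa-fashion masked-language-model pretraining from scratch."""
    tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
    model = RobertaForMaskedLM(RobertaConfig(vocab_size=tokenizer.vocab_size))
    dataset = Dataset.from_dict({"text": restructured_texts}).map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
        batched=True,
        remove_columns=["text"],
    )
    trainer = Trainer(
        model=model,
        args=TrainingArguments(out_dir, per_device_train_batch_size=16, num_train_epochs=1),
        train_dataset=dataset,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
    )
    trainer.train()

# Usage: restructure the raw corpus, then pretrain on the rewritten texts.
# pretrain_babylm([restructure(doc) for doc in raw_corpus])
```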
Related papers
- TriSum: Learning Summarization Ability from Large Language Models with Structured Rationale [66.01943465390548]
We introduce TriSum, a framework for distilling large language models' text summarization abilities into a compact, local model.
Our method enhances local model performance on various benchmarks.
It also improves interpretability by providing insights into the summarization rationale.
arXiv Detail & Related papers (2024-03-15T14:36:38Z)
- Learning to Reduce: Optimal Representations of Structured Data in Prompting Large Language Models [42.16047343029512]
Large Language Models (LLMs) have been widely used as general-purpose AI agents.
We propose a framework, Learning to Reduce, that fine-tunes a language model to generate a reduced version of an input context.
We show that our model achieves comparable accuracies in selecting the relevant evidence from an input context.
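The summary says a language model is fine-tuned to generate a reduced version of an input context. Below is a hedged sketch of the kind of training pair such a reducer could learn from: the input pairs a structured context with a question, and the target keeps only the relevant evidence. The field names and the tiny example table are illustrative assumptions, not taken from the paper.

```python
# Build one (input, target) pair for fine-tuning a seq2seq "context reducer".
# Field names and the example table are hypothetical.
def make_reduction_example(rows, question, relevant_ids):
    def render(row):
        return " | ".join(f"{k}: {v}" for k, v in row.items())

    full_context = "\n".join(render(r) for r in rows)
    reduced_context = "\n".join(render(r) for r in rows if r["id"] in relevant_ids)
    return {
        "input": f"context:\n{full_context}\nquestion: {question}",
        "target": reduced_context,
    }

example = make_reduction_example(
    rows=[
        {"id": "r1", "city": "Lyon", "population": "513,000"},
        {"id": "r2", "city": "Nice", "population": "342,000"},
    ],
    question="What is the population of Lyon?",
    relevant_ids={"r1"},
)
# example["input"] -> example["target"] pairs would then be used to fine-tune
# a sequence-to-sequence model to emit only the relevant evidence.
```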
arXiv Detail & Related papers (2024-02-22T00:41:23Z)
- Pre-training LLMs using human-like development data corpus [3.5757761767474876]
We pre-train and evaluate Large Language Models (LLMs) on their ability to learn contextual word representations using roughly the same number of tokens as seen by children.
We provide a strong set of baselines with different architectures, evaluate changes in performance across epochs, and report pre-training metrics for the strict-small and strict tracks of the task.
arXiv Detail & Related papers (2023-11-08T13:13:23Z)
- Mini Minds: Exploring Bebeshka and Zlata Baby Models [3.558894829990311]
We describe the University of Lyon 2 submission to the Strict-Small track of the BabyLM competition.
We introduce two small-size language models (LMs) that were submitted for evaluation.
Despite being half the scale of the baseline LMs, our proposed models achieve comparable performance.
arXiv Detail & Related papers (2023-11-06T16:01:10Z)
- Too Much Information: Keeping Training Simple for BabyLMs [2.900810893770134]
This paper details the work of the University of Groningen for the BabyLM Challenge.
We follow the idea that, like babies, language models should be introduced to simpler concepts first and build off of that knowledge to understand more complex concepts.
We examine this strategy of simple-then-complex through a variety of lenses, namely context size, vocabulary, and overall linguistic complexity of the data.
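As a rough illustration of the simple-then-complex idea, the sketch below orders a corpus by a crude complexity score combining sentence length and word rarity. The paper's actual measures of context size, vocabulary, and linguistic complexity may be defined quite differently; this only shows the curriculum mechanics.

```python
# Order training sentences from simple to complex using an assumed complexity
# score: sentence length plus average word rarity.
from collections import Counter

def curriculum_order(sentences):
    tokenized = [s.lower().split() for s in sentences]
    freq = Counter(tok for sent in tokenized for tok in sent)
    total = sum(freq.values())

    def complexity(sent):
        # Longer sentences and rarer vocabulary count as more complex.
        rarity = sum(1.0 - freq[tok] / total for tok in sent) / max(len(sent), 1)
        return len(sent) + 10.0 * rarity

    return [s for _, s in sorted(zip(map(complexity, tokenized), sentences))]

corpus = [
    "Notwithstanding prior stipulations, the agreement lapsed.",
    "The cat sat on the mat.",
    "The dog ran.",
]
print(curriculum_order(corpus))  # shortest, most common-vocabulary sentences first
```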
arXiv Detail & Related papers (2023-11-03T14:50:00Z)
- CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation [94.59630161324013]
We propose CoAnnotating, a novel paradigm for Human-LLM co-annotation of unstructured texts at scale.
Our empirical study on different datasets shows CoAnnotating to be an effective means of allocating annotation work between humans and LLMs, with up to a 21% performance improvement over a random baseline.
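A minimal sketch of uncertainty-guided allocation follows: sample several LLM annotations per instance, measure disagreement with entropy, and route the uncertain instances to human annotators. The llm_annotate stub and the threshold are hypothetical placeholders; the paper's exact uncertainty estimates may differ.

```python
# Route instances to humans when sampled LLM annotations disagree (high entropy).
import math
import random
from collections import Counter

def llm_annotate(text: str) -> str:
    # Hypothetical placeholder standing in for a real LLM call.
    return random.choice(["positive", "negative"])

def label_entropy(labels):
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def allocate(texts, n_samples=5, threshold=0.8):
    to_human, to_llm = [], []
    for text in texts:
        samples = [llm_annotate(text) for _ in range(n_samples)]
        (to_human if label_entropy(samples) > threshold else to_llm).append(text)
    return to_human, to_llm

needs_human, auto_labeled = allocate(["Great product!", "It was... something."])
```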
arXiv Detail & Related papers (2023-10-24T08:56:49Z)
- BAMBOO: A Comprehensive Benchmark for Evaluating Long Text Modeling Capacities of Large Language Models [141.21603469555225]
Large language models (LLMs) have achieved dramatic proficiency on NLP tasks of normal length.
We propose BAMBOO, a multi-task long context benchmark.
It consists of 10 datasets from 5 different long text understanding tasks.
arXiv Detail & Related papers (2023-09-23T11:36:15Z)
- LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
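As a simplified illustration of structural pruning, the sketch below removes whole hidden channels of one MLP block, treating the corresponding row of the up-projection and column of the down-projection as a coupled structure. The importance score here is a weight-magnitude proxy chosen for brevity; the paper's actual criterion for identifying non-critical coupled structures differs.

```python
# Prune coupled MLP channels: drop a row of fc1 and the matching column of fc2.
import torch
import torch.nn as nn

def prune_mlp(fc1: nn.Linear, fc2: nn.Linear, keep_ratio: float = 0.75):
    """fc1: d_model -> d_hidden, fc2: d_hidden -> d_model."""
    # Importance of hidden channel j = norm of its incoming plus outgoing weights
    # (a magnitude proxy, not the paper's importance estimate).
    importance = fc1.weight.norm(dim=1) + fc2.weight.norm(dim=0)
    k = int(fc1.out_features * keep_ratio)
    keep = importance.topk(k).indices.sort().values

    pruned_fc1 = nn.Linear(fc1.in_features, k)
    pruned_fc2 = nn.Linear(k, fc2.out_features)
    with torch.no_grad():
        pruned_fc1.weight.copy_(fc1.weight[keep])
        pruned_fc1.bias.copy_(fc1.bias[keep])
        pruned_fc2.weight.copy_(fc2.weight[:, keep])
        pruned_fc2.bias.copy_(fc2.bias)
    return pruned_fc1, pruned_fc2

fc1, fc2 = nn.Linear(64, 256), nn.Linear(256, 64)
fc1, fc2 = prune_mlp(fc1, fc2)  # hidden width 256 -> 192, surviving weights preserved
```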
arXiv Detail & Related papers (2023-05-19T12:10:53Z)
- Pre-Training to Learn in Context [138.0745138788142]
The in-context learning ability of language models is not fully exploited because they are not explicitly trained to learn in context.
We propose PICL (Pre-training for In-Context Learning), a framework to enhance the language models' in-context learning ability.
Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming larger language models with nearly 4x the parameters.
arXiv Detail & Related papers (2023-05-16T03:38:06Z)
- PaLM: Scaling Language Modeling with Pathways [180.69584031908113]
We trained a 540-billion parameter, densely activated, Transformer language model, which we call the Pathways Language Model (PaLM).
We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods.
We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks.
arXiv Detail & Related papers (2022-04-05T16:11:45Z)
- NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework [10.656788279434798]
We propose a simple and efficient learning framework, TLM, that does not rely on large-scale pretraining.
On eight classification datasets in four domains, TLM achieves results better than or similar to pretrained language models.
arXiv Detail & Related papers (2021-11-07T17:13:59Z)