Baby's CoThought: Leveraging Large Language Models for Enhanced
Reasoning in Compact Models
- URL: http://arxiv.org/abs/2308.01684v2
- Date: Mon, 23 Oct 2023 12:05:52 GMT
- Title: Baby's CoThought: Leveraging Large Language Models for Enhanced
Reasoning in Compact Models
- Authors: Zheyu Zhang, Han Yang, Bolei Ma, David Rügamer, Ercong Nie
- Abstract summary: We propose a "CoThought" pipeline, which efficiently trains smaller "baby" language models (BabyLMs) by leveraging the Chain of Thought prompting of LLMs.
Our pipeline restructures a dataset of less than 100M words using GPT-3.5-turbo, transforming it into task-oriented, human-readable texts.
Our BabyLM outperforms the vanilla RoBERTa in 10 linguistic, NLU, and question-answering tasks by more than 3 points.
- Score: 3.1244568065126863
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) demonstrate remarkable performance on a variety
of natural language understanding (NLU) tasks, primarily due to their
in-context learning ability. This ability could be applied to building baby-like
models, i.e., models at small scales, improving training efficiency. In this
paper, we propose a "CoThought" pipeline, which efficiently trains smaller
"baby" language models (BabyLMs) by leveraging the Chain of Thought prompting
of LLMs. Our pipeline restructures a dataset of less than 100M words using
GPT-3.5-turbo, transforming it into task-oriented, human-readable texts that
are comparable to school texts for language learners. The BabyLM is then
pretrained on this restructured dataset in a RoBERTa fashion. In evaluations
across 4 benchmarks, our BabyLM outperforms the vanilla RoBERTa in 10
linguistic, NLU, and question-answering tasks by more than 3 points, showing a
superior ability to extract contextual information. These results suggest that
compact LMs pretrained on small, LLM-restructured data can better understand
tasks and achieve improved performance.
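For readers who want a concrete picture of the pipeline, a minimal sketch follows: an LLM restructures raw corpus chunks via step-by-step prompting, and a RoBERTa-style model is then pretrained on the result with masked language modeling. The prompt wording, hyperparameters, and helper names are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of a CoThought-style pipeline: (1) an LLM restructures
# raw corpus chunks into task-oriented, readable texts, (2) a small
# RoBERTa-style model is pretrained on the result with masked language
# modeling. Prompt text, model sizes, and hyperparameters are illustrative.
from openai import OpenAI
from datasets import Dataset
from transformers import (
    DataCollatorForLanguageModeling, RobertaConfig, RobertaForMaskedLM,
    RobertaTokenizerFast, Trainer, TrainingArguments,
)

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def restructure(snippet: str) -> str:
    """Ask the LLM to rewrite a raw snippet as a self-contained exercise."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": "Think step by step, then rewrite the following text "
                       "as a short, task-oriented reading exercise:\n\n" + snippet,
        }],
    )
    return resp.choices[0].message.content

raw_chunks = ["...chunks of the under-100M-word corpus go here..."]
restructured = [restructure(chunk) for chunk in raw_chunks]

# Standard RoBERTa-style MLM pretraining on the restructured corpus.
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = RobertaForMaskedLM(
    RobertaConfig(vocab_size=tokenizer.vocab_size, max_position_embeddings=514)
)
dataset = Dataset.from_dict({"text": restructured}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"],
)
Trainer(
    model=model,
    args=TrainingArguments(output_dir="babylm-cothought", num_train_epochs=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
).train()
```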
Related papers
- TituLLMs: A Family of Bangla LLMs with Comprehensive Benchmarking [6.070192392563392]
We present TituLLMs, the first large pretrained Bangla LLMs in 1B and 3B parameter sizes.
To train TituLLMs, we collected a pretraining dataset of approximately 37 billion tokens.
We extended the Llama-3.2 tokenizer to incorporate language- and culture-specific knowledge.
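Tokenizer extension of this kind is typically done in Hugging Face Transformers by adding tokens and resizing the embedding matrix; the sketch below shows that generic recipe with a placeholder token list and an assumed (gated) Llama-3.2 checkpoint name, and is not TituLLMs' exact procedure.

```python
# Generic sketch of extending a pretrained tokenizer with new tokens and
# resizing the model's embeddings (illustrative; not TituLLMs' exact recipe).
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "meta-llama/Llama-3.2-1B"  # assumed checkpoint name; gated on the Hub
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

new_tokens = ["<bn_token_1>", "<bn_token_2>"]  # hypothetical language-specific pieces
num_added = tokenizer.add_tokens(new_tokens)
if num_added > 0:
    # Grow the embedding matrix to cover the new ids; the new rows are
    # randomly initialized and learned during continued pretraining.
    model.resize_token_embeddings(len(tokenizer))
```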
arXiv Detail & Related papers (2025-02-16T16:22:23Z)
- BabyLMs for isiXhosa: Data-Efficient Language Modelling in a Low-Resource Context [2.57490464660469]
The BabyLM challenge called on participants to develop sample-efficient language models.
Submissions were pretrained on a fixed English corpus, limited to the number of words children are exposed to during development.
New architectures for data-efficient language modelling outperformed models trained on trillions of words.
arXiv Detail & Related papers (2025-01-07T15:13:45Z)
- TinyHelen's First Curriculum: Training and Evaluating Tiny Language Models in a Simpler Language Environment [30.93798042712827]
Training language models (LMs) and their application agents is increasingly costly due to large datasets and models.
We propose a pipeline to refine text data by eliminating noise, minimizing vocabulary, and maintaining genre-specific patterns.
Our experiments show that leaner pre-training boosts LM learning efficiency.
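As a rough illustration of noise removal plus vocabulary minimization, one could keep only lines that stay within a small, frequency-based vocabulary; the vocabulary size and threshold below are invented, and the paper's actual LLM-assisted pipeline is more involved.

```python
# Toy sketch of "leaner" pre-training data: drop noisy lines and keep only
# sentences whose words fall inside a small target vocabulary.
# The vocabulary size and 95% threshold are arbitrary illustrative choices.
import re
from collections import Counter

def refine(lines, vocab_size=2000, min_in_vocab=0.95):
    # Build a minimal vocabulary from the most frequent words in the corpus.
    counts = Counter(w for line in lines for w in re.findall(r"[a-z']+", line.lower()))
    vocab = {w for w, _ in counts.most_common(vocab_size)}
    kept = []
    for line in lines:
        words = re.findall(r"[a-z']+", line.lower())
        if not words:
            continue  # noise: no alphabetic content at all
        in_vocab = sum(w in vocab for w in words) / len(words)
        if in_vocab >= min_in_vocab:
            kept.append(line.strip())
    return kept
```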
arXiv Detail & Related papers (2024-12-31T16:08:15Z)
- TriSum: Learning Summarization Ability from Large Language Models with Structured Rationale [66.01943465390548]
We introduce TriSum, a framework for distilling large language models' text summarization abilities into a compact, local model.
Our method enhances local model performance on various benchmarks.
It also improves interpretability by providing insights into the summarization rationale.
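Distillation of this kind generally means generating rationale-plus-summary targets with the large model and fine-tuning a compact seq2seq student on them; the sketch below follows that generic pattern with placeholder data and does not reproduce TriSum's structured-rationale training stages.

```python
# Generic LLM-to-small-model summarization distillation sketch (placeholder
# data; not TriSum's exact curriculum). Targets pair a brief rationale with
# the final summary so the student also learns to expose its reasoning.
from datasets import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Trainer, TrainingArguments)

pairs = [  # in practice these come from prompting a large teacher model
    {"doc": "Long article text ...",
     "target": "Rationale: key aspects A, B. Summary: ..."},
]
tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

def encode(ex):
    enc = tok("summarize: " + ex["doc"], truncation=True, max_length=1024)
    enc["labels"] = tok(text_target=ex["target"], truncation=True, max_length=256)["input_ids"]
    return enc

ds = Dataset.from_list(pairs).map(encode, remove_columns=["doc", "target"])
Trainer(model=model,
        args=TrainingArguments(output_dir="trisum-student", num_train_epochs=1),
        train_dataset=ds,
        data_collator=DataCollatorForSeq2Seq(tok, model=model)).train()
```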
arXiv Detail & Related papers (2024-03-15T14:36:38Z)
- Learning to Reduce: Optimal Representations of Structured Data in Prompting Large Language Models [42.16047343029512]
Large Language Models (LLMs) have been widely used as general-purpose AI agents.
We propose a framework, Learning to Reduce, that fine-tunes a language model to generate a reduced version of an input context.
We show that our model achieves comparable accuracies in selecting the relevant evidence from an input context.
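At inference time, a setup in this spirit would interpose a small fine-tuned "reducer" between the long structured input and the downstream LLM; the wiring below is hypothetical, with placeholder model names and prompt text, and is not the paper's released code.

```python
# Hypothetical inference wiring: a small fine-tuned "reducer" compresses a
# long structured context, and only the reduced evidence reaches the LLM.
from transformers import pipeline
from openai import OpenAI

reducer = pipeline("text2text-generation", model="t5-small")  # stand-in for a fine-tuned reducer
client = OpenAI()

def answer(question: str, table_text: str) -> str:
    reduced = reducer("reduce: " + table_text, max_new_tokens=128)[0]["generated_text"]
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": f"Context:\n{reduced}\n\nQuestion: {question}"}],
    )
    return resp.choices[0].message.content
```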
arXiv Detail & Related papers (2024-02-22T00:41:23Z)
- Pre-training LLMs using human-like development data corpus [3.5757761767474876]
We pre-train and evaluate Large Language Models (LLMs) on their ability to learn contextual word representations using roughly the same number of tokens as seen by children.
We provide a strong set of baselines with different architectures, evaluate changes in performance across epochs, and report pre-training metrics for the strict-small and strict tracks of the task.
arXiv Detail & Related papers (2023-11-08T13:13:23Z)
- CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation [94.59630161324013]
We propose CoAnnotating, a novel paradigm for Human-LLM co-annotation of unstructured texts at scale.
Our empirical study on different datasets shows CoAnnotating to be an effective means of allocating work, with up to a 21% performance improvement over a random baseline.
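One common way to operationalize uncertainty-guided allocation is to sample the LLM several times per item and route high-disagreement items to humans; the sketch below implements that generic entropy-based routing, not CoAnnotating's specific uncertainty measures, and the 0.5 threshold is arbitrary.

```python
# Generic uncertainty-guided routing: items where repeated LLM annotations
# disagree (high entropy) go to humans, confident items keep the LLM label.
import math
from collections import Counter

def route(item_labels, threshold=0.5):
    """item_labels maps item id -> labels from several LLM sampling runs."""
    to_llm, to_human = {}, []
    for item, labels in item_labels.items():
        counts = Counter(labels)
        total = len(labels)
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        if entropy <= threshold:
            to_llm[item] = counts.most_common(1)[0][0]  # keep majority LLM label
        else:
            to_human.append(item)  # disagreement: allocate to human annotators
    return to_llm, to_human
```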
arXiv Detail & Related papers (2023-10-24T08:56:49Z) - BAMBOO: A Comprehensive Benchmark for Evaluating Long Text Modeling Capacities of Large Language Models [141.21603469555225]
Large language models (LLMs) have achieved dramatic proficiency on NLP tasks of normal length.
We propose BAMBOO, a multi-task long context benchmark.
It consists of 10 datasets from 5 different long text understanding tasks.
arXiv Detail & Related papers (2023-09-23T11:36:15Z) - LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
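Structural pruning removes whole channels or heads rather than individual weights; as a toy illustration only (LLM-Pruner's dependency-graph detection and recovery stage are not reproduced), the sketch below drops the lowest-norm hidden channels of an MLP block and shrinks the coupled next layer to match.

```python
# Toy structured pruning of an MLP block: drop the lowest-L2-norm hidden
# channels of fc1 and the matching input columns of fc2. Illustrative only.
import torch
import torch.nn as nn

def prune_mlp(fc1: nn.Linear, fc2: nn.Linear, keep_ratio: float = 0.75):
    n_keep = max(1, int(fc1.out_features * keep_ratio))
    # Importance of each hidden channel = L2 norm of its fc1 weight row.
    importance = fc1.weight.detach().norm(dim=1)
    keep = torch.topk(importance, n_keep).indices.sort().values

    new_fc1 = nn.Linear(fc1.in_features, n_keep, bias=fc1.bias is not None)
    new_fc2 = nn.Linear(n_keep, fc2.out_features, bias=fc2.bias is not None)
    with torch.no_grad():
        new_fc1.weight.copy_(fc1.weight[keep])
        if fc1.bias is not None:
            new_fc1.bias.copy_(fc1.bias[keep])
        new_fc2.weight.copy_(fc2.weight[:, keep])  # coupled: same channels removed
        if fc2.bias is not None:
            new_fc2.bias.copy_(fc2.bias)
    return new_fc1, new_fc2

fc1, fc2 = nn.Linear(512, 2048), nn.Linear(2048, 512)
fc1, fc2 = prune_mlp(fc1, fc2, keep_ratio=0.5)  # 2048 -> 1024 hidden channels
```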
arXiv Detail & Related papers (2023-05-19T12:10:53Z) - Pre-Training to Learn in Context [138.0745138788142]
The ability of in-context learning is not fully exploited because language models are not explicitly trained to learn in context.
We propose PICL (Pre-training for In-Context Learning), a framework to enhance the language models' in-context learning ability.
Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming larger language models with nearly 4x the parameters.
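The summarized idea is to build pretraining instances that already look like few-shot demonstrations by concatenating texts that share an intrinsic task; the sketch below uses an explicit task label as a stand-in for PICL's retrieval-based grouping of plain text.

```python
# Sketch of PICL-style pretraining instances: concatenate several paragraphs
# sharing an intrinsic task so each instance resembles a few-shot sequence.
# Grouping by an explicit "task" key is a stand-in for retrieval-based grouping.
import random
from collections import defaultdict

def build_instances(paragraphs, shots=4, seed=0):
    """paragraphs: list of {"task": str, "text": str} dicts."""
    rng = random.Random(seed)
    by_task = defaultdict(list)
    for p in paragraphs:
        by_task[p["task"]].append(p["text"])
    instances = []
    for texts in by_task.values():
        rng.shuffle(texts)
        for i in range(0, len(texts) - shots + 1, shots):
            # One pretraining instance = k same-task texts joined as "demonstrations".
            instances.append("\n\n".join(texts[i:i + shots]))
    return instances
```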
arXiv Detail & Related papers (2023-05-16T03:38:06Z)
- PaLM: Scaling Language Modeling with Pathways [180.69584031908113]
We trained a 540-billion parameter, densely activated Transformer language model, which we call the Pathways Language Model (PaLM).
We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods.
We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks.
arXiv Detail & Related papers (2022-04-05T16:11:45Z)