The CoT Collection: Improving Zero-shot and Few-shot Learning of
Language Models via Chain-of-Thought Fine-Tuning
- URL: http://arxiv.org/abs/2305.14045v2
- Date: Sat, 14 Oct 2023 10:46:55 GMT
- Title: The CoT Collection: Improving Zero-shot and Few-shot Learning of
Language Models via Chain-of-Thought Fine-Tuning
- Authors: Seungone Kim, Se June Joo, Doyoung Kim, Joel Jang, Seonghyeon Ye,
Jamin Shin, Minjoon Seo
- Abstract summary: Language models (LMs) with fewer than 100B parameters are known to perform poorly on chain-of-thought (CoT) reasoning.
In this work, we aim to equip smaller LMs with the step-by-step reasoning capability by instruction tuning with CoT rationales.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language models (LMs) with fewer than 100B parameters are known to
perform poorly on chain-of-thought (CoT) reasoning, in contrast to large LMs,
when solving unseen tasks. In this work, we aim to equip smaller LMs with the
step-by-step reasoning capability by instruction tuning with CoT rationales. To
achieve this goal, we first introduce a new instruction-tuning dataset called
the CoT Collection, which augments the existing Flan Collection (including only
9 CoT tasks) with an additional 1.84 million rationales across 1,060 tasks. We
show that CoT fine-tuning Flan-T5 (3B & 11B) with the CoT Collection enables
smaller LMs to acquire better CoT capabilities on unseen tasks. On the
BIG-Bench-Hard (BBH) benchmark, we report an average improvement of +4.34%
(Flan-T5 3B) and +2.60% (Flan-T5 11B) in zero-shot task accuracy. Furthermore,
we show that instruction tuning with the CoT Collection gives LMs stronger
few-shot learning capabilities on 4 domain-specific tasks, yielding
improvements of +2.24% (Flan-T5 3B) and +2.37% (Flan-T5 11B), even
outperforming ChatGPT, which uses demonstrations up to the maximum input
length, by a +13.98% margin. Our code, the CoT Collection data, and model
checkpoints are publicly available.
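As a rough illustration of the fine-tuning setup described in the abstract, the sketch below assembles one (source, target) training pair in which the model is taught to emit a rationale before its final answer. The field names, the prompt wording, and the "So the answer is" delimiter are illustrative assumptions, not the CoT Collection's actual schema.

```python
def build_cot_example(instruction: str, input_text: str,
                      rationale: str, answer: str) -> tuple[str, str]:
    """Format one (source, target) pair for seq2seq CoT fine-tuning:
    the target trains the model to produce the rationale before the answer."""
    source = f"{instruction}\n\n{input_text}"
    target = f"{rationale} So the answer is {answer}."
    return source, target

# Hypothetical example pair; the math problem and phrasing are made up.
src, tgt = build_cot_example(
    instruction="Answer the question, thinking step by step.",
    input_text="If a train travels 60 km in 1.5 hours, what is its average speed?",
    rationale="Average speed is distance divided by time: 60 / 1.5 = 40 km/h.",
    answer="40 km/h",
)
print(src)
print(tgt)
```

In an actual fine-tuning run, pairs like `(src, tgt)` would be tokenized and fed to a seq2seq trainer as encoder input and decoder label, respectively.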
Related papers
- LLMs Can Easily Learn to Reason from Demonstrations. Structure, not content, is what matters!
Large reasoning models (LRMs) tackle complex reasoning problems by following long chains of thought (Long CoT).
We find that a large language model (LLM) can effectively learn Long CoT reasoning through data-efficient supervised fine-tuning (SFT) and parameter-efficient low-rank adaptation (LoRA).
With just 17k long CoT training samples, the Qwen2.5-32B-Instruct model achieves significant improvements on a wide range of math and coding benchmarks.
arXiv Detail & Related papers (2025-02-11T08:48:48Z) - How Do Humans Write Code? Large Models Do It the Same Way Too
Program-of-Thought (PoT) has replaced natural-language Chain-of-Thought (CoT) as the most popular prompting method for Large Language Models.
Using PoT introduces more reasoning errors, such as incorrect formulas or flawed logic, compared to CoT.
We propose Human-Think Language (HTL), which leverages a suite of strategies that help integrate PoT and CoT.
arXiv Detail & Related papers (2024-02-24T05:40:01Z) - TAT-LLM: A Specialized Language Model for Discrete Reasoning over Tabular and Textual Data
We consider harnessing the power of large language models (LLMs) to solve our task.
We develop a TAT-LLM language model by fine-tuning LLaMA 2 with the training data generated automatically from existing expert-annotated datasets.
arXiv Detail & Related papers (2024-01-24T04:28:50Z) - Fewer is More: Boosting LLM Reasoning with Reinforced Context Pruning
Large Language Models (LLMs) have shown impressive capabilities, yet they still struggle with math reasoning.
We propose CoT-Influx, a novel approach that pushes the boundary of few-shot Chain-of-Thoughts (CoT) learning.
CoT-Influx employs a coarse-to-fine pruner to maximize the input of effective and concise CoT examples.
arXiv Detail & Related papers (2023-12-14T13:03:13Z) - The Flan Collection: Designing Data and Methods for Effective
Instruction Tuning
We study the design decisions of publicly available instruction tuning methods, and break down the development of Flan 2022.
We find task balancing and enrichment techniques are overlooked but critical to effective instruction tuning.
To accelerate research on instruction tuning, we make the Flan 2022 collection of datasets, templates, and methods publicly available.
arXiv Detail & Related papers (2023-01-31T15:03:44Z) - Faithful Chain-of-Thought Reasoning
Chain-of-Thought (CoT) prompting boosts Language Models' (LM) performance on a gamut of reasoning tasks.
We propose Faithful CoT, a reasoning framework involving two stages: Translation and Problem Solving.
This guarantees that the reasoning chain provides a faithful explanation of the final answer.
arXiv Detail & Related papers (2023-01-31T03:04:26Z) - Scaling Instruction-Finetuned Language Models
Finetuning language models on a collection of datasets phrased as instructions has been shown to improve model performance.
We find that instruction finetuning dramatically improves performance on a variety of model classes.
arXiv Detail & Related papers (2022-10-20T16:58:32Z)
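The Faithful Chain-of-Thought entry above describes a two-stage pipeline: translate the query into a symbolic program, then solve it deterministically, so the reasoning chain necessarily explains the answer. A minimal sketch of that idea follows; the hard-coded lookup table stands in for the LM translation stage and is purely illustrative.

```python
from fractions import Fraction

def translate(question: str) -> str:
    # Stage 1 (Translation): normally an LM maps the question to a
    # symbolic program. Hard-coded here purely for illustration.
    programs = {
        "A train covers 60 km in 1.5 hours. Average speed in km/h?":
            "Fraction(60) / Fraction(3, 2)",
    }
    return programs[question]

def solve(program: str) -> Fraction:
    # Stage 2 (Problem Solving): execute the program deterministically,
    # so the program itself is a faithful explanation of the answer.
    return eval(program, {"Fraction": Fraction})

q = "A train covers 60 km in 1.5 hours. Average speed in km/h?"
prog = translate(q)
result = solve(prog)
print(prog, "=", result)
```

Because the answer is computed by executing the translated program rather than generated free-form, the explanation cannot diverge from the final answer.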
This list is automatically generated from the titles and abstracts of the papers in this site.