Learning Composable Chains-of-Thought
- URL: http://arxiv.org/abs/2505.22635v1
- Date: Wed, 28 May 2025 17:51:10 GMT
- Title: Learning Composable Chains-of-Thought
- Authors: Fangcong Yin, Zeyu Leo Liu, Liu Leqi, Xi Ye, Greg Durrett
- Abstract summary: We train large language models (LLMs) to reason on chain-of-thought (CoT) traces of in-distribution reasoning problems. We take a step towards compositional generalization of reasoning skills when addressing a target compositional task that has no labeled CoT data. We can train "atomic CoT" models on the atomic tasks with Composable CoT data and combine them with multitask learning or model merging for better zero-shot performance on the target compositional task.
- Score: 57.73731224510169
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A common approach for teaching large language models (LLMs) to reason is to train on chain-of-thought (CoT) traces of in-distribution reasoning problems, but such annotated data is costly to obtain for every problem of interest. We want reasoning models to generalize beyond their training distribution, and ideally to generalize compositionally: combine atomic reasoning skills to solve harder, unseen reasoning tasks. We take a step towards compositional generalization of reasoning skills when addressing a target compositional task that has no labeled CoT data. We find that simply training models on CoT data of atomic tasks leads to limited generalization, but minimally modifying CoT formats of constituent atomic tasks to be composable can lead to improvements. We can train "atomic CoT" models on the atomic tasks with Composable CoT data and combine them with multitask learning or model merging for better zero-shot performance on the target compositional task. Such a combined model can be further bootstrapped on a small amount of compositional data using rejection sampling fine-tuning (RFT). Results on string operations and natural language skill compositions show that training LLMs on Composable CoT outperforms multitask learning and continued fine-tuning baselines within a given training data budget.
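The two combination steps named in the abstract, model merging and rejection sampling fine-tuning (RFT), can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes merging is a plain linear interpolation of parameters and that RFT keeps only sampled traces whose final answer matches the gold answer. The names `merge_weights`, `rejection_sample`, `generate`, and `extract_answer`, and the toy weights, are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Toy stand-in for model parameters: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def merge_weights(a: Weights, b: Weights, alpha: float = 0.5) -> Weights:
    """Linearly interpolate two models that share parameter names and shapes.

    Assumption: simple weight averaging; the paper may merge differently.
    """
    if a.keys() != b.keys():
        raise ValueError("models must share the same parameter names")
    merged: Weights = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [
            alpha * x + (1.0 - alpha) * y for x, y in zip(a[name], b[name])
        ]
    return merged


def rejection_sample(
    problems: List[Tuple[str, str]],
    generate: Callable[[str], str],
    extract_answer: Callable[[str], str],
    k: int = 4,
) -> List[Tuple[str, str]]:
    """Sample k traces per (question, gold answer) pair and keep the correct ones.

    The kept (question, trace) pairs would then serve as fine-tuning data.
    """
    kept: List[Tuple[str, str]] = []
    for question, gold in problems:
        for _ in range(k):
            trace = generate(question)
            if extract_answer(trace) == gold:
                kept.append((question, trace))
    return kept


if __name__ == "__main__":
    # Hypothetical weights of two "atomic CoT" models.
    model_a = {"layer.w": [0.0, 2.0]}
    model_b = {"layer.w": [2.0, 4.0]}
    print(merge_weights(model_a, model_b))
```

In practice `generate` would call the merged model on a compositional problem and `extract_answer` would parse its final answer; only the filtering logic is shown here.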
Related papers
- SPaRFT: Self-Paced Reinforcement Fine-Tuning for Large Language Models [51.74498855100541]
Large language models (LLMs) have shown strong reasoning capabilities when fine-tuned with reinforcement learning (RL).
We propose SPaRFT, a self-paced learning framework that enables efficient learning based on the capability of the model being trained.
arXiv Detail & Related papers (2025-08-07T03:50:48Z)
- StableMTL: Repurposing Latent Diffusion Models for Multi-Task Learning from Partially Annotated Synthetic Datasets [14.867396697566257]
We extend the partial learning setup to a zero-shot setting, training a multi-task model on multiple datasets, each labeled for only a subset of tasks.
Our method, StableMTL, repurposes image generators for latent regression.
Instead of per-task losses requiring careful balancing, a unified latent loss is adopted, enabling seamless scaling to more tasks.
arXiv Detail & Related papers (2025-06-09T17:59:59Z)
- Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning [33.02060729778806]
This study examines the factors influencing Chain-of-Thought (CoT) distillation in Small Language Models (SLMs).
We find that SLMs exhibit a non-monotonic relationship with granularity, with stronger models benefiting from finer-grained reasoning and weaker models performing better with simpler CoT supervision.
These findings emphasize the need to tailor CoT strategies to the specific student model, offering actionable insights for optimizing CoT distillation in SLMs.
arXiv Detail & Related papers (2025-02-25T09:08:45Z)
- TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action [103.5952731807559]
We present TACO, a family of multi-modal large action models designed to improve performance on complex, multi-step, and multi-modal tasks.
During inference, TACO produces chains-of-thought-and-action (CoTA) and executes intermediate steps by invoking external tools such as OCR, depth estimation, and a calculator.
This dataset enables TACO to learn complex reasoning and action paths, surpassing existing models trained on instruction tuning data with only direct answers.
arXiv Detail & Related papers (2024-12-07T00:42:04Z)
- Understanding Chain-of-Thought in LLMs through Information Theory [16.78730663293352]
We formalize Chain-of-Thought (CoT) reasoning in Large Language Models (LLMs) through an information-theoretic lens.
Specifically, our framework quantifies the 'information gain' at each reasoning step, enabling the identification of failure modes.
We demonstrate the efficacy of our approach through extensive experiments on toy and GSM-8K data, where it significantly outperforms existing outcome-based methods.
arXiv Detail & Related papers (2024-11-18T19:14:36Z)
- AS-ES Learning: Towards Efficient CoT Learning in Small Models [35.225382243612174]
Chain-of-Thought (CoT) serves as a critical emerging ability in Large Language Models (LLMs).
We propose a new training paradigm AS-ES (Abstractive Segments - Extractive Segments) learning, which exploits the inherent information in CoT for iterative generation.
Experiments show that our methods surpass the direct seq2seq training on CoT-extensive tasks like MWP and PET summarization, without data augmentation or altering the model itself.
arXiv Detail & Related papers (2024-03-04T12:13:59Z)
- A Unified Causal View of Instruction Tuning [76.1000380429553]
We develop a meta Structural Causal Model (meta-SCM) to integrate different NLP tasks under a single causal structure of the data.
The key idea is to learn task-required causal factors and only use those to make predictions for a given task.
arXiv Detail & Related papers (2024-02-09T07:12:56Z)
- Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the orders of all the multi-task data for training.
At the task level, we aim to find the optimal task order to minimize the total cross-task interference risk.
At the instance level, we measure the difficulty of all instances per task, then divide them into easy-to-difficult mini-batches for training.
arXiv Detail & Related papers (2024-01-07T18:12:20Z)
- Learning to Perform Complex Tasks through Compositional Fine-Tuning of Language Models [20.173322408302134]
Compositional fine-tuning (CFT) is an approach based on explicitly decomposing a target task into component tasks.
We show that CFT outperforms end-to-end learning even with equal amounts of data.
arXiv Detail & Related papers (2022-10-23T03:22:34Z)
- Task Compass: Scaling Multi-task Pre-training with Task Prefix [122.49242976184617]
Existing studies show that multi-task learning with large-scale supervised tasks suffers from negative effects across tasks.
We propose a task prefix guided multi-task pre-training framework to explore the relationships among tasks.
Our model can not only serve as the strong foundation backbone for a wide range of tasks but also be feasible as a probing tool for analyzing task relationships.
arXiv Detail & Related papers (2022-10-12T15:02:04Z)
- KnowDA: All-in-One Knowledge Mixture Model for Data Augmentation in Few-Shot NLP [68.43279384561352]
Existing data augmentation algorithms leverage task-independent rules or fine-tune general-purpose pre-trained language models.
These methods have trivial task-specific knowledge and are limited to yielding low-quality synthetic data for weak baselines in simple tasks.
We propose the Knowledge Mixture Data Augmentation Model (KnowDA): an encoder-decoder LM pretrained on a mixture of diverse NLP tasks.
arXiv Detail & Related papers (2022-06-21T11:34:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.