Related papers: CoT-ICL Lab: A Petri Dish for Studying Chain-of-Thought Learning from In-Context Demonstrations

CoT-ICL Lab: A Petri Dish for Studying Chain-of-Thought Learning from In-Context Demonstrations

URL: http://arxiv.org/abs/2502.15132v2
Date: Thu, 27 Feb 2025 07:52:22 GMT
Title: CoT-ICL Lab: A Petri Dish for Studying Chain-of-Thought Learning from In-Context Demonstrations
Authors: Vignesh Kothapalli, Hamed Firooz, Maziar Sanjabi,
Abstract summary: CoT-ICL Lab is a framework and methodology to generate synthetic tokenized datasets.<n>We systematically study chain-of-thought (CoT) in-context learning (ICL) in language models.
Score: 11.907286102852957
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce CoT-ICL Lab, a framework and methodology to generate synthetic tokenized datasets and systematically study chain-of-thought (CoT) in-context learning (ICL) in language models. CoT-ICL Lab allows fine grained control over the complexity of in-context examples by decoupling (1) the causal structure involved in chain token generation from (2) the underlying token processing functions. We train decoder-only transformers (up to 700M parameters) on these datasets and show that CoT accelerates the accuracy transition to higher values across model sizes. In particular, we find that model depth is crucial for leveraging CoT with limited in-context examples, while more examples help shallow models match deeper model performance. Additionally, limiting the diversity of token processing functions throughout training improves causal structure learning via ICL. We also interpret these transitions by analyzing transformer embeddings and attention maps. Overall, CoT-ICL Lab serves as a simple yet powerful testbed for theoretical and empirical insights into ICL and CoT in language models.

Related papers

Understanding Knowledge Hijack Mechanism in In-context Learning through Associative Memory [37.93644115914534]
In-context learning (ICL) enables large language models to adapt to new tasks without fine-tuning.<n>This paper investigates the balance between in-context information and pretrained bigram knowledge in token prediction.
arXiv Detail & Related papers (2024-12-16T05:33:05Z)
Re-examining learning linear functions in context [1.8843687952462742]
In-context learning (ICL) has emerged as a powerful paradigm for easily adapting Large Language Models (LLMs) to various tasks.<n>We explore a simple model of ICL in a controlled setup with synthetic training data.<n>Our findings challenge the prevailing narrative that transformers adopt algorithmic approaches to learn a linear function in-context.
arXiv Detail & Related papers (2024-11-18T10:58:46Z)
From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency [17.612497960364916]
Chain-of-thought (CoT) significantly enhances the reasoning performance of large language models (LLM) We demonstrate that CoT can substantially improve sample efficiency even when representation power is sufficient. We show that CoT simplifies the learning process by introducing sparse dependencies among input tokens, and leads to a sparse and interpretable attention.
arXiv Detail & Related papers (2024-10-07T19:45:09Z)
Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis [82.51626700527837]
Chain-of-shift (CoT) is an efficient method that enables the reasoning ability of large language models by augmenting the query using examples with multiple intermediate steps. We show that despite the theoretical success of CoT, it fails to provide an accurate generalization when CoT does.
arXiv Detail & Related papers (2024-10-03T03:12:51Z)
ChainLM: Empowering Large Language Models with Improved Chain-of-Thought Prompting [124.69672273754144]
Chain-of-Thought (CoT) prompting can enhance the reasoning capabilities of large language models (LLMs) Existing CoT approaches usually focus on simpler reasoning tasks and thus result in low-quality and inconsistent CoT prompts. We introduce CoTGenius, a novel framework designed for the automatic generation of superior CoT prompts.
arXiv Detail & Related papers (2024-03-21T11:34:26Z)
AS-ES Learning: Towards Efficient CoT Learning in Small Models [35.225382243612174]
Chain-of-Thought (CoT) serves as a critical emerging ability in Large Language Models (LLMs) We propose a new training paradigm AS-ES (Abstractive Segments - Extractive Segments) learning, which exploits the inherent information in CoT for iterative generation. Experiments show that our methods surpass the direct seq2seq training on CoT-extensive tasks like MWP and PET summarization, without data augmentation or altering the model itself.
arXiv Detail & Related papers (2024-03-04T12:13:59Z)
How Do Nonlinear Transformers Learn and Generalize in In-Context Learning? [82.51626700527837]
Transformer-based large language models displayed impressive in-context learning capabilities, where a pre-trained model can handle new tasks without fine-tuning. We analyze how the mechanics of how Transformer to achieve ICL contribute to the technical challenges of the training problems in Transformers.
arXiv Detail & Related papers (2024-02-23T21:07:20Z)
In-Context Language Learning: Architectures and Algorithms [73.93205821154605]
We study ICL through the lens of a new family of model problems we term in context language learning (ICLL) We evaluate a diverse set of neural sequence models on regular ICLL tasks.
arXiv Detail & Related papers (2024-01-23T18:59:21Z)
How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations [98.7450564309923]
This paper takes initial steps on understanding in-context learning (ICL) in more complex scenarios, by studying learning with representations. We construct synthetic in-context learning problems with a compositional structure, where the label depends on the input through a possibly complex but fixed representation function. We show theoretically the existence of transformers that approximately implement such algorithms with mild depth and size.
arXiv Detail & Related papers (2023-10-16T17:40:49Z)
Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection [88.23337313766353]
This work first provides a comprehensive statistical theory for transformers to perform ICL. We show that transformers can implement a broad class of standard machine learning algorithms in context. A emphsingle transformer can adaptively select different base ICL algorithms.
arXiv Detail & Related papers (2023-06-07T17:59:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.