Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning
- URL: http://arxiv.org/abs/2502.08482v1
- Date: Wed, 12 Feb 2025 15:17:04 GMT
- Title: Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning
- Authors: Qifan Yu, Zhenyu He, Sijie Li, Xun Zhou, Jun Zhang, Jingjing Xu, Di He
- Abstract summary: Chain-of-Thought (CoT) prompting has emerged as a powerful technique for enhancing language models' reasoning capabilities.
Looped Transformers possess remarkable length generalization capabilities, but their limited generality and adaptability prevent them from serving as an alternative to auto-regressive solutions.
We propose RELAY to better leverage the strengths of Looped Transformers.
- Score: 47.06427150903487
- License:
- Abstract: Chain-of-Thought (CoT) prompting has emerged as a powerful technique for enhancing language models' reasoning capabilities. However, generating long and correct CoT trajectories is challenging. Recent studies have demonstrated that Looped Transformers possess remarkable length generalization capabilities, but their limited generality and adaptability prevent them from serving as an alternative to auto-regressive solutions. To better leverage the strengths of Looped Transformers, we propose RELAY (REasoning through Loop Alignment iterativelY). Specifically, we align the steps of Chain-of-Thought (CoT) reasoning with loop iterations and apply intermediate supervision during the training of Looped Transformers. This additional iteration-wise supervision not only preserves the Looped Transformer's ability for length generalization but also enables it to predict CoT reasoning steps for unseen data. Therefore, we leverage this Looped Transformer to generate accurate reasoning chains for complex problems that exceed the training length, which will then be used to fine-tune an auto-regressive model. We conduct extensive experiments, and the results demonstrate the effectiveness of our approach, with significant improvements in the performance of the auto-regressive model. Code will be released at https://github.com/qifanyu/RELAY.
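The abstract's core mechanism is iteration-wise supervision: the weight-shared Looped Transformer is trained so that each loop iteration predicts the corresponding CoT step, preserving length generalization while producing readable reasoning chains. Below is a minimal PyTorch-style sketch of that supervision; the module layout, the step-prediction head, and the assumption of exactly one CoT step per loop iteration are illustrative guesses, not taken from the paper or its released code.

```python
import torch
import torch.nn as nn

class LoopedTransformer(nn.Module):
    """Sketch: one shared transformer block applied for a fixed number of loop iterations.

    Hypothetical sizes; the actual RELAY architecture may differ.
    """
    def __init__(self, d_model=256, n_heads=4, vocab_size=1000, n_iters=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.step_head = nn.Linear(d_model, vocab_size)  # predicts the CoT step at each iteration
        self.n_iters = n_iters

    def forward(self, input_ids):
        h = self.embed(input_ids)                # (batch, length, d_model)
        per_iter_logits = []
        for _ in range(self.n_iters):            # the same weights are reused every iteration
            h = self.block(h)
            per_iter_logits.append(self.step_head(h))
        return per_iter_logits                   # one token-level prediction per loop iteration


def iteration_wise_loss(per_iter_logits, cot_step_targets, pad_id=0):
    """Align loop iteration t with CoT step t and supervise each iteration.

    `cot_step_targets` is assumed to be a list of token-id tensors, one per CoT step,
    padded to the input length; this data layout is an assumption for illustration.
    """
    ce = nn.CrossEntropyLoss(ignore_index=pad_id)
    losses = [ce(logits.flatten(0, 1), target.flatten())
              for logits, target in zip(per_iter_logits, cot_step_targets)]
    return torch.stack(losses).mean()
```

In the second stage described in the abstract, the looped model generates reasoning chains for problems longer than those seen during CoT training, and those chains are used to fine-tune a standard auto-regressive model; that stage is ordinary supervised fine-tuning and is not sketched here.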
Related papers
- Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis [82.51626700527837]
Chain-of-Thought (CoT) is an efficient method that enables the reasoning ability of large language models by augmenting the query with examples containing multiple intermediate steps.
We show that CoT can provide accurate generalization in cases where prompting without intermediate reasoning steps fails.
arXiv Detail & Related papers (2024-10-03T03:12:51Z) - Investigating Recurrent Transformers with Dynamic Halt [64.862738244735]
We study the inductive biases of two major approaches to augmenting Transformers with a recurrent mechanism.
We propose and investigate novel ways to extend and combine the methods.
arXiv Detail & Related papers (2024-02-01T19:47:31Z) - Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show, for the first time, that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
arXiv Detail & Related papers (2023-05-26T00:43:02Z) - Towards Incremental Transformers: An Empirical Analysis of Transformer Models for Incremental NLU [19.103130032967663]
Incremental processing allows interactive systems to respond based on partial inputs.
Recent work attempts to apply Transformers incrementally via restart-incrementality.
This approach is computationally costly and does not scale efficiently for long sequences.
arXiv Detail & Related papers (2021-09-15T15:20:29Z) - Finetuning Pretrained Transformers into RNNs [81.72974646901136]
Transformers have outperformed recurrent neural networks (RNNs) in natural language generation.
A linear-complexity recurrent variant has proven well suited for autoregressive generation.
This work aims to convert a pretrained transformer into its efficient recurrent counterpart.
arXiv Detail & Related papers (2021-03-24T10:50:43Z) - Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention [22.228028613802174]
Transformers achieve remarkable performance in several tasks but due to their quadratic complexity, they are prohibitively slow for very long sequences.
We make use of the associativity property of matrix products to reduce the complexity from $\mathcal{O}\left(N^2\right)$ to $\mathcal{O}\left(N\right)$, where $N$ is the sequence length (see the sketch after this list).
Our linear transformers achieve similar performance to vanilla transformers and they are up to 4000x faster on autoregressive prediction of very long sequences.
arXiv Detail & Related papers (2020-06-29T17:55:38Z) - Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition [66.47000813920617]
We propose a spike-triggered non-autoregressive transformer model for end-to-end speech recognition.
The proposed model can accurately predict the length of the target sequence and achieve a competitive performance.
The model even achieves a real-time factor of 0.0056, faster than all mainstream speech recognition models.
arXiv Detail & Related papers (2020-05-16T08:27:20Z)
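For the "Transformers are RNNs" entry above, the stated reduction from $\mathcal{O}\left(N^2\right)$ to $\mathcal{O}\left(N\right)$ comes from replacing the softmax with a kernel feature map $\phi$ and regrouping $(\phi(Q)\phi(K)^\top)V$ as $\phi(Q)(\phi(K)^\top V)$ by associativity, so no $N \times N$ matrix is ever formed. The NumPy sketch below covers only the non-causal case; the $\mathrm{elu}(x)+1$ feature map follows that paper, while the function names and the omission of the causal, RNN-style cumulative formulation used for autoregressive decoding are simplifications.

```python
import numpy as np

def elu_feature_map(x):
    # phi(x) = elu(x) + 1: a positive feature map, as used in the linear-attention paper
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention(Q, K, V, eps=1e-6):
    """Non-causal linear attention in O(N * d * d_v) time.

    Q, K: (N, d); V: (N, d_v). Softmax attention materialises an N x N matrix
    via softmax(Q K^T) V; regrouping as phi(Q) (phi(K)^T V) avoids it.
    """
    Qf, Kf = elu_feature_map(Q), elu_feature_map(K)   # (N, d)
    KV = Kf.T @ V                                     # (d, d_v), cost O(N * d * d_v)
    Z = Qf @ Kf.sum(axis=0)                           # (N,) normaliser
    return (Qf @ KV) / (Z[:, None] + eps)             # (N, d_v)

# Tiny usage example with random data
rng = np.random.default_rng(0)
N, d, dv = 16, 8, 8
out = linear_attention(rng.normal(size=(N, d)),
                       rng.normal(size=(N, d)),
                       rng.normal(size=(N, dv)))
print(out.shape)  # (16, 8)
```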