DialCoT Meets PPO: Decomposing and Exploring Reasoning Paths in Smaller
Language Models
- URL: http://arxiv.org/abs/2310.05074v3
- Date: Mon, 23 Oct 2023 09:38:01 GMT
- Title: DialCoT Meets PPO: Decomposing and Exploring Reasoning Paths in Smaller
Language Models
- Authors: Chengcheng Han, Xiaowei Du, Che Zhang, Yixin Lian, Xiang Li, Ming Gao,
Baoyuan Wang
- Abstract summary: Chain-of-Thought (CoT) prompting has proven to be effective in enhancing the reasoning capabilities of Large Language Models (LLMs) with at least 100 billion parameters.
We introduce Dialogue-guided Chain-of-Thought (DialCoT) which employs a dialogue format to generate intermediate reasoning steps, guiding the model toward the final answer.
- Score: 18.96271708412086
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Chain-of-Thought (CoT) prompting has proven to be effective in enhancing the
reasoning capabilities of Large Language Models (LLMs) with at least 100
billion parameters. However, it is ineffective or even detrimental when applied
to reasoning tasks in Smaller Language Models (SLMs) with fewer than 10 billion
parameters. To address this limitation, we introduce Dialogue-guided
Chain-of-Thought (DialCoT) which employs a dialogue format to generate
intermediate reasoning steps, guiding the model toward the final answer.
Additionally, we optimize the model's reasoning path selection using the
Proximal Policy Optimization (PPO) algorithm, further enhancing its reasoning
capabilities. Our method offers several advantages compared to previous
approaches. Firstly, we transform the process of solving complex reasoning
questions by breaking them down into a series of simpler sub-questions,
significantly reducing the task difficulty and making it more suitable for
SLMs. Secondly, we optimize the model's reasoning path selection through the
PPO algorithm. We conduct comprehensive experiments on four arithmetic
reasoning datasets, demonstrating that our method achieves significant
performance improvements compared to state-of-the-art competitors.
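The abstract does not say how PPO is wired into reasoning-path selection, so the following is only a generic sketch of the standard PPO clipped surrogate objective, not the authors' implementation. The function name `ppo_clipped_objective`, the parameter `clip_eps`, and the idea of treating each chosen reasoning step as one sampled action with a log-probability and an advantage estimate are assumptions made for illustration.

```python
import math


def ppo_clipped_objective(new_logps, old_logps, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate over a batch of sampled actions.

    Hypothetical helper: each entry stands for one sampled action (for
    example, one chosen reasoning step), with its log-probability under
    the new and old policies and an advantage estimate.
    """
    if not (len(new_logps) == len(old_logps) == len(advantages)):
        raise ValueError("inputs must have equal length")
    if not advantages:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio between the new and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic choice between the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

The objective is maximized during training. Taking the minimum of the clipped and unclipped terms removes the incentive to move the new policy far from the old one within a single update, which is the property that makes PPO a common choice for fine-tuning language models.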
Related papers
- Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models [63.36637269634553]
We present a novel method of further improving performance by requiring models to compare multiple reasoning chains.
We find that instruction tuning on DCoT datasets boosts the performance of even smaller, and therefore more accessible, language models.
arXiv Detail & Related papers (2024-07-03T15:01:18Z)
- Combinatorial Reasoning: Selecting Reasons in Generative AI Pipelines via Combinatorial Optimization [2.090904951468026]
Large Language Models (LLMs) have demonstrated impressive capabilities at tasks that require human intelligence.
Yet the reasoning capability of LLMs is a matter of significant debate.
We introduce a framework for what we call Combinatorial Reasoning (CR), a fully-automated prompting method.
arXiv Detail & Related papers (2024-06-19T16:47:44Z)
- Step-level Value Preference Optimization for Mathematical Reasoning [6.318873143509028]
We introduce a novel algorithm called Step-level Value Preference Optimization (SVPO).
Our approach employs Monte Carlo Tree Search (MCTS) to automatically annotate step-level preferences for multi-step reasoning.
From the perspective of learning-to-rank, we train an explicit value model to replicate the behavior of the implicit reward model.
arXiv Detail & Related papers (2024-06-16T09:06:17Z)
- Distilling Algorithmic Reasoning from LLMs via Explaining Solution Programs [2.3020018305241337]
Distilling explicit chain-of-thought reasoning paths has emerged as an effective method for improving the reasoning abilities of large language models.
We propose a novel approach to distill reasoning abilities from LLMs by leveraging their capacity to explain solutions.
Our experiments demonstrate that learning from explanations enables the Reasoner to more effectively guide program implementation by a Coder.
arXiv Detail & Related papers (2024-04-11T22:19:50Z)
- Zero-Shot Chain-of-Thought Reasoning Guided by Evolutionary Algorithms in Large Language Models [11.967815199202203]
Large Language Models (LLMs) have demonstrated remarkable performance across diverse tasks by applying zero-shot Chain-of-Thought (CoT) prompting.
Existing zero-shot CoT prompting methods that employ identical CoT prompting across all task instances may not be optimal.
We introduce a novel zero-shot prompting method that leverages evolutionary algorithms to generate diverse promptings for LLMs dynamically.
arXiv Detail & Related papers (2024-02-08T03:17:38Z)
- Let's reward step by step: Step-Level reward model as the Navigators for Reasoning [64.27898739929734]
Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase.
We propose a greedy search algorithm that employs the step-level feedback from PRM to optimize the reasoning pathways explored by LLMs.
To explore the versatility of our approach, we develop a novel method to automatically generate a step-level reward dataset for coding tasks and observe similarly improved performance in code generation tasks.
arXiv Detail & Related papers (2023-10-16T05:21:50Z)
- Guiding Language Model Math Reasoning with Planning Tokens [128.57605860640948]
We introduce planning tokens at the start of each reasoning step, serving as a guide for the model, and add their embeddings to the model parameters.
Our approach requires a negligible increase in trainable parameters (just 0.001%) and can be applied through either full fine-tuning or a more parameter-efficient scheme.
arXiv Detail & Related papers (2023-10-09T13:29:37Z)
- Thought Propagation: An Analogical Approach to Complex Reasoning with Large Language Models [62.96551299003463]
We propose Thought Propagation (TP) to enhance the complex reasoning ability of Large Language Models.
TP first prompts LLMs to propose and solve a set of analogous problems that are related to the input one.
TP reuses the results of analogous problems to directly yield a new solution or derive a knowledge-intensive plan for execution to amend the initial solution obtained from scratch.
arXiv Detail & Related papers (2023-10-06T01:40:09Z)
- Sci-CoT: Leveraging Large Language Models for Enhanced Knowledge Distillation in Small Models for Scientific QA [5.117094291273979]
Large Language Models (LLMs) have shown outstanding performance across a wide range of downstream tasks.
We propose Sci-CoT, a two-stage framework that separates the processes of generating rationales and inferring answers.
Our 80-million parameter model is able to exceed the performance of BLOOM-176B in the ARC-Easy dataset under the few shot setting.
arXiv Detail & Related papers (2023-08-09T03:18:07Z)
- Large Language Model Programs [74.31873455763275]
In recent years, large pre-trained language models (LLMs) have demonstrated the ability to follow instructions and perform novel tasks from a few examples.
We extend this line of reasoning and present a method which further expands the capabilities of an LLM by embedding it within an algorithm or program.
We obtain a 6.4% improvement over the chain of thought baseline through a more algorithmic approach without any finetuning.
arXiv Detail & Related papers (2023-05-09T11:55:36Z)
- Complexity-Based Prompting for Multi-Step Reasoning [72.0057198610614]
We study the task of prompting large-scale language models to perform multi-step reasoning.
A central question is which reasoning examples make the most effective prompts.
We propose complexity-based prompting, a simple and effective example selection scheme for multi-step reasoning.
arXiv Detail & Related papers (2022-10-03T05:33:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.