Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples
- URL: http://arxiv.org/abs/2406.05673v3
- Date: Fri, 04 Oct 2024 15:14:55 GMT
- Title: Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples
- Authors: Fangxu Yu, Lai Jiang, Haoqiang Kang, Shibo Hao, Lianhui Qin
- Abstract summary: Flow of Reasoning aims to improve reasoning quality and diversity with minimal data.
FoR formulates multi-step LLM reasoning as a Markovian flow on a DAG-structured reasoning graph.
Experiments show that, with limited training examples, FoR enables the discovery of diverse, creative, high-quality solutions.
- Score: 12.48027669682156
- Abstract: The ability to generate diverse solutions to a given problem is a hallmark of human creativity. This divergent reasoning is also crucial for machines, enhancing their robustness and enabling them to assist humans in many applications such as scientific discovery. However, existing approaches to multi-step reasoning with large language models (LLMs) have mostly focused only on reasoning accuracy, without further discovering more diverse valid solutions. For example, supervised fine-tuning can improve LLM reasoning quality, but requires extensive supervised data to capture the full range of possible solutions. Reinforcement learning aims to find limited highest-reward solutions while neglecting the solution diversity. To fill this gap, we propose Flow of Reasoning (FoR), an efficient diversity-seeking LLM finetuning method aimed at improving reasoning quality and diversity with minimal data. FoR formulates multi-step LLM reasoning as a Markovian flow on a DAG-structured reasoning graph. This formulation allows us to incorporate and adapt principled GFlowNet approaches, for finetuning LLMs to sample diverse reasoning paths with probabilities proportional to the (unnormalized) reward of target problems. Extensive experiments show that, with limited training examples (e.g., 15 examples), FoR enables the discovery of diverse, creative, high-quality solutions, greatly outperforming a wide range of existing inference and training methods across five challenging puzzle-solving tasks, including BlocksWorld (embodied reasoning), Game24 (math puzzle solving), Rubik's Cube (spatial reasoning), 1D-ARC (abstraction reasoning), and PrOntoQA (logical reasoning). Code is available at https://github.com/Yu-Fangxu/FoR.
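The abstract's central idea, sampling reasoning paths with probability proportional to an unnormalized reward, can be illustrated on a toy graph. The following is a minimal sketch, not the authors' implementation: the graph `GRAPH`, the rewards `REWARD`, and all function names are hypothetical, and the graph is a tree (a special case of a DAG), so the backward policy is trivially 1. It uses the trajectory balance objective, one standard GFlowNet loss; the abstract does not state which objective FoR adopts.

```python
import math

# Hypothetical toy reasoning graph: each non-terminal state lists its next states.
GRAPH = {
    "s0": ["a", "b"],
    "a": ["x", "y"],
    "b": ["z"],
}
# Hypothetical unnormalized rewards for the terminal states.
REWARD = {"x": 1.0, "y": 3.0, "z": 4.0}


def paths(state="s0", prefix=()):
    """Enumerate every root-to-terminal path in the toy graph."""
    prefix = prefix + (state,)
    if state not in GRAPH:
        yield prefix
        return
    for nxt in GRAPH[state]:
        yield from paths(nxt, prefix)


def flow(state):
    """Flow through a state: the total reward of the terminals reachable from it."""
    if state not in GRAPH:
        return REWARD[state]
    return sum(flow(nxt) for nxt in GRAPH[state])


def forward_policy(state):
    """Markovian forward policy: split a state's flow among its children."""
    total = flow(state)
    return {nxt: flow(nxt) / total for nxt in GRAPH[state]}


def path_probability(path):
    """Probability of sampling a whole path under the forward policy."""
    prob = 1.0
    for state, nxt in zip(path, path[1:]):
        prob *= forward_policy(state)[nxt]
    return prob


def trajectory_balance_loss(path, log_z, policy):
    """Squared trajectory balance residual (backward policy is 1 on a tree)."""
    log_pf = sum(math.log(policy(s)[n]) for s, n in zip(path, path[1:]))
    return (log_z + log_pf - math.log(REWARD[path[-1]])) ** 2


if __name__ == "__main__":
    log_z = math.log(flow("s0"))  # log of the total reward
    for path in paths():
        loss = trajectory_balance_loss(path, log_z, forward_policy)
        print(path, path_probability(path), loss)
```

In this sketch the flow-matching policy is computed exactly, so each terminal is sampled with probability equal to its reward divided by the total reward and the loss is zero up to floating-point error. In a learned setting, the policy would be an LLM and the loss would be minimized by gradient descent over sampled paths.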
Related papers
- GFlowNet Fine-tuning for Diverse Correct Solutions in Mathematical Reasoning Tasks [0.10713888959520208]
We train large language models (LLMs) using a generative flow network (GFlowNet).
GFlowNet fine-tuning seeks diverse solutions by training the LLM so that its output distribution is proportional to a reward function.
Results show that GFlowNet fine-tuning derives correct final answers from diverse intermediate reasoning steps.
arXiv Detail & Related papers (2024-10-26T11:13:33Z)
- BloomWise: Enhancing Problem-Solving capabilities of Large Language Models using Bloom's-Taxonomy-Inspired Prompts [59.83547898874152]
We introduce BloomWise, a new prompting technique, inspired by Bloom's taxonomy, to improve the performance of Large Language Models (LLMs).
The decision regarding the need to employ more sophisticated cognitive skills is based on self-evaluation performed by the LLM.
In extensive experiments across 4 popular math reasoning datasets, we have demonstrated the effectiveness of our proposed approach.
arXiv Detail & Related papers (2024-10-05T09:27:52Z)
- Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning.
LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors.
We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
arXiv Detail & Related papers (2024-10-05T05:21:48Z)
- Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks.
LLMs are prone to producing errors, hallucinations, and inconsistent statements when performing multi-step reasoning.
We introduce Q*, a framework for guiding the LLM decoding process with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z)
- Can LLMs Reason in the Wild with Programs? [20.47557047823847]
We introduce the task of reasoning in the wild, where an LLM is tasked to solve a reasoning problem of unknown type.
We create a large tactic-guided trajectory dataset containing detailed solutions to a diverse set of reasoning problems.
In experiments, we highlight that existing LLMs fail significantly on problems with ambiguous and mixed scope.
arXiv Detail & Related papers (2024-06-19T18:26:19Z)
- Large Language Model-Aided Evolutionary Search for Constrained Multiobjective Optimization [15.476478159958416]
We employ a large language model (LLM) to enhance evolutionary search for solving constrained multi-objective optimization problems.
Our aim is to speed up the convergence of the evolutionary population.
arXiv Detail & Related papers (2024-05-09T13:44:04Z)
- Distilling Algorithmic Reasoning from LLMs via Explaining Solution Programs [2.3020018305241337]
Distilling explicit chain-of-thought reasoning paths has emerged as an effective method for improving the reasoning abilities of large language models.
We propose a novel approach to distill reasoning abilities from LLMs by leveraging their capacity to explain solutions.
Our experiments demonstrate that learning from explanations enables the Reasoner to more effectively guide program implementation by a Coder.
arXiv Detail & Related papers (2024-04-11T22:19:50Z)
- DiLA: Enhancing LLM Tool Learning with Differential Logic Layer [11.810200077863172]
We propose a novel differential logic layer-aided language modeling (DiLA) approach, where logical constraints are integrated into the forward and backward passes of a network layer.
We evaluate the performance of DiLA on two classic reasoning problems and empirically demonstrate its consistent outperformance against existing prompt-based and solver-aided approaches.
arXiv Detail & Related papers (2024-02-19T07:38:57Z)
- Thought Propagation: An Analogical Approach to Complex Reasoning with Large Language Models [62.96551299003463]
We propose Thought Propagation (TP) to enhance the complex reasoning ability of Large Language Models.
TP first prompts LLMs to propose and solve a set of analogous problems that are related to the input one.
TP reuses the results of analogous problems to directly yield a new solution or derive a knowledge-intensive plan for execution to amend the initial solution obtained from scratch.
arXiv Detail & Related papers (2023-10-06T01:40:09Z)
- Discovering Diverse Solutions in Deep Reinforcement Learning [84.45686627019408]
Reinforcement learning algorithms are typically limited to learning a single solution of a specified task.
We propose an RL method that can learn infinitely many solutions by training a policy conditioned on a continuous or discrete low-dimensional latent variable.
arXiv Detail & Related papers (2021-03-12T04:54:31Z)
- Learning What to Defer for Maximum Independent Sets [84.00112106334655]
We propose a novel DRL scheme, coined learning what to defer (LwD), where the agent adaptively shrinks or stretches the number of stages by learning to distribute the element-wise decisions of the solution at each stage.
We apply the proposed framework to the maximum independent set (MIS) problem, and demonstrate its significant improvement over the current state-of-the-art DRL scheme.
arXiv Detail & Related papers (2020-06-17T02:19:31Z)