Related papers: GFlowNet Fine-tuning for Diverse Correct Solutions in Mathematical Reasoning Tasks

URL: http://arxiv.org/abs/2410.20147v1
Date: Sat, 26 Oct 2024 11:13:33 GMT
Title: GFlowNet Fine-tuning for Diverse Correct Solutions in Mathematical Reasoning Tasks
Authors: Ryoichi Takase, Masaya Tsunokake, Yuta Tsuchiya, Shota Inuzuka
Abstract summary: We train large language models (LLMs) using a generative flow network (GFlowNet). GFlowNet fine-tuning seeks to find diverse solutions by training the LLM so that its output distribution is proportional to a reward function. Results show that GFlowNet fine-tuning derives correct final answers from diverse intermediate reasoning steps.
Score: 0.10713888959520208
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Mathematical reasoning problems are among the most challenging, as they typically require an understanding of fundamental laws to solve. The laws are universal, but the derivation of the final answer changes depending on how a problem is approached. When training large language models (LLMs), learning the capability of generating such multiple solutions is essential to accelerate their use in mathematical education. To this end, we train LLMs using generative flow network (GFlowNet). Different from reward-maximizing reinforcement learning (RL), GFlowNet fine-tuning seeks to find diverse solutions by training the LLM whose distribution is proportional to a reward function. In numerical experiments, we evaluate GFlowNet fine-tuning and reward-maximizing RL in terms of accuracy and diversity. The results show that GFlowNet fine-tuning derives correct final answers from diverse intermediate reasoning steps, indicating the improvement of the capability of alternative solution generation.
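The contrast the abstract draws between reward-proportional sampling and reward maximization can be illustrated with a toy example. This is only a minimal sketch of the two target distributions, not the paper's training procedure. The function names, the candidate labels, and the reward values are all hypothetical, chosen for illustration.

```python
from fractions import Fraction

def reward_proportional(rewards):
    """Target distribution p(x) = R(x) / Z, where Z is the sum of all rewards."""
    total = sum(rewards.values())
    return {x: Fraction(r, total) for x, r in rewards.items()}

def reward_maximizing(rewards):
    """Idealized greedy policy: all probability mass on one highest-reward item."""
    best = max(rewards, key=rewards.get)
    return {x: Fraction(int(x == best)) for x in rewards}

# Hypothetical rewards for three candidate solutions to one problem
# (illustrative values only, not taken from the paper).
rewards = {"algebraic": 4, "geometric": 4, "wrong": 2}

gfn_target = reward_proportional(rewards)
rl_target = reward_maximizing(rewards)

for name in rewards:
    print(name, gfn_target[name], rl_target[name])
```

Under these assumed rewards, the proportional target keeps mass on both equally rewarded solution paths, while the idealized maximizer collapses onto a single one. That is the sense in which a reward-proportional sampler can preserve diversity among correct solutions.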

Related papers

Exploring Solution Divergence and Its Effect on Large Language Model Problem Solving [37.94354699202412]
We show that higher solution divergence is positively related to better problem-solving abilities across various models. We propose solution divergence as a novel metric that can support both SFT and RL strategies.
arXiv Detail & Related papers (2025-09-26T15:27:50Z)
The Majority is not always right: RL training for solution aggregation [53.1050856072799]
We train an aggregator model to review, reconcile, and synthesize a final, correct answer. A key ingredient is careful balancing of easy and hard training examples. We find our method, AggLM, outperforms both strong rule-based and reward-model baselines.
arXiv Detail & Related papers (2025-09-08T16:39:38Z)
RL for Reasoning by Adaptively Revealing Rationales [36.50924054394857]
Supervised fine-tuning (SFT) relies on dense ground-truth labels, which become increasingly costly as sequence length grows. We address this by adaptive backtracking (AdaBack), a per-sample curriculum learning algorithm that reveals only a partial prefix of the target output during training. We show that our adaptive curriculum over partial answers reliably solves problems that are otherwise intractable.
arXiv Detail & Related papers (2025-06-22T17:46:14Z)
GFlowVLM: Enhancing Multi-step Reasoning in Vision-Language Models with Generative Flow Networks [4.851402232145819]
We introduce GFlowVLM, a framework that fine-tunes Vision-Language Models (VLMs) using Generative Flow Networks (GFlowNets). GFlowVLM models the environment as a non-Markovian decision process, allowing it to capture long-term dependencies essential for real-world applications. Empirical results demonstrate the effectiveness of GFlowVLM on complex tasks such as card games (NumberLine, BlackJack) and embodied planning tasks (ALFWorld).
arXiv Detail & Related papers (2025-03-09T08:38:10Z)
Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples [12.48027669682156]
Flow of Reasoning (FoR) aims to improve reasoning quality and diversity with minimal data. FoR formulates multi-step LLM reasoning as a Markovian flow on a DAG-structured reasoning graph. Experiments show that, with limited training examples, FoR enables the discovery of diverse, creative, high-quality solutions.
arXiv Detail & Related papers (2024-06-09T07:06:58Z)
Let the Flows Tell: Solving Graph Combinatorial Optimization Problems with GFlowNets [86.43523688236077]
Combinatorial optimization (CO) problems are often NP-hard and out of reach for exact algorithms. GFlowNets have emerged as a powerful machinery to efficiently sample from composite unnormalized densities sequentially. In this paper, we design Markov decision processes (MDPs) for different problems and propose to train conditional GFlowNets to sample from the solution space.
arXiv Detail & Related papers (2023-05-26T15:13:09Z)
Towards Understanding and Improving GFlowNet Training [71.85707593318297]
We introduce an efficient evaluation strategy to compare the learned sampling distribution to the target reward distribution. We propose prioritized replay training of high-reward $x$, relative edge flow policy parametrization, and a novel guided trajectory balance objective.
arXiv Detail & Related papers (2023-05-11T22:50:41Z)
Learning to Optimize Permutation Flow Shop Scheduling via Graph-based Imitation Learning [70.65666982566655]
Permutation flow shop scheduling (PFSS) is widely used in manufacturing systems. We propose to train the model via expert-driven imitation learning, which accelerates convergence more stably and accurately. Our model's network parameters are reduced to only 37% of theirs, and the solution gap of our model towards the expert solutions decreases from 6.8% to 1.3% on average.
arXiv Detail & Related papers (2022-10-31T09:46:26Z)
Learning GFlowNets from partial episodes for improved convergence and stability [56.99229746004125]
Generative flow networks (GFlowNets) are algorithms for training a sequential sampler of discrete objects under an unnormalized target density. Existing training objectives for GFlowNets are either local to states or transitions, or propagate a reward signal over an entire sampling trajectory. Inspired by the TD($\lambda$) algorithm in reinforcement learning, we introduce subtrajectory balance or SubTB($\lambda$), a GFlowNet training objective that can learn from partial action subsequences of varying lengths.
arXiv Detail & Related papers (2022-09-26T15:44:24Z)
AMS-Net: Adaptive Multiscale Sparse Neural Network with Interpretable Basis Expansion for Multiphase Flow Problems [8.991619150027267]
We propose an adaptive sparse learning algorithm that can be applied to learn the physical processes and obtain a sparse representation of the solution given a large snapshot space. Information about the basis functions is incorporated in the loss function, which minimizes the differences between the downscaled reduced-order solutions and reference solutions at multiple time steps. Further numerical tests are performed on two-phase multiscale flow problems to show the capability and interpretability of the proposed method on complicated applications.
arXiv Detail & Related papers (2022-07-24T13:12:43Z)
Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation [110.09855163856326]
This paper is about the problem of learning a policy for generating an object from a sequence of actions. We propose GFlowNet, based on a view of the generative process as a flow network. We prove that any global minimum of the proposed objectives yields a policy which samples from the desired distribution.
arXiv Detail & Related papers (2021-06-08T14:21:10Z)
Reversible Action Design for Combinatorial Optimization with Reinforcement Learning [35.50454156611722]
Reinforcement learning (RL) has recently emerged as a new framework to tackle combinatorial optimization problems (COPs). We propose a general RL framework that not only exhibits state-of-the-art empirical performance but also generalizes to a wide variety of COPs.
arXiv Detail & Related papers (2021-02-14T18:05:42Z)
Learning by Fixing: Solving Math Word Problems with Weak Supervision [70.62896781438694]
Previous neural solvers of math word problems (MWPs) are learned with full supervision and fail to generate diverse solutions. We introduce a weakly-supervised paradigm for learning MWPs. Our method only requires the annotations of the final answers and can generate various solutions for a single problem.
arXiv Detail & Related papers (2020-12-19T03:10:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.