SEGO: Sequential Subgoal Optimization for Mathematical Problem-Solving
- URL: http://arxiv.org/abs/2310.12960v1
- Date: Thu, 19 Oct 2023 17:56:40 GMT
- Title: SEGO: Sequential Subgoal Optimization for Mathematical Problem-Solving
- Authors: Xueliang Zhao, Xinting Huang, Wei Bi, Lingpeng Kong
- Abstract summary: Large Language Models (LLMs) have driven substantial progress in artificial intelligence.
We propose a novel framework called SEquential subGoal Optimization (SEGO) to enhance LLMs' ability to solve mathematical problems.
- Score: 64.38649623473626
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large Language Models (LLMs) have driven substantial progress in artificial
intelligence in recent years, exhibiting impressive capabilities across a wide
range of tasks, including mathematical problem-solving. Inspired by the success
of subgoal-based methods, we propose a novel framework called
SEquential subGoal Optimization (SEGO) to enhance
LLMs' ability to solve mathematical problems. By establishing a connection
between the subgoal breakdown process and the probability of solving problems,
SEGO aims to identify better subgoals with theoretical guarantees. Addressing
the challenge of identifying suitable subgoals in a large solution space, our
framework generates problem-specific subgoals and adjusts them according to
carefully designed criteria. Incorporating these optimized subgoals into the
policy model training leads to significant improvements in problem-solving
performance. We validate SEGO's efficacy through experiments on two benchmarks,
GSM8K and MATH, where our approach outperforms existing methods, highlighting
the potential of SEGO in AI-driven mathematical problem-solving.
Data and code associated with this paper will be available at
https://github.com/zhaoxlpku/SEGO
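The abstract links subgoal breakdown to the probability of solving a problem but does not give the formulation. As a toy illustration of that general idea only (not SEGO's actual method), assume each candidate subgoal has two hypothetical, independently estimated success rates: one for reaching the subgoal from the problem and one for finishing the problem from the subgoal. Under that independence assumption, the chance of solving via a subgoal is their product, and a subgoal can be picked by maximizing it. The `Subgoal` class, its fields, and all numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Subgoal:
    """A hypothetical candidate subgoal with two assumed success estimates."""

    description: str
    p_reach: float   # assumed P(reaching this subgoal from the problem statement)
    p_finish: float  # assumed P(solving the problem once the subgoal is reached)


def solve_probability(subgoal: Subgoal) -> float:
    # Toy assumption: the two stages succeed independently,
    # so the end-to-end success probability is their product.
    return subgoal.p_reach * subgoal.p_finish


def best_subgoal(candidates: list[Subgoal]) -> Subgoal:
    # Keep the candidate with the highest estimated end-to-end success probability.
    return max(candidates, key=solve_probability)


candidates = [
    Subgoal("compute the total cost first", p_reach=0.9, p_finish=0.5),
    Subgoal("set up an equation in x", p_reach=0.7, p_finish=0.8),
    Subgoal("guess and check", p_reach=0.95, p_finish=0.3),
]
print(best_subgoal(candidates).description)  # → set up an equation in x
```

The easiest subgoal to reach is not the one chosen here. The product trades reachability against how much progress the subgoal actually makes.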
Related papers
- Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization [50.485788083202124]
Reinforcement Learning (RL) plays a crucial role in aligning large language models with human preferences and improving their ability to perform complex tasks.
We introduce Direct Q-function Optimization (DQO), which formulates the response generation process as a Markov Decision Process (MDP) and utilizes the soft actor-critic (SAC) framework to optimize a Q-function directly parameterized by the language model.
Experimental results on two math problem-solving datasets, GSM8K and MATH, demonstrate that DQO outperforms previous methods, establishing it as a promising offline reinforcement learning approach for aligning language models.
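The DQO summary frames generation as an MDP handled with the soft actor-critic framework. For reference, the standard maximum-entropy (soft) relations are a state value $V(s) = \alpha \log \sum_a \exp(Q(s,a)/\alpha)$, a policy $\pi(a \mid s) = \exp((Q(s,a) - V(s))/\alpha)$, and a Bellman target $r + \gamma V(s')$. The sketch below computes these for a tiny discrete action set, such as a toy vocabulary. The Q-values, the temperature `alpha`, and the discount `gamma` are illustrative assumptions, and this is the generic soft-Q formulation rather than DQO's exact training loss.

```python
import math


def soft_value(q_values: list[float], alpha: float) -> float:
    """V(s) = alpha * log(sum_a exp(Q(s, a) / alpha)), computed stably."""
    scaled = [q / alpha for q in q_values]
    m = max(scaled)  # subtract the max before exponentiating to avoid overflow
    return alpha * (m + math.log(sum(math.exp(x - m) for x in scaled)))


def soft_policy(q_values: list[float], alpha: float) -> list[float]:
    """pi(a|s) = exp((Q(s, a) - V(s)) / alpha), i.e. a softmax over Q / alpha."""
    v = soft_value(q_values, alpha)
    return [math.exp((q - v) / alpha) for q in q_values]


def soft_bellman_target(reward: float, next_q: list[float], alpha: float,
                        gamma: float = 1.0, done: bool = False) -> float:
    """Target for Q(s, a): r + gamma * V(s'), with V(s') = 0 once the response ends."""
    return reward if done else reward + gamma * soft_value(next_q, alpha)


q = [1.0, 2.0, 0.5]  # hypothetical Q-values for three candidate tokens
probs = soft_policy(q, alpha=0.5)
print([round(p, 3) for p in probs])
print(round(soft_bellman_target(0.0, q, alpha=0.5), 3))
```

A lower `alpha` makes the policy greedier. As `alpha` shrinks, $V(s)$ approaches $\max_a Q(s,a)$.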
(2024-10-11T23:29:20Z)
- LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning [56.273799410256075]
The framework combines Monte Carlo Tree Search (MCTS) with iterative Self-Refine to optimize the reasoning path.
The framework has been tested on general and advanced benchmarks, showing superior performance in terms of search efficiency and problem-solving capability.
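The LLaMA-Berry summary mentions Monte Carlo Tree Search without detailing its variant. As a generic reference point, the classic UCT rule picks the child maximizing $\bar{Q}_i + c\sqrt{\ln N / n_i}$. Here $\bar{Q}_i$ is the child's mean value, $n_i$ its visit count, and $N$ the parent's visit count. The sketch below shows UCT selection and backpropagation. The `Node` fields, the exploration constant `c`, and the example tree are assumptions, and LLaMA-Berry's own selection and reward scheme may differ.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """A search-tree node; for reasoning, a node could hold a partial solution path."""
    name: str
    visits: int = 0
    total_value: float = 0.0
    children: list["Node"] = field(default_factory=list)


def uct_score(child: Node, parent_visits: int, c: float = 1.41) -> float:
    # Unvisited children get priority so every branch is tried at least once.
    if child.visits == 0:
        return math.inf
    exploit = child.total_value / child.visits  # mean value of this branch
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(parent: Node, c: float = 1.41) -> Node:
    # Assumes parent.visits >= 1 whenever some child has been visited.
    return max(parent.children, key=lambda ch: uct_score(ch, parent.visits, c))


def backpropagate(path: list[Node], value: float) -> None:
    # Update visit counts and accumulated value along the selected path.
    for node in path:
        node.visits += 1
        node.total_value += value


root = Node("root", visits=10, children=[Node("a", 6, 4.0), Node("b", 4, 3.0)])
print(select_child(root).name)  # → b
```

Branch `b` wins despite fewer visits, because its higher mean value and larger exploration bonus outweigh `a`'s.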
(2024-10-03T18:12:29Z)
- Machine Learning Insides OptVerse AI Solver: Design Principles and Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver.
We showcase our methods for generating complex SAT and MILP instances utilizing generative models that mirror the multifaceted structures of real-world problems.
We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
(2024-01-11T15:02:15Z)
- Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of Deep Neural Networks and Reinforcement Learning methods.
Our main result holds for a broad class of combinatorial problems, including Max- and Min-Cut, Max-$k$-CSP, Maximum-Weight-Bipartite-Matching, and the Traveling Salesman Problem.
As a byproduct of our analysis, we introduce a novel regularization process over vanilla gradient descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
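The solution-sampler setting in this summary can be illustrated with a toy example. A product-of-Bernoullis sampler assigns each vertex to one side of a cut, and REINFORCE with a mean-reward baseline shifts probability toward larger cuts. This is a generic policy-gradient sketch on a hypothetical 4-cycle graph. It does not implement the paper's analysis or its proposed regularizer, and the learning rate, batch size, and step count are arbitrary choices.

```python
import math
import random

# Hypothetical example graph: a 4-cycle, whose maximum cut has value 4.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0)]
NUM_VERTICES = 4


def cut_value(assignment: list[int]) -> int:
    # Number of edges whose endpoints lie on different sides of the cut.
    return sum(assignment[u] != assignment[v] for u, v in EDGES)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def train(steps: int = 300, batch: int = 16, lr: float = 0.1, seed: int = 0):
    """REINFORCE with a mean-reward baseline on a product-of-Bernoullis sampler."""
    rng = random.Random(seed)
    logits = [0.0] * NUM_VERTICES  # P(vertex i on side 1) = sigmoid(logits[i])
    best = 0
    for _ in range(steps):
        probs = [sigmoid(t) for t in logits]
        samples = [[int(rng.random() < p) for p in probs] for _ in range(batch)]
        rewards = [cut_value(s) for s in samples]
        best = max(best, *rewards)
        baseline = sum(rewards) / batch  # reduces the variance of the gradient estimate
        grads = [0.0] * NUM_VERTICES
        for s, r in zip(samples, rewards):
            for i in range(NUM_VERTICES):
                # d/d theta_i of log Bernoulli(s_i; sigmoid(theta_i)) equals s_i - p_i.
                grads[i] += (r - baseline) * (s[i] - probs[i]) / batch
        logits = [t + lr * g for t, g in zip(logits, grads)]  # gradient ascent step
    return logits, best


logits, best = train()
print(best, [round(sigmoid(t), 2) for t in logits])
```

The sampler starts uniform, so symmetry between the two optimal cuts (0101 and 1010) is broken only by sampling noise. This kind of flat start is where landscape questions like the paper's become relevant.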
(2023-10-08T23:39:38Z)
- Hierarchical Imitation Learning with Vector Quantized Models [77.67190661002691]
We propose to use reinforcement learning to identify subgoals in expert trajectories.
We build a vector-quantized generative model for the identified subgoals to perform subgoal-level planning.
In experiments, the algorithm excels at solving complex, long-horizon decision-making problems, outperforming state-of-the-art methods.
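The vector-quantization step in this summary maps each continuous subgoal embedding to its nearest codebook vector, so planning can operate over a discrete set of codes. Below is a minimal sketch of that lookup with a hypothetical 2-D codebook. In a real model the codebook is learned jointly with an encoder, which is not shown here.

```python
import math


def quantize(vector: list[float], codebook: list[list[float]]) -> tuple[int, list[float]]:
    """Return the index and value of the codebook entry nearest in Euclidean distance."""
    best_index = min(range(len(codebook)), key=lambda k: math.dist(vector, codebook[k]))
    return best_index, codebook[best_index]


codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # hypothetical learned codes
print(quantize([0.9, 0.2], codebook))  # → (1, [1.0, 0.0])
```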
(2023-01-30T15:04:39Z)
- A Survey on Influence Maximization: From an ML-Based Combinatorial Optimization [2.9882027965916413]
Influence Maximization (IM) is a classical optimization problem with wide applications in mobile networks, social computing, and recommendation systems.
The main challenge comes from the NP-hardness of the IM problem and the #P-hardness of estimating the influence spread.
We focus on summarizing the relevant background knowledge, basic principles, common methods, and applied research.
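For context on this survey, a classical baseline is greedy seed selection under the Independent Cascade (IC) model. Because computing the spread exactly is #P-hard, the spread is estimated by Monte Carlo simulation. The graph, edge probabilities, number of runs, and function names below are illustrative assumptions.

```python
import random

Graph = dict[int, list[tuple[int, float]]]  # node -> [(neighbor, activation probability)]


def simulate_ic(graph: Graph, seeds: set[int], rng: random.Random) -> int:
    """One Independent Cascade run; returns how many nodes end up active."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # Each newly activated node gets a single chance to activate each neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def estimate_spread(graph: Graph, seeds: set[int], runs: int = 200, seed: int = 0) -> float:
    # Monte Carlo estimate of the expected number of activated nodes.
    rng = random.Random(seed)
    return sum(simulate_ic(graph, seeds, rng) for _ in range(runs)) / runs


def greedy_seeds(graph: Graph, nodes: list[int], k: int, runs: int = 200) -> list[int]:
    chosen: list[int] = []
    for _ in range(k):
        # Add the node whose inclusion gives the largest estimated spread.
        best = max((v for v in nodes if v not in chosen),
                   key=lambda v: estimate_spread(graph, set(chosen) | {v}, runs))
        chosen.append(best)
    return chosen


graph = {0: [(1, 0.5), (2, 0.5)], 1: [(3, 0.3)], 4: [(5, 0.9)]}  # hypothetical network
print(greedy_seeds(graph, nodes=list(range(6)), k=2))
```

Every greedy step re-simulates the cascade for each candidate node. That cost is one reason the surveyed ML-based approaches try to learn cheaper estimates.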
(2022-11-06T10:13:42Z)
- A Study of Scalarisation Techniques for Multi-Objective QUBO Solving [0.0]
Quantum and quantum-inspired optimisation algorithms have shown promising performance when applied to academic benchmarks as well as real-world problems.
However, QUBO solvers are single-objective solvers. To apply them efficiently to problems with multiple objectives, one must decide how to convert such multi-objective problems into single-objective ones.
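One common conversion is weighted-sum scalarisation. It combines the objective matrices as $Q = \sum_i w_i Q_i$ and then minimizes $x^\top Q x$ over binary $x$. The sketch below uses two hypothetical 3-variable objectives and replaces a real QUBO solver with brute-force enumeration. Weighted sums are only one of several scalarisation techniques such a study might compare, and the weights shown are arbitrary.

```python
from itertools import product

Matrix = list[list[float]]


def qubo_energy(Q: Matrix, x: tuple[int, ...]) -> float:
    # Objective x^T Q x for a binary vector x.
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def weighted_sum(objectives: list[Matrix], weights: list[float]) -> Matrix:
    # Scalarise several QUBO objectives into one: Q = sum_i w_i * Q_i.
    n = len(objectives[0])
    return [[sum(w * Q[i][j] for w, Q in zip(weights, objectives)) for j in range(n)]
            for i in range(n)]


def brute_force_min(Q: Matrix) -> tuple[int, ...]:
    # Stand-in for a QUBO solver: enumerate all 2^n binary vectors (tiny n only).
    return min(product((0, 1), repeat=len(Q)), key=lambda x: qubo_energy(Q, x))


# Two hypothetical objectives over three binary variables.
Q1 = [[-2, 0, 0], [0, -1, 0], [0, 0, 0]]  # rewards selecting x0 and x1
Q2 = [[0, 0, 3], [0, 0, 0], [0, 0, -2]]   # rewards x2, penalises x0 and x2 together

print(brute_force_min(weighted_sum([Q1, Q2], [0.7, 0.3])))  # → (1, 1, 0)
print(brute_force_min(weighted_sum([Q1, Q2], [0.2, 0.8])))  # → (0, 1, 1)
```

Changing the weights moves the single-objective optimum between trade-off points. This is why the choice of scalarisation matters.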
(2022-10-20T14:54:37Z)
- An Efficient Merge Search Matheuristic for Maximising the Net Present Value of Project Schedules [5.10800491975164]
Resource-constrained project scheduling is an important optimisation problem with many practical applications.
We propose a new math-heuristic algorithm based on Merge Search and parallel computing to solve the resource-constrained project scheduling problem.
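The objective named in this paper's title, the net present value of a schedule, discounts each activity's cash flow $c_j$ back from its scheduled time $t_j$: $\text{NPV} = \sum_j c_j / (1+r)^{t_j}$, where $r$ is a per-period discount rate. The small sketch below uses hypothetical cash flows and shows why the schedule matters: delaying a cost (a negative cash flow) raises the NPV. The precedence and resource constraints that the real scheduling problem must satisfy are omitted.

```python
def npv(cash_flows: list[float], start_times: list[int], rate: float) -> float:
    """Sum of cash flows c_j discounted to time 0 from their start times t_j."""
    return sum(c / (1.0 + rate) ** t for c, t in zip(cash_flows, start_times))


cash = [-100.0, 250.0]  # hypothetical: an activity's cost and a later payment
print(round(npv(cash, [0, 2], rate=0.1), 2))  # → 106.61
print(round(npv(cash, [1, 2], rate=0.1), 2))  # → 115.7
```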
(2022-10-20T13:30:23Z)
- A General Large Neighborhood Search Framework for Solving Integer Linear Programs [46.62993477453986]
We focus on solving integer programs and ground our approach in the large neighborhood search (LNS) paradigm.
We show that our LNS framework can significantly outperform state-of-the-art commercial solvers such as Gurobi.
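The summary names the LNS paradigm but not the algorithm. In general, each LNS iteration destroys part of the incumbent solution by freeing a subset of variables. It then repairs the solution by re-optimizing only those variables while the rest stay fixed. Below is a rough sketch of that loop on a hypothetical 0-1 knapsack instance. The destroy step picks variables uniformly at random and the repair step is brute force. A framework like the one summarized would choose neighborhoods more carefully and would typically hand each sub-problem to an ILP solver.

```python
import random
from itertools import product

# Hypothetical 0-1 knapsack instance.
VALUES = [10, 13, 7, 8, 9, 4]
WEIGHTS = [5, 6, 3, 4, 4, 2]
CAPACITY = 12


def objective(x: list[int]) -> int:
    # Total value, or -1 for an infeasible (overweight) assignment.
    if sum(w * xi for w, xi in zip(WEIGHTS, x)) > CAPACITY:
        return -1
    return sum(v * xi for v, xi in zip(VALUES, x))


def repair(x: list[int], free: list[int]) -> list[int]:
    # Re-optimize the freed variables exactly while keeping all others fixed.
    best = list(x)
    for assignment in product((0, 1), repeat=len(free)):
        candidate = list(x)
        for i, v in zip(free, assignment):
            candidate[i] = v
        if objective(candidate) > objective(best):
            best = candidate
    return best


def lns(iterations: int = 100, neighborhood: int = 3, seed: int = 0) -> list[int]:
    rng = random.Random(seed)
    x = [0] * len(VALUES)  # feasible starting solution: take nothing
    for _ in range(iterations):
        free = rng.sample(range(len(VALUES)), neighborhood)  # destroy step
        x = repair(x, free)  # repair step; never worsens the incumbent
    return x


best = lns()
print(best, objective(best))
```

The best possible value for this toy instance is 26 (for example, items 0, 2 and 4). Because the incumbent is always among the repair candidates, the loop never accepts a worse solution.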
(2020-03-29T23:08:14Z)