DISC: Dynamic Decomposition Improves LLM Inference Scaling
- URL: http://arxiv.org/abs/2502.16706v1
- Date: Sun, 23 Feb 2025 20:37:32 GMT
- Title: DISC: Dynamic Decomposition Improves LLM Inference Scaling
- Authors: Jonathan Light, Wei Cheng, Wu Yue, Masafumi Oyamada, Mengdi Wang, Santiago Paternain, Haifeng Chen,
- Abstract summary: This paper introduces dynamic decomposition, a method that automatically splits solution and reasoning traces into steps during inference. Experiments on coding and math benchmarks show that dynamic decomposition performs better than static methods.
- Score: 54.87338295793453
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many inference scaling methods work by breaking a problem into smaller steps (or groups of tokens), then sampling and choosing the best next step. However, these steps and their sizes are usually predetermined based on human intuition or domain knowledge. This paper introduces dynamic decomposition, a method that automatically and adaptively splits solution and reasoning traces into steps during inference. This approach improves computational efficiency by focusing more resources on difficult steps, breaking them down further and prioritizing their sampling. Experiments on coding and math benchmarks (APPS, MATH, and LiveCodeBench) show that dynamic decomposition performs better than static methods, which rely on fixed steps like token-level, sentence-level, or single-step decompositions. These results suggest that dynamic decomposition can enhance many inference scaling techniques.
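The mechanism described in the abstract can be made concrete with a small sketch. The code below is a hypothetical illustration of adaptive step splitting, not the authors' algorithm: `sample_continuation`, `score`, and the 0.5 difficulty threshold all stand in for an LLM call, a step-level verifier, and a tuned policy.
```python
# Illustrative sketch of dynamic decomposition for inference scaling:
# sample a step, and if it looks difficult, split it into smaller sub-steps
# and spend more of the sampling budget on it.  All names are placeholders.
import random

def sample_continuation(prefix: str, n_tokens: int) -> str:
    """Stand-in for an LLM sampling roughly n_tokens tokens after prefix."""
    return prefix + " <step>"

def score(candidate: str) -> float:
    """Stand-in for a step-level value estimate (e.g., a verifier score)."""
    return random.random()

def solve(prefix: str, budget: int, step_size: int = 64) -> str:
    """Adaptively decompose the remaining trace into steps."""
    while budget > 0 and step_size >= 1:
        n_samples = max(2, budget // step_size)
        candidates = [sample_continuation(prefix, step_size) for _ in range(n_samples)]
        scores = [score(c) for c in candidates]
        best = max(scores)
        budget -= n_samples * step_size
        if best >= 0.5:
            # Step looks easy: commit to the best candidate in one large chunk.
            prefix = candidates[scores.index(best)]
        else:
            # Step looks hard: halve the step size so it is decomposed further
            # and the next round of sampling focuses on a smaller piece.
            step_size //= 2
    return prefix
```
The point the sketch tries to capture is the one the abstract makes: step boundaries are not fixed in advance, but chosen during inference based on how hard each step turns out to be.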
Related papers
- BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning [83.03531832811386]
BoostStep is a method that enhances reasoning accuracy through step-aligned ICL examples. It integrates seamlessly with chain-of-thought (CoT) and tree search algorithms. It improves DeepSeek-R1-671B's performance on AIME by 2.2%, leveraging simple examples only from the MATH dataset.
arXiv Detail & Related papers (2025-01-06T18:59:13Z) - Towards Differentiable Multilevel Optimization: A Gradient-Based Approach [1.6114012813668932]
This paper introduces a novel gradient-based approach for multilevel optimization.
Our method significantly reduces computational complexity while improving both solution accuracy and convergence speed.
To the best of our knowledge, this is one of the first algorithms to provide a general version of implicit differentiation.
arXiv Detail & Related papers (2024-10-15T06:17:59Z) - Numerical Methods for Convex Multistage Stochastic Optimization [86.45244607927732]
We focus on Stochastic Programming (SP), Stochastic Optimal Control (SOC), and Markov Decision Processes (MDP).
Recent progress in solving convex multistage stochastic problems is based on cutting plane approximations of the cost-to-go functions of dynamic programming equations.
Cutting plane type methods can handle multistage problems with a large number of stages, but only a relatively small number of state (decision) variables.
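As a concrete illustration of the cutting plane idea above (the notation here is assumed, not quoted from the paper), the stage-$t$ cost-to-go function $Q_t$ is approximated from below by a pointwise maximum of affine cuts,

$$Q_t(x_{t-1}) \;\ge\; \max_{j=1,\dots,J}\big\{\alpha_j + \beta_j^{\top} x_{t-1}\big\},$$

where each cut $(\alpha_j, \beta_j)$ is obtained by solving the stage subproblem at a trial point and taking a subgradient. Adding cuts tightens the bound without discretizing the state space, which is why such methods scale in the number of stages but not in the state dimension.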
arXiv Detail & Related papers (2023-03-28T01:30:40Z) - Stochastic Gradient Methods with Preconditioned Updates [47.23741709751474]
There are several algorithms for such problems, but existing methods often work poorly when the problem is badly scaled and/or ill-conditioned.
Here we include a preconditioner based on Hutchinson's approach to approximating the diagonal of the Hessian.
We prove convergence both when only smoothness is assumed and when the PL condition also holds.
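A minimal sketch of a Hutchinson-style diagonal estimate of the kind referenced above, using Rademacher probes and a Hessian-vector product oracle; the interface and probe count are illustrative assumptions, not the paper's implementation.
```python
# Hutchinson-style estimate of the Hessian diagonal: for Rademacher probes z
# with E[z z^T] = I, the expectation of z * (H z) (elementwise) is diag(H).
import numpy as np

def hutchinson_diag(hvp, dim, n_probes=100, rng=None):
    """Estimate diag(H) given only a Hessian-vector product oracle `hvp`."""
    rng = np.random.default_rng() if rng is None else rng
    est = np.zeros(dim)
    for _ in range(n_probes):
        z = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe
        est += z * hvp(z)                      # elementwise z * (Hz)
    return est / n_probes

# Toy usage with an explicit Hessian; in practice hvp would come from
# automatic differentiation, and 1/sqrt(|diag| + eps) would rescale the
# stochastic gradient step.
H = np.array([[2.0, 0.3, 0.0],
              [0.3, 5.0, 0.1],
              [0.0, 0.1, 9.0]])
print(hutchinson_diag(lambda v: H @ v, dim=3, n_probes=500))  # roughly [2, 5, 9]
```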
arXiv Detail & Related papers (2022-06-01T07:38:08Z) - Simple Stochastic and Online Gradient Descent Algorithms for Pairwise Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach to handling streaming data in pairwise learning.
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning.
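For concreteness (standard notation assumed here, not quoted from the paper), the empirical objective in pairwise learning averages a loss over pairs of training examples,

$$\frac{1}{n(n-1)}\sum_{i \ne j} \ell\big(f(x_i, x_j),\, y_i, y_j\big),$$

so each stochastic or online update touches a pair of instances rather than a single one, which is the structural difference these algorithms are designed around.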
arXiv Detail & Related papers (2021-11-23T18:10:48Z) - Bolstering Stochastic Gradient Descent with Model Building [0.0]
The stochastic gradient descent method and its variants constitute the core optimization algorithms that achieve good convergence rates.
We propose an alternative approach to line search by using a new algorithm based on forward step model building.
We show that the proposed algorithm achieves faster convergence and better generalization in well-known test problems.
arXiv Detail & Related papers (2021-11-13T06:54:36Z) - One Step at a Time: Pros and Cons of Multi-Step Meta-Gradient Reinforcement Learning [61.662504399411695]
We introduce a novel method that mixes multiple inner steps and enjoys a more accurate and robust meta-gradient signal.
When applied to the Snake game, the mixing meta-gradient algorithm can cut the variance by a factor of 3 while achieving similar or higher performance.
arXiv Detail & Related papers (2021-10-30T08:36:52Z) - Learning to solve TV regularized problems with unrolled algorithms [18.241062505073234]
Total Variation (TV) is a popular regularization strategy that promotes piece-wise constant signals.
We develop and characterize two approaches to do so, describe their benefits and limitations, and discuss the regime where they can actually improve over iterative procedures.
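For reference (standard notation assumed here), the TV-regularized problem being unrolled is typically of the form

$$\min_{x}\ \tfrac{1}{2}\,\lVert y - Ax \rVert_2^2 \;+\; \lambda \sum_{i} \lvert x_{i+1} - x_i \rvert,$$

where the $\ell_1$ penalty on successive differences is what drives solutions toward piece-wise constant segments; unrolling replaces a fixed number of iterations of a solver for this problem with differentiable layers whose parameters can be learned.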
arXiv Detail & Related papers (2020-10-19T14:19:02Z) - Dynamic Scale Training for Object Detection [111.33112051962514]
We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate the scale variation challenge in object detection.
Experimental results demonstrate the efficacy of our proposed DST towards scale variation handling.
It does not introduce inference overhead and could serve as a free lunch for general detection configurations.
arXiv Detail & Related papers (2020-04-26T16:48:17Z) - Learning with Differentiable Perturbed Optimizers [54.351317101356614]
We propose a systematic method to transform optimizers into operations that are differentiable and never locally constant.
Our approach relies on stochastically perturbed optimizers, and can be used readily together with existing solvers.
We show how this framework can be connected to a family of losses developed in structured prediction, and give theoretical guarantees for their use in learning tasks.
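A minimal numpy sketch of the perturbation idea above: averaging the argmax of randomly perturbed inputs turns a piecewise-constant mapping into a smooth one. The noise scale and sample count are illustrative choices, not the paper's.
```python
# Smoothed argmax via Gaussian perturbations:
# F_sigma(theta) = E_Z[ onehot(argmax(theta + sigma * Z)) ]
# is differentiable in theta, unlike the raw argmax.
import numpy as np

def perturbed_argmax(theta, sigma=0.5, n_samples=1000, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal((n_samples, theta.size))
    winners = np.argmax(theta + sigma * noise, axis=1)   # argmax per sample
    return np.eye(theta.size)[winners].mean(axis=0)      # average of one-hot winners

theta = np.array([1.0, 1.2, -0.5])
print(perturbed_argmax(theta))  # a soft distribution concentrated on the larger entries
```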
arXiv Detail & Related papers (2020-02-20T11:11:32Z) - Complexity of Stochastic Dual Dynamic Programming [7.177693955272473]
We first establish the number of iterations, i.e., the iteration complexity, required by a basic dynamic cutting plane method for solving simple multi-stage optimization problems.
We then refine these basic tools and establish the iteration complexity for both deterministic and stochastic dual dynamic programming methods.
arXiv Detail & Related papers (2019-12-16T20:56:46Z)