Related papers: Divide-or-Conquer? Which Part Should You Distill Your LLM?

URL: http://arxiv.org/abs/2402.15000v1
Date: Thu, 22 Feb 2024 22:28:46 GMT
Title: Divide-or-Conquer? Which Part Should You Distill Your LLM?
Authors: Zhuofeng Wu, He Bai, Aonan Zhang, Jiatao Gu, VG Vinod Vydiswaran, Navdeep Jaitly, Yizhe Zhang
Abstract summary: We devise a similar strategy that breaks down reasoning tasks into a problem decomposition phase and a problem solving phase. We show that the strategy is able to outperform a single stage solution.
Score: 40.563633582127316
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent methods have demonstrated that Large Language Models (LLMs) can solve reasoning tasks better when they are encouraged to solve subtasks of the main task first. In this paper we devise a similar strategy that breaks down reasoning tasks into a problem decomposition phase and a problem solving phase and show that the strategy is able to outperform a single stage solution. Further, we hypothesize that the decomposition should be easier to distill into a smaller model compared to the problem solving because the latter requires large amounts of domain knowledge while the former only requires learning general problem solving strategies. We propose methods to distill these two capabilities and evaluate their impact on reasoning outcomes and inference cost. We find that we can distill the problem decomposition phase and at the same time achieve good generalization across tasks, datasets, and models. However, it is harder to distill the problem solving capability without losing performance and the resulting distilled model struggles with generalization. These results indicate that by using smaller, distilled problem decomposition models in combination with problem solving LLMs we can achieve reasoning with cost-efficient inference and local adaptation.
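As a rough illustration of the two-stage strategy the abstract describes, the sketch below wires a decomposition step to a solving step. It is not the authors' implementation: the function names, the prompt layout, and the toy stand-in models are all hypothetical, and in practice the decomposer would be a small distilled model while the solver would be a larger LLM.

```python
from typing import Callable, List


def build_solver_prompt(question: str, subquestions: List[str]) -> str:
    """Combine the question with its subquestions (hypothetical prompt layout)."""
    lines = [f"Question: {question}", "Subquestions:"]
    lines += [f"{i}. {sq}" for i, sq in enumerate(subquestions, start=1)]
    lines.append("Answer:")
    return "\n".join(lines)


def decompose_then_solve(
    question: str,
    decomposer: Callable[[str], List[str]],
    solver: Callable[[str], str],
) -> str:
    """Stage 1: the decomposer proposes subquestions. Stage 2: the solver answers."""
    subquestions = decomposer(question)
    prompt = build_solver_prompt(question, subquestions)
    return solver(prompt)


# Toy stand-ins so the sketch runs without any model; both are assumptions.
def toy_decomposer(question: str) -> List[str]:
    return ["What quantities are given?", "What operation combines them?"]


def toy_solver(prompt: str) -> str:
    # Count the numbered subquestion lines the solver was shown.
    n = sum(1 for line in prompt.splitlines() if line[:1].isdigit())
    return f"answer conditioned on {n} subquestions"


result = decompose_then_solve("What is 2 + 3?", toy_decomposer, toy_solver)
```

Keeping the two stages behind separate callables mirrors the abstract's point that the decomposer and the solver can be different models, so either one can be swapped without touching the other.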

Related papers

MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer [37.81465564673498]
Large Language Models (LLMs) have demonstrated promising capabilities in solving mathematical reasoning tasks. We propose MetaLadder, a framework that explicitly prompts LLMs to recall and reflect on meta-problems. Our experiments on mathematical benchmarks demonstrate that MetaLadder significantly boosts LLMs' problem-solving accuracy.
arXiv Detail & Related papers (2025-03-19T04:36:35Z)
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs [76.43407125275202]
o1-like models can emulate human-like long-time thinking during inference. This paper presents the first comprehensive study on the prevalent issue of overthinking in these models. We propose strategies to mitigate overthinking, streamlining reasoning processes without compromising accuracy.
arXiv Detail & Related papers (2024-12-30T18:55:12Z)
SIKeD: Self-guided Iterative Knowledge Distillation for mathematical reasoning [49.29200323760457]
Large Language Models (LLMs) can transfer their reasoning skills to smaller models. Smaller models are not expressive enough to fit the LLM's distribution on all strategies when distilled, and so come to rely on one strategy. This reliance poses a challenge for smaller models when attempting to solve reasoning tasks that may be difficult with their preferred strategy.
arXiv Detail & Related papers (2024-10-24T09:29:18Z)
Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems [50.76385564061713]
Chain-of-Thought (CoT) prompting has enhanced the performance of Large Language Models (LLMs) across various reasoning tasks. CoT usually suffers from three pitfalls: semantic misunderstanding errors, calculation errors, and step-missing errors. We propose Deeply Understanding the Problems (DUP) to improve the LLMs' math problem-solving ability by addressing semantic misunderstanding errors.
arXiv Detail & Related papers (2024-04-23T12:16:05Z)
Distilling Algorithmic Reasoning from LLMs via Explaining Solution Programs [2.3020018305241337]
Distilling explicit chain-of-thought reasoning paths has emerged as an effective method for improving the reasoning abilities of large language models. We propose a novel approach to distill reasoning abilities from LLMs by leveraging their capacity to explain solutions. Our experiments demonstrate that learning from explanations enables the Reasoner to more effectively guide program implementation by a Coder.
arXiv Detail & Related papers (2024-04-11T22:19:50Z)
PuzzleBench: Can LLMs Solve Challenging First-Order Combinatorial Reasoning Problems? [27.696027301600793]
We present PuzzleBench, a dataset of 31 challenging problems along with a few solved instances for each problem. These problems are all first order, i.e., they can be instantiated with problem instances of varying sizes, and most of them are NP-hard. We first observe that LLMs, even when aided by symbolic solvers, perform rather poorly on our dataset. In response, we propose a new approach, Puzzle-LM, which combines LLMs with both symbolic solvers and an interpreter.
arXiv Detail & Related papers (2024-02-04T20:56:09Z)
Small Language Models Fine-tuned to Coordinate Larger Language Models improve Complex Reasoning [41.03267013352519]
Large Language Models (LLMs) prompted to generate chain-of-thought exhibit impressive reasoning capabilities. We introduce DaSLaM, which uses a decomposition generator to decompose complex problems into subproblems that require fewer reasoning steps. We show that DaSLaM is not limited by the solver's capabilities as a function of scale.
arXiv Detail & Related papers (2023-10-21T15:23:20Z)
Adaptive-Solver Framework for Dynamic Strategy Selection in Large Language Model Reasoning [34.568072559937455]
Large Language Models (LLMs) are showcasing impressive ability in handling complex reasoning tasks. Most methodologies that leverage LLMs tend to adopt a uniform approach. This inflexibility can bring unnecessary computational overhead or sub-optimal performance. We introduce an Adaptive-Solver framework that strategically modulates solving strategies based on the difficulties of the problems.
arXiv Detail & Related papers (2023-10-01T12:28:36Z)
Faith and Fate: Limits of Transformers on Compositionality [109.79516190693415]
We investigate the limits of transformer large language models across three representative compositional tasks. These tasks require breaking problems down into sub-steps and synthesizing these steps into a precise answer. Our empirical findings suggest that transformer LLMs solve compositional tasks by reducing multi-step compositional reasoning into linearized subgraph matching.
arXiv Detail & Related papers (2023-05-29T23:24:14Z)
Distilling Reasoning Capabilities into Smaller Language Models [83.66051257039763]
Step-by-step reasoning approaches like chain of thought (CoT) have proved to be very effective in inducing reasoning capabilities in large language models. However, the success of the CoT approach is fundamentally tied to the model size, and billion parameter-scale models are often needed to get CoT to work. We propose a knowledge distillation approach that leverages the step-by-step CoT reasoning capabilities of larger models and distills these abilities into smaller models.
arXiv Detail & Related papers (2022-12-01T00:39:56Z)
Direct Heterogeneous Causal Learning for Resource Allocation Problems in Marketing [20.9377115817821]
Marketing is an important mechanism to increase user engagement and improve platform revenue. Most decision-making problems in marketing can be formulated as resource allocation problems and have been studied for decades. Existing works usually divide the solution procedure into two fully decoupled stages, i.e., machine learning (ML) and operation research (OR).
arXiv Detail & Related papers (2022-11-28T19:27:34Z)
A Mutual Information Maximization Approach for the Spurious Solution Problem in Weakly Supervised Question Answering [60.768146126094955]
Weakly supervised question answering usually has only the final answers as supervision signals. There may exist many spurious solutions that coincidentally derive the correct answer, but training on such solutions can hurt model performance. We propose to explicitly exploit such semantic correlations by maximizing the mutual information between question-answer pairs and predicted solutions.
arXiv Detail & Related papers (2021-06-14T05:47:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.