Teaching Large Language Models to Reason through Learning and Forgetting
- URL: http://arxiv.org/abs/2504.11364v2
- Date: Wed, 23 Apr 2025 21:27:59 GMT
- Title: Teaching Large Language Models to Reason through Learning and Forgetting
- Authors: Tianwei Ni, Allen Nie, Sapana Chaudhary, Yao Liu, Huzefa Rangwala, Rasool Fakoor
- Abstract summary: Leveraging inference-time search in large language models has proven effective in further enhancing a trained model's capability to solve complex mathematical and reasoning problems. However, this approach significantly increases computational costs and inference time. We propose an effective approach that integrates search capabilities directly into the model by fine-tuning it using both successful (learning) and failed reasoning paths (forgetting).
- Score: 23.384882158333156
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Leveraging inference-time search in large language models has proven effective in further enhancing a trained model's capability to solve complex mathematical and reasoning problems. However, this approach significantly increases computational costs and inference time, as the model must generate and evaluate multiple candidate solutions to identify a viable reasoning path. To address this, we propose an effective approach that integrates search capabilities directly into the model by fine-tuning it using both successful (learning) and failed reasoning paths (forgetting) derived from diverse search methods. While fine-tuning the model with these data might seem straightforward, we identify a critical issue: the model's search capability tends to degrade rapidly if fine-tuning is performed naively. We show that this degradation can be substantially mitigated by employing a smaller learning rate. Extensive experiments on the challenging Game-of-24 and Countdown mathematical reasoning benchmarks show that our approach not only outperforms both standard fine-tuning and inference-time search baselines but also significantly reduces inference time by 180$\times$.
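Below is a minimal sketch of the fine-tuning idea described in the abstract, assuming the "forgetting" term is implemented as an unlikelihood-style penalty on failed reasoning paths; the paper's exact objective, data format, base model, and hyperparameters (e.g. `forget_weight` below) are not given here, so these are illustrative assumptions rather than the authors' implementation. The small learning rate reflects the abstract's observation that naive fine-tuning rapidly degrades search capability.

```python
# Sketch of learning/forgetting fine-tuning (assumed objective: maximize
# likelihood of successful paths, apply an unlikelihood-style penalty to
# failed paths; the paper's actual loss may differ).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM works for this sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()

# Small learning rate, as the abstract reports this mitigates the rapid
# degradation of search capability during naive fine-tuning.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)
forget_weight = 0.1  # assumed coefficient on the forgetting term

def step(success_text: str, failure_text: str) -> float:
    """One update on a (successful, failed) pair of reasoning paths."""
    good = tokenizer(success_text, return_tensors="pt")
    bad = tokenizer(failure_text, return_tensors="pt")

    # Learning: standard next-token cross-entropy on the successful path.
    learn_loss = model(**good, labels=good["input_ids"]).loss

    # Forgetting: push down the probability of tokens on the failed path.
    logits = model(**bad).logits[:, :-1, :]
    targets = bad["input_ids"][:, 1:]
    log_probs = torch.log_softmax(logits, dim=-1)
    token_logp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    token_p = token_logp.exp().clamp(max=1 - 1e-6)
    forget_loss = -torch.log1p(-token_p).mean()  # -log(1 - p) per token

    loss = learn_loss + forget_weight * forget_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```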
Related papers
- Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods [39.89239733570008]
This work conducts a comprehensive analysis of inference-time scaling methods for both reasoning and non-reasoning models.
We find that non-reasoning models, even with an extremely high inference budget, still fall substantially behind reasoning models.
For reasoning models, majority voting proves to be a robust inference strategy, generally competitive with or outperforming other more sophisticated inference-time compute (ITC) methods (a minimal sketch of majority voting appears after this list).
arXiv Detail & Related papers (2025-04-18T19:32:55Z) - Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead [33.011660907969706]
Inference-time scaling can enhance the reasoning capabilities of large language models.
We investigate the benefits and limitations of scaling methods across nine state-of-the-art models and eight challenging tasks.
arXiv Detail & Related papers (2025-03-31T23:40:28Z) - Causality can systematically address the monsters under the bench(marks) [64.36592889550431]
Benchmarks are plagued by various biases, artifacts, or leakage.
Models may behave unreliably due to poorly explored failure modes.
Causality offers an ideal framework to systematically address these challenges.
arXiv Detail & Related papers (2025-02-07T17:01:37Z) - Self-supervised Analogical Learning using Language Models [59.64260218737556]
We propose SAL, a self-supervised analogical learning framework.
SAL mimics the human analogy process and trains models to explicitly transfer high-quality symbolic solutions.
We show that the resulting models outperform base language models on a wide range of reasoning benchmarks.
arXiv Detail & Related papers (2025-02-03T02:31:26Z) - Patience Is The Key to Large Language Model Reasoning [0.0]
We propose a simple method that encourages models to adopt a more patient reasoning style.
We generate detailed reasoning processes as positive examples and simple answers as negative examples, training the model to favor thoroughness in its responses.
Our results demonstrate a performance increase of up to 2.1% on GSM8k from training on just a lightweight dataset.
arXiv Detail & Related papers (2024-11-20T07:20:48Z) - Provable unlearning in topic modeling and downstream tasks [36.571324268874264]
Provable guarantees for unlearning are often limited to supervised learning settings.
We provide the first theoretical guarantees for unlearning in the pre-training and fine-tuning paradigm.
We show that it is easier to unlearn pre-training data from models that have been fine-tuned to a particular task, and one can unlearn this data without modifying the base model.
arXiv Detail & Related papers (2024-11-19T16:04:31Z) - Token-Supervised Value Models for Enhancing Mathematical Problem-Solving Capabilities of Large Language Models [56.32800938317095]
Existing verifiers are sub-optimal for tree search techniques at test time.
We propose token-supervised value models (TVMs).
TVMs assign each token a probability that reflects the likelihood of reaching the correct final answer.
arXiv Detail & Related papers (2024-07-12T13:16:50Z) - Towards Learning Stochastic Population Models by Gradient Descent [0.0]
We show that simultaneous estimation of parameters and structure poses major challenges for optimization procedures.
We demonstrate accurate estimation of models but find that enforcing the inference of parsimonious, interpretable models drastically increases the difficulty.
arXiv Detail & Related papers (2024-04-10T14:38:58Z) - Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines [83.65380507372483]
Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box.
This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
arXiv Detail & Related papers (2023-11-29T05:33:28Z) - Fairness-guided Few-shot Prompting for Large Language Models [93.05624064699965]
In-context learning can suffer from high instability due to variations in training examples, example order, and prompt formats.
We introduce a metric to evaluate the predictive bias of a fixed prompt against labels or given attributes.
We propose a novel search strategy based on greedy search to identify a near-optimal prompt for improving the performance of in-context learning.
arXiv Detail & Related papers (2023-03-23T12:28:25Z) - Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning [93.1435980666675]
We show how optimistic exploration can be easily combined with state-of-the-art reinforcement learning algorithms.
Our experiments demonstrate that optimistic exploration significantly speeds up learning when there are penalties on actions.
arXiv Detail & Related papers (2020-06-15T18:37:38Z)
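For reference, the majority-voting strategy highlighted in the "Think Deep, Think Fast" entry above can be sketched as sampling several candidate answers and keeping the most frequent one. In the snippet below, `generate_answer` is an assumed stand-in for any stochastic LLM call that returns a final answer string; it is not an API from that paper.

```python
# Minimal sketch of majority voting (self-consistency) over sampled answers.
from collections import Counter
from typing import Callable

def majority_vote(problem: str,
                  generate_answer: Callable[[str], str],
                  num_samples: int = 16) -> str:
    """Sample several candidate answers and return the most frequent one."""
    answers = [generate_answer(problem) for _ in range(num_samples)]
    return Counter(answers).most_common(1)[0][0]
```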