Offline Learning and Forgetting for Reasoning with Large Language Models
- URL: http://arxiv.org/abs/2504.11364v3
- Date: Tue, 08 Jul 2025 01:26:04 GMT
- Title: Offline Learning and Forgetting for Reasoning with Large Language Models
- Authors: Tianwei Ni, Allen Nie, Sapana Chaudhary, Yao Liu, Huzefa Rangwala, Rasool Fakoor
- Abstract summary: We propose an effective approach that integrates search capabilities directly into the model by fine-tuning it on unpaired successful and failed reasoning paths. Experiments on the challenging Game-of-24 and Countdown reasoning benchmarks show that replacing CoT-generated data with search-generated data for offline fine-tuning improves success rates by around 23% over inference-time search baselines. Our learning and forgetting objective consistently outperforms both supervised fine-tuning and preference-based methods.
- Score: 23.384882158333156
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Leveraging inference-time search in large language models has proven effective in further enhancing a trained model's capability to solve complex mathematical and reasoning problems. However, this approach significantly increases computational costs and inference time, as the model must generate and evaluate multiple candidate solutions to identify a viable reasoning path. To address this, we propose an effective approach that integrates search capabilities directly into the model by fine-tuning it on unpaired successful (learning) and failed (forgetting) reasoning paths derived from diverse search methods. A key challenge we identify is that naive fine-tuning can degrade the model's search capability; we show this can be mitigated with a smaller learning rate. Extensive experiments on the challenging Game-of-24 and Countdown reasoning benchmarks show that replacing CoT-generated data with search-generated data for offline fine-tuning improves success rates by around 23% over inference-time search baselines, while reducing inference time by 180$\times$. On top of this, our learning and forgetting objective consistently outperforms both supervised fine-tuning and preference-based methods.
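As a rough illustration of the learning-and-forgetting objective described in the abstract, the sketch below pairs a standard likelihood (learning) term on successful search traces with an unlikelihood-style (forgetting) penalty on failed ones. The unlikelihood form, the `beta` weight, and all names are illustrative assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def learn_forget_loss(logits_pos, labels_pos, logits_neg, labels_neg, beta=1.0):
    # Learning: next-token cross-entropy on successful (positive) paths.
    # logits_*: (batch, seq, vocab); labels_*: (batch, seq).
    learn = F.cross_entropy(logits_pos.flatten(0, 1), labels_pos.flatten())
    # Forgetting: unlikelihood penalty -log(1 - p(token)) on failed paths,
    # pushing probability mass away from tokens of unsuccessful traces.
    # (Illustrative assumption; the paper's forgetting term may differ.)
    logp = F.log_softmax(logits_neg, dim=-1)
    tok_p = logp.gather(-1, labels_neg.unsqueeze(-1)).squeeze(-1).exp()
    forget = -torch.log1p(-tok_p.clamp(max=1 - 1e-6)).mean()
    return learn + beta * forget
```

Consistent with the abstract's observation that naive fine-tuning can degrade search capability, such an objective would typically be run with a smaller-than-usual learning rate.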
Related papers
- Unlocking the Potential of Difficulty Prior in RL-based Multimodal Reasoning [69.64809103333839]
We investigate how explicitly modeling a problem's difficulty prior shapes the effectiveness of reinforcement-learning-based fine-tuning for multimodal reasoning. Our approach demonstrates significant performance gains across various multimodal mathematical reasoning benchmarks with only 2K+0.6K two-stage training data.
arXiv Detail & Related papers (2025-05-19T15:43:10Z)
- Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods [39.89239733570008]
This work conducts a comprehensive analysis of inference-time scaling methods for both reasoning and non-reasoning models.
We find that non-reasoning models, even with an extremely high inference budget, still fall substantially behind reasoning models.
For reasoning models, majority voting proves to be a robust inference strategy, generally competitive with or outperforming other, more sophisticated inference-time compute (ITC) methods (a minimal voting sketch follows this entry).
arXiv Detail & Related papers (2025-04-18T19:32:55Z)
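The robustness of majority voting noted above is easy to make concrete: sample several candidate answers and return the most frequent. A minimal sketch, where `sample_answer` is a hypothetical stand-in for one sampled generation from a model:

```python
from collections import Counter

def majority_vote(sample_answer, question, k=16):
    # Draw k independent samples and keep the most common final answer.
    answers = [sample_answer(question) for _ in range(k)]
    return Counter(answers).most_common(1)[0][0]
```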
- Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead [33.011660907969706]
Inference-time scaling can enhance the reasoning capabilities of large language models. We investigate the benefits and limitations of scaling methods across nine state-of-the-art models and eight challenging tasks.
arXiv Detail & Related papers (2025-03-31T23:40:28Z)
- Exploring Training and Inference Scaling Laws in Generative Retrieval [50.82554729023865]
Generative retrieval reformulates retrieval as an autoregressive generation task, where large language models generate target documents directly from a query. We systematically investigate training and inference scaling laws in generative retrieval, exploring how model size, training data scale, and inference-time compute jointly influence performance.
arXiv Detail & Related papers (2025-03-24T17:59:03Z)
- Causality can systematically address the monsters under the bench(marks) [64.36592889550431]
Benchmarks are plagued by various biases, artifacts, or leakage. Models may behave unreliably due to poorly explored failure modes. Causality offers an ideal framework to systematically address these challenges.
arXiv Detail & Related papers (2025-02-07T17:01:37Z)
- Self-supervised Analogical Learning using Language Models [59.64260218737556]
We propose SAL, a self-supervised analogical learning framework.
SAL mimics the human analogy process and trains models to explicitly transfer high-quality symbolic solutions.
We show that the resulting models outperform base language models on a wide range of reasoning benchmarks.
arXiv Detail & Related papers (2025-02-03T02:31:26Z)
- Patience Is The Key to Large Language Model Reasoning [0.0]
We propose a simple method that encourages models to adopt a more patient reasoning style. We generate detailed reasoning processes as positive examples and simple answers as negative examples, thereby training the model to favor thoroughness in its responses (a sketch of one way to realize this follows). Our results demonstrate a performance increase of up to 2.1% on GSM8k from training on just a lightweight dataset.
arXiv Detail & Related papers (2024-11-20T07:20:48Z)
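One plausible way to train on such detailed-positive / brief-negative pairs is a DPO-style pairwise objective; whether the paper uses exactly this loss is an assumption. Inputs are summed log-probabilities of each response under the policy and a frozen reference model:

```python
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    # Reward margin between the detailed (chosen) and brief (rejected)
    # response, measured relative to the reference model.
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()
```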
- Provable unlearning in topic modeling and downstream tasks [36.571324268874264]
Provable guarantees for unlearning are often limited to supervised learning settings.
We provide the first theoretical guarantees for unlearning in the pre-training and fine-tuning paradigm.
We show that it is easier to unlearn pre-training data from models that have been fine-tuned to a particular task, and one can unlearn this data without modifying the base model.
arXiv Detail & Related papers (2024-11-19T16:04:31Z)
- Guided Stream of Search: Learning to Better Search with Language Models via Optimal Path Guidance [17.28280896937486]
We show how to leverage optimal solutions to enhance the search and planning abilities of language models.
Our approach significantly enhances the search and planning abilities of language models on Countdown, a simple yet challenging mathematical reasoning task (the task is illustrated below).
arXiv Detail & Related papers (2024-10-03T21:07:59Z)
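For concreteness, Countdown (named in this entry and in the main paper's benchmarks) asks whether given numbers can be combined with +, -, *, / to reach a target. The standalone checker below is our illustration, not code from the paper, and it simplifies the task to left-to-right combinations without parenthesization:

```python
from itertools import permutations, product

def countdown_solvable(nums, target, eps=1e-6):
    ops = {'+': lambda a, b: a + b, '-': lambda a, b: a - b,
           '*': lambda a, b: a * b,
           '/': lambda a, b: a / b if abs(b) > eps else None}
    for perm in permutations(nums):
        for symbols in product(ops, repeat=len(nums) - 1):
            acc = perm[0]
            for sym, nxt in zip(symbols, perm[1:]):
                acc = ops[sym](acc, nxt)
                if acc is None:  # division by (near-)zero
                    break
            if acc is not None and abs(acc - target) < eps:
                return True
    return False

# countdown_solvable([3, 7, 2], 20) -> True, since (3 + 7) * 2 = 20
```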
- Beyond Accuracy Optimization: Computer Vision Losses for Large Language Model Fine-Tuning [9.507070656654632]
Large Language Models (LLMs) have demonstrated impressive performance across various tasks.
Current training approaches combine standard cross-entropy loss with extensive data, human feedback, or ad hoc methods to enhance performance.
This study investigates the use of established semantic segmentation loss functions in natural language generation to create a versatile, practical, and scalable solution.
arXiv Detail & Related papers (2024-09-20T16:46:17Z)
- Token-Supervised Value Models for Enhancing Mathematical Problem-Solving Capabilities of Large Language Models [56.32800938317095]
Existing verifiers are sub-optimal for tree search techniques at test time.
We propose token-supervised value models (TVMs). TVMs assign each token a probability that reflects the likelihood of reaching the correct final answer (a minimal sketch follows this entry).
arXiv Detail & Related papers (2024-07-12T13:16:50Z)
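A minimal sketch of what a token-level value model could look like: a small head maps each token's hidden state to the probability that the partial solution ending there reaches a correct final answer. Treating this as a per-token binary classifier is our assumption for illustration:

```python
import torch
import torch.nn as nn

class TokenValueHead(nn.Module):
    # Scores every prefix of a candidate solution, so a tree search can
    # prune partial paths without rolling them out to completion.
    def __init__(self, hidden_dim):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, 1)

    def forward(self, hidden_states):  # (batch, seq, hidden)
        return torch.sigmoid(self.proj(hidden_states)).squeeze(-1)  # (batch, seq)
```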
- Towards Learning Stochastic Population Models by Gradient Descent [0.0]
We show that simultaneous estimation of parameters and structure poses major challenges for optimization procedures.
We demonstrate accurate estimation of models but find that enforcing the inference of parsimonious, interpretable models drastically increases the difficulty.
arXiv Detail & Related papers (2024-04-10T14:38:58Z)
- Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines [83.65380507372483]
Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box.
This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
arXiv Detail & Related papers (2023-11-29T05:33:28Z)
- Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL [62.824464372594576]
We aim to enhance arithmetic reasoning ability of Large Language Models (LLMs) through zero-shot prompt optimization.
We identify a previously overlooked objective of query dependency in such optimization.
We introduce Prompt-OIRL, which harnesses offline inverse reinforcement learning to draw insights from offline prompting demonstration data.
arXiv Detail & Related papers (2023-09-13T01:12:52Z)
- Fairness-guided Few-shot Prompting for Large Language Models [93.05624064699965]
In-context learning can suffer from high instability due to variations in training examples, example order, and prompt formats.
We introduce a metric to evaluate the predictive bias of a fixed prompt against labels or given attributes.
We propose a novel search strategy based on greedy search to identify a near-optimal prompt for improving the performance of in-context learning (a greedy-selection sketch follows this entry).
arXiv Detail & Related papers (2023-03-23T12:28:25Z)
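The greedy strategy can be sketched as follows: grow the few-shot prompt one demonstration at a time, keeping at each step the candidate that minimizes the bias metric. Here `bias_of` (examples -> float, lower is fairer) is a hypothetical stand-in for the paper's metric:

```python
def greedy_prompt_search(candidates, bias_of, k=4):
    # Greedily select k demonstrations that minimize predictive bias.
    chosen, pool = [], list(candidates)
    for _ in range(min(k, len(pool))):
        best = min(pool, key=lambda ex: bias_of(chosen + [ex]))
        chosen.append(best)
        pool.remove(best)
    return chosen
```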
- On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
- Efficient Active Search for Combinatorial Optimization Problems [1.6543719822033436]
We show that (efficient) active search enables learned models to effectively solve instances that are much larger than those seen during training.
The proposed methods offer a simple way to significantly improve the search performance of a given model and outperform state-of-the-art machine learning based methods on routing problems.
arXiv Detail & Related papers (2021-06-09T15:08:03Z)
- Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning [93.1435980666675]
We show how optimistic exploration can be easily combined with state-of-the-art reinforcement learning algorithms.
Our experiments demonstrate that optimistic exploration significantly speeds up learning when there are penalties on actions.
arXiv Detail & Related papers (2020-06-15T18:37:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.