Related papers: Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling

Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling

URL: http://arxiv.org/abs/2411.00750v1
Date: Fri, 01 Nov 2024 17:18:45 GMT
Title: Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling
Authors: Yiwen Ding, Zhiheng Xi, Wei He, Zhuoyuan Li, Yitao Zhai, Xiaowei Shi, Xunliang Cai, Tao Gui, Qi Zhang, Xuanjing Huang,
Abstract summary: Self-improvement methods enable large language models to generate solutions themselves. We find that models tend to over-sample on easy queries and under-sample on queries they have yet to master. We introduce Guided Self-Improvement (GSI), a strategy aimed at improving the efficiency of sampling challenging heavy-tailed data.
Score: 38.7578639980701
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Self-improvement methods enable large language models (LLMs) to generate solutions themselves and iteratively train on filtered, high-quality rationales. This process proves effective and reduces the reliance on human supervision in LLMs' reasoning, but the performance soon plateaus. We delve into the process and find that models tend to over-sample on easy queries and under-sample on queries they have yet to master. As iterations proceed, this imbalance in sampling is exacerbated, leading to a long-tail distribution where solutions to difficult queries almost diminish. This phenomenon limits the performance gain of self-improving models. A straightforward solution is brute-force sampling to balance the distribution, which significantly raises computational costs. In this paper, we introduce Guided Self-Improvement (GSI), a strategy aimed at improving the efficiency of sampling challenging heavy-tailed data. It leverages Socratic-style guidance signals to help LLM reasoning with complex queries, reducing the exploration effort and minimizing computational overhead. Experiments on four models across diverse mathematical tasks show that GSI strikes a balance between performance and efficiency, while also being effective on held-out tasks.

Related papers

Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs [45.83245433138508]
Large language models (LLMs) have rapidly progressed into general-purpose agents capable of solving a broad spectrum of tasks.<n>They apply fixed inference-time compute regardless of task complexity, often overthinking simple problems while underthinking hard ones.<n>This survey presents a comprehensive review of efficient test-time compute strategies, which aim to improve the computational efficiency of LLM reasoning.
arXiv Detail & Related papers (2025-07-02T18:27:42Z)
Latent Guided Sampling for Combinatorial Optimization [3.636090511738153]
Recent Combinatorial Optimization methods leverage deep learning to learn solution strategies, trained via Supervised or Reinforcement Learning (RL)<n>While promising, these approaches often rely on task-specific augmentations, perform poorly on out-of-distribution instances, and lack robust inference mechanisms.<n>In this work, we propose LGS-Net, a novel latent space model that conditions on efficient problem instances, and introduce an efficient Neural inference method, Latent Guided Sampling (LGS)
arXiv Detail & Related papers (2025-06-04T08:02:59Z)
TACO: Think-Answer Consistency for Optimized Long-Chain Reasoning and Efficient Data Learning via Reinforcement Learning in LVLMs [50.820065021136024]
DeepSeek R1 has significantly advanced complex reasoning for large language models (LLMs)<n>Recent methods have attempted to replicate R1's reasoning capabilities in multimodal settings.<n>We propose TACO, a novel reinforcement learning algorithm for visual reasoning.
arXiv Detail & Related papers (2025-05-27T06:30:48Z)
Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute [55.330813919992465]
This paper presents a simple, effective, and cost-efficient strategy to improve LLM performance by scaling test-time compute. Our strategy builds upon the repeated-sampling-then-voting framework, with a novel twist: incorporating multiple models, even weaker ones, to leverage their complementary strengths.
arXiv Detail & Related papers (2025-04-01T13:13:43Z)
ConSol: Sequential Probability Ratio Testing to Find Consistent LLM Reasoning Paths Efficiently [3.6393221632527686]
Small language models (LLMs) solve complex tasks by generating intermediate reasoning steps prior to providing answers. The widely-used self-consistency method further exacerbates these costs by aggregating multiple reasoning paths to improve accuracy. We propose leveraging Sequential Probability Ratio Testing (SPRT) to dynamically terminate sampling once sufficient consistency is achieved.
arXiv Detail & Related papers (2025-03-22T00:07:28Z)
Diversified Sampling Improves Scaling LLM inference [31.18762591875725]
DivSampling is a novel and versatile sampling technique designed to enhance the diversity of candidate solutions. Our theoretical analysis demonstrates that, under mild assumptions, the error rates of responses generated from diverse prompts are significantly lower compared to those produced by stationary prompts.
arXiv Detail & Related papers (2025-02-16T07:37:58Z)
In-context Demonstration Matters: On Prompt Optimization for Pseudo-Supervision Refinement [71.60563181678323]
Large language models (LLMs) have achieved great success across diverse tasks, and fine-tuning is sometimes needed to further enhance generation quality.<n>To handle these challenges, a direct solution is to generate high-confidence'' data from unsupervised downstream tasks.<n>We propose a novel approach, pseudo-supervised demonstrations aligned prompt optimization (PAPO) algorithm, which jointly refines both the prompt and the overall pseudo-supervision.
arXiv Detail & Related papers (2024-10-04T03:39:28Z)
Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo [55.452453947359736]
We introduce a novel verification method based on Twisted Sequential Monte Carlo (TSMC) We apply TSMC to Large Language Models by estimating the expected future rewards at partial solutions. This approach results in a more straightforward training target that eliminates the need for step-wise human annotations.
arXiv Detail & Related papers (2024-10-02T18:17:54Z)
Improve Mathematical Reasoning in Language Models by Automated Process Supervision [22.72856086318912]
We propose a novel Monte Carlo Tree Search (MCTS) algorithm named textitOmegaPRM for the efficient collection of high-quality process supervision data. We are able to collect over 1.5 million process supervision annotations to train a Process Reward Model (PRM) We have enhanced the instruction tuned Gemini Pro model's math reasoning performance, achieving a 69.4% success rate on the MATH benchmark.
arXiv Detail & Related papers (2024-06-05T19:25:40Z)
MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time [51.5039731721706]
MindStar is a purely inference-based searching method for large language models. It formulates reasoning tasks as searching problems and proposes two search ideas to identify the optimal reasoning paths. It significantly enhances the reasoning abilities of open-source models, such as Llama-2-13B and Mistral-7B, and achieves comparable performance to GPT-3.5 and Grok-1.
arXiv Detail & Related papers (2024-05-25T15:07:33Z)
Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data. Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets. We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
arXiv Detail & Related papers (2024-02-22T04:10:57Z)
Online Cascade Learning for Efficient Inference over Streams [9.516197133796437]
Large Language Models (LLMs) have a natural role in answering complex queries about data streams. We propose online cascade learning, the first approach to address this challenge. We formulate the task of learning cascades online as an imitation-learning problem.
arXiv Detail & Related papers (2024-02-07T01:46:50Z)
Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions. We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training. As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
arXiv Detail & Related papers (2023-10-06T16:36:08Z)
AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging) It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data. Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
Learning Sampling Distributions for Model Predictive Control [36.82905770866734]
Sampling-based approaches to Model Predictive Control (MPC) have become a cornerstone of contemporary approaches to MPC. We propose to carry out all operations in the latent space, allowing us to take full advantage of the learned distribution. Specifically, we frame the learning problem as bi-level optimization and show how to train the controller with backpropagation-through-time.
arXiv Detail & Related papers (2022-12-05T20:35:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.