Related papers: Progress or Regress? Self-Improvement Reversal in Post-training

Progress or Regress? Self-Improvement Reversal in Post-training

URL: http://arxiv.org/abs/2407.05013v1
Date: Sat, 6 Jul 2024 09:07:11 GMT
Title: Progress or Regress? Self-Improvement Reversal in Post-training
Authors: Ting Wu, Xuefeng Li, Pengfei Liu,
Abstract summary: We propose a comprehensive evaluative framework to scrutinize the underlying enhancements of post-training paradigms for self-improvement. We show that models showing improved performance across benchmarks will paradoxically exhibit declines in broader, essential capabilities. These findings indicate that current self-improvement practices through post-training are inadequate for equipping models to tackle more complex problems.
Score: 26.051637877066327
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Self-improvement through post-training methods such as iterative preference learning has been acclaimed for enhancing the problem-solving capabilities (e.g., mathematical reasoning) of Large Language Models (LLMs) without human intervention. However, as exploration deepens, it becomes crucial to assess whether these improvements genuinely signify progress in solving more challenging problems or if they could lead to unintended regressions. To address this, we propose a comprehensive evaluative framework that goes beyond the superficial pass@1 metric to scrutinize the underlying enhancements of post-training paradigms for self-improvement. Through rigorous experimentation and analysis across diverse problem-solving tasks, the empirical results point out the phenomenon of \emph{self-improvement reversal}, where models showing improved performance across benchmarks will paradoxically exhibit declines in broader, essential capabilities, like output diversity and out-of-distribution (OOD) generalization. These findings indicate that current self-improvement practices through post-training are inadequate for equipping models to tackle more complex problems. Furthermore, they underscore the necessity of our critical evaluation metrics in discerning the \emph{progress or regress} dichotomy for self-improving LLMs.

Related papers

Superficial Self-Improved Reasoners Benefit from Model Merging [38.72827436256771]
Self-improvement as a solution to synthesizing high-quality data corpus. In particular, our analysis reveals that even when LMs show improved in-domain (ID) reasoning accuracy, they actually compromise their generalized reasoning capabilities. We propose Iterative Model Merging (IMM), a method that strategically combines weights from original and self-improved models to preserve generalization.
arXiv Detail & Related papers (2025-03-03T22:41:25Z)
Adversarial Alignment for LLMs Requires Simpler, Reproducible, and More Measurable Objectives [52.863024096759816]
Misaligned research objectives have hindered progress in adversarial robustness research over the past decade. We argue that realigned objectives are necessary for meaningful progress in adversarial alignment.
arXiv Detail & Related papers (2025-02-17T15:28:40Z)
Iterative Deepening Sampling for Large Language Models [27.807695570974644]
Training models to achieve effective self-correction and self-correction remains a significant challenge. We propose a novel iterative sampling algorithm framework designed to enhance self-correction and generate higher-quality samples.
arXiv Detail & Related papers (2025-02-08T04:39:51Z)
Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models [10.449015816015566]
Self-improvement is a mechanism in Large Language Model (LLM) pre-training, post-training and test-time inference. We provide a mathematical formulation for self-improvement, which is largely governed by a quantity which we formalize as the generation-verification gap. We also examine when self-improvement is possible, an iterative self-improvement procedure, and ways to improve its performance.
arXiv Detail & Related papers (2024-12-03T18:47:26Z)
Self-Improvement in Language Models: The Sharpening Mechanism [70.9248553790022]
We offer a new perspective on the capabilities of self-improvement through a lens we refer to as sharpening. Motivated by the observation that language models are often better at verifying response quality than they are at generating correct responses, we formalize self-improvement as using the model itself as a verifier during post-training. We analyze two natural families of self-improvement algorithms based on SFT and RLHF.
arXiv Detail & Related papers (2024-12-02T20:24:17Z)
Contextual Bandits in Payment Processing: Non-uniform Exploration and Supervised Learning at Adyen [0.0]
Regression oracles combine non-uniform exploration with supervised learning. We analyze these approaches within a real-world industrial context at Adyen, a large global payments processor. While regression oracles significantly improve performance, they introduce challenges due to rigid algorithmic assumptions.
arXiv Detail & Related papers (2024-11-30T19:45:23Z)
Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision [120.40788744292739]
We propose a two-player paradigm that separates the roles of reasoning and critique models. We first propose AutoMathCritique, an automated and scalable framework for collecting critique data. We demonstrate that the critique models consistently improve the actor's performance on difficult queries at test-time.
arXiv Detail & Related papers (2024-11-25T17:11:54Z)
Recursive Introspection: Teaching Language Model Agents How to Self-Improve [30.086494067593268]
We develop RISE: Recursive IntroSpEction, an approach for fine-tuning large language models. Our experiments show that RISE enables Llama2, Llama3, and Mistral models to improve themselves with more turns on math reasoning tasks.
arXiv Detail & Related papers (2024-07-25T17:35:59Z)
MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making. We present a process-based benchmark MR-Ben that demands a meta-reasoning skill. Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z)
Unlearning with Control: Assessing Real-world Utility for Large Language Model Unlearning [97.2995389188179]
Recent research has begun to approach large language models (LLMs) unlearning via gradient ascent (GA) Despite their simplicity and efficiency, we suggest that GA-based methods face the propensity towards excessive unlearning. We propose several controlling methods that can regulate the extent of excessive unlearning.
arXiv Detail & Related papers (2024-06-13T14:41:00Z)
Enhancing Q-Learning with Large Language Model Heuristics [0.0]
Large language models (LLMs) can achieve zero-shot learning for simpler tasks, but they suffer from low inference speeds and occasional hallucinations. We propose textbfLLM-guided Q-learning, a framework that leverages LLMs as hallucinations to aid in learning the Q-function for reinforcement learning.
arXiv Detail & Related papers (2024-05-06T10:42:28Z)
Improving the Robustness of Large Language Models via Consistency Alignment [36.24876571343749]
Large language models (LLMs) have shown tremendous success in following user instructions and generating helpful responses. LLMs may generate significantly inconsistent responses due to minor changes in the verbalized instructions. We propose a two-stage training framework consisting of instruction-augmented supervised fine-tuning and consistency alignment training.
arXiv Detail & Related papers (2024-03-21T08:21:12Z)
Augmenting Unsupervised Reinforcement Learning with Self-Reference [63.68018737038331]
Humans possess the ability to draw on past experiences explicitly when learning new tasks. We propose the Self-Reference (SR) approach, an add-on module explicitly designed to leverage historical information. Our approach achieves state-of-the-art results in terms of Interquartile Mean (IQM) performance and Optimality Gap reduction on the Unsupervised Reinforcement Learning Benchmark.
arXiv Detail & Related papers (2023-11-16T09:07:34Z)
QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement. QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights. We demonstrate that leveraging its insights, for example, improves the absolute performance of the Llama 2 model by up to 15% points relative.
arXiv Detail & Related papers (2023-11-06T00:21:44Z)
N-Critics: Self-Refinement of Large Language Models with Ensemble of Critics [5.516095889257118]
We propose a self-correction mechanism for Large Language Models (LLMs) to mitigate issues such as toxicity and fact hallucination. This method involves refining model outputs through an ensemble of critics and the model's own feedback.
arXiv Detail & Related papers (2023-10-28T11:22:22Z)
Towards Reliable and Fluent Large Language Models: Incorporating Feedback Learning Loops in QA Systems [10.58737969057445]
We build a dataset to train a critic model capable of evaluating the citation, correctness, and fluency of responses generated by large language models. We propose an automated feedback mechanism that leverages the critic model to offer real-time feedback on heterogeneous aspects of generated text. Experimental results demonstrate the efficacy of our approach, including a 4% precision increase in citation and an approximately 8% enhancement in the MAUVE metric for fluency.
arXiv Detail & Related papers (2023-09-08T09:39:53Z)
Accurate and Robust Feature Importance Estimation under Distribution Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method. We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.