V-STaR: Training Verifiers for Self-Taught Reasoners
- URL: http://arxiv.org/abs/2402.06457v1
- Date: Fri, 9 Feb 2024 15:02:56 GMT
- Title: V-STaR: Training Verifiers for Self-Taught Reasoners
- Authors: Arian Hosseini, Xingdi Yuan, Nikolay Malkin, Aaron Courville,
Alessandro Sordoni and Rishabh Agarwal
- Abstract summary: We propose V-STaR that utilizes both the correct and incorrect solutions generated during the self-improvement process to train a verifier.
V-STaR delivers a 4% to 17% test accuracy improvement over existing self-improvement and verification approaches.
- Score: 75.11811592995176
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Common self-improvement approaches for large language models (LLMs), such as
STaR (Zelikman et al., 2022), iteratively fine-tune LLMs on self-generated
solutions to improve their problem-solving ability. However, these approaches
discard the large amounts of incorrect solutions generated during this process,
potentially neglecting valuable information in such solutions. To address this
shortcoming, we propose V-STaR that utilizes both the correct and incorrect
solutions generated during the self-improvement process to train a verifier
using DPO that judges correctness of model-generated solutions. This verifier
is used at inference time to select one solution among many candidate
solutions. Running V-STaR for multiple iterations results in progressively
better reasoners and verifiers, delivering a 4% to 17% test accuracy
improvement over existing self-improvement and verification approaches on
common code generation and math reasoning benchmarks with LLaMA2 models.
Related papers
- Small Language Models Need Strong Verifiers to Self-Correct Reasoning [69.94251699982388]
Self-correction has emerged as a promising solution to boost the reasoning performance of large language models (LLMs)
This work explores whether small (= 13B) language models (LMs) have the ability of self-correction on reasoning tasks with minimal inputs from stronger LMs.
arXiv Detail & Related papers (2024-04-26T03:41:28Z) - Enhancing Large Language Model Performance To Answer Questions and
Extract Information More Accurately [2.1715455600756646]
Large Language Models (LLMs) generate responses to questions.
Their effectiveness is often hindered by sub-optimal quality of answers and occasional failures to provide accurate responses to questions.
To address these challenges, a fine-tuning process is employed, involving feedback and examples to refine models.
arXiv Detail & Related papers (2024-01-27T00:18:07Z) - Improving Large Language Model Fine-tuning for Solving Math Problems [20.417053742869403]
A large gap exists between large language models' pass-at-one and pass-at-N performance in solving math problems.
Using the challenging MATH dataset, we investigate three fine-tuning strategies.
We design a fine-tuning recipe that yields approximately 58.8% accuracy on the MATH dataset with fine-tuned PaLM 2-L models.
arXiv Detail & Related papers (2023-10-16T04:11:19Z) - Enhancing SAEAs with Unevaluated Solutions: A Case Study of Relation
Model for Expensive Optimization [6.382398222493027]
This paper presents a framework using unevaluated solutions to enhance the efficiency of SAEAs.
The surrogate model is employed to identify high-quality solutions for direct generation of new solutions without evaluation.
arXiv Detail & Related papers (2023-09-21T12:09:55Z) - SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step
Reasoning [55.76083560152823]
SelfCheck is a general-purpose zero-shot verification schema for recognizing errors in step-by-step reasoning.
We test SelfCheck on three datasets (GSM8K, MathQA, and MATH) and find that it successfully recognizes errors and, in turn, increases final answer accuracies.
arXiv Detail & Related papers (2023-08-01T10:31:36Z) - Towards Explainable Metaheuristic: Mining Surrogate Fitness Models for
Importance of Variables [69.02115180674885]
We use four benchmark problems to train a surrogate model and investigate the learning of the search space by the surrogate model.
We show that the surrogate model picks out key characteristics of the problem as it is trained on population data from each generation.
arXiv Detail & Related papers (2022-05-31T09:16:18Z) - A Mutual Information Maximization Approach for the Spurious Solution
Problem in Weakly Supervised Question Answering [60.768146126094955]
Weakly supervised question answering usually has only the final answers as supervision signals.
There may exist many spurious solutions that coincidentally derive the correct answer, but training on such solutions can hurt model performance.
We propose to explicitly exploit such semantic correlations by maximizing the mutual information between question-answer pairs and predicted solutions.
arXiv Detail & Related papers (2021-06-14T05:47:41Z) - Combining Deep Learning and Optimization for Security-Constrained
Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.