Related papers: V-STaR: Training Verifiers for Self-Taught Reasoners

URL: http://arxiv.org/abs/2402.06457v1
Date: Fri, 9 Feb 2024 15:02:56 GMT
Title: V-STaR: Training Verifiers for Self-Taught Reasoners
Authors: Arian Hosseini, Xingdi Yuan, Nikolay Malkin, Aaron Courville, Alessandro Sordoni and Rishabh Agarwal
Abstract summary: We propose V-STaR that utilizes both the correct and incorrect solutions generated during the self-improvement process to train a verifier. V-STaR delivers a 4% to 17% test accuracy improvement over existing self-improvement and verification approaches.
Score: 75.11811592995176
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Common self-improvement approaches for large language models (LLMs), such as STaR (Zelikman et al., 2022), iteratively fine-tune LLMs on self-generated solutions to improve their problem-solving ability. However, these approaches discard the large amounts of incorrect solutions generated during this process, potentially neglecting valuable information in such solutions. To address this shortcoming, we propose V-STaR that utilizes both the correct and incorrect solutions generated during the self-improvement process to train a verifier using DPO that judges correctness of model-generated solutions. This verifier is used at inference time to select one solution among many candidate solutions. Running V-STaR for multiple iterations results in progressively better reasoners and verifiers, delivering a 4% to 17% test accuracy improvement over existing self-improvement and verification approaches on common code generation and math reasoning benchmarks with LLaMA2 models.
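The two mechanisms the abstract describes, keeping both correct and incorrect sampled solutions as verifier training data and using the verifier to pick one solution among many candidates, can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the pairing of every correct solution with every incorrect one, and the toy score table standing in for a trained verifier are all assumptions made for this example.

```python
from itertools import product
from typing import Callable, List, Tuple


def build_preference_pairs(
    problem: str,
    solutions: List[str],
    is_correct: Callable[[str], bool],
) -> List[Tuple[str, str, str]]:
    """Return (problem, preferred, rejected) triples for DPO-style training.

    Assumption: every correct solution is paired with every incorrect one.
    """
    correct = [s for s in solutions if is_correct(s)]
    incorrect = [s for s in solutions if not is_correct(s)]
    return [(problem, good, bad) for good, bad in product(correct, incorrect)]


def select_best(candidates: List[str], score: Callable[[str], float]) -> str:
    """Pick the candidate the verifier scores highest (best-of-k selection)."""
    return max(candidates, key=score)


# Toy data: three sampled solutions to one problem, one of them correct.
problem = "What is 2 + 3?"
solutions = ["2 + 3 = 5", "2 + 3 = 6", "2 + 3 = 23"]

# Hypothetical correctness check (in practice: test cases or a gold answer).
pairs = build_preference_pairs(problem, solutions, lambda s: s.endswith("= 5"))

# Hypothetical verifier scores; a trained verifier would produce these.
toy_scores = {"2 + 3 = 5": 0.9, "2 + 3 = 6": 0.2, "2 + 3 = 23": 0.1}
best = select_best(solutions, lambda s: toy_scores[s])
```

Here `pairs` holds the preference data a verifier could be trained on, and `best` is the single solution returned at inference time.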

Related papers

Small Language Models Need Strong Verifiers to Self-Correct Reasoning [69.94251699982388]
Self-correction has emerged as a promising solution to boost the reasoning performance of large language models (LLMs). This work explores whether small (≤ 13B) language models (LMs) have the ability of self-correction on reasoning tasks with minimal inputs from stronger LMs.
arXiv Detail & Related papers (2024-04-26T03:41:28Z)
Enhancing Large Language Model Performance To Answer Questions and Extract Information More Accurately [2.1715455600756646]
Large Language Models (LLMs) generate responses to questions. Their effectiveness is often hindered by sub-optimal quality of answers and occasional failures to provide accurate responses to questions. To address these challenges, a fine-tuning process is employed, involving feedback and examples to refine models.
arXiv Detail & Related papers (2024-01-27T00:18:07Z)
Improving Large Language Model Fine-tuning for Solving Math Problems [20.417053742869403]
A large gap exists between large language models' pass-at-one and pass-at-N performance in solving math problems. Using the challenging MATH dataset, we investigate three fine-tuning strategies. We design a fine-tuning recipe that yields approximately 58.8% accuracy on the MATH dataset with fine-tuned PaLM 2-L models.
arXiv Detail & Related papers (2023-10-16T04:11:19Z)
Enhancing SAEAs with Unevaluated Solutions: A Case Study of Relation Model for Expensive Optimization [6.382398222493027]
This paper presents a framework using unevaluated solutions to enhance the efficiency of SAEAs. The surrogate model is employed to identify high-quality solutions for direct generation of new solutions without evaluation.
arXiv Detail & Related papers (2023-09-21T12:09:55Z)
SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning [55.76083560152823]
SelfCheck is a general-purpose zero-shot verification schema for recognizing errors in step-by-step reasoning. We test SelfCheck on three datasets (GSM8K, MathQA, and MATH) and find that it successfully recognizes errors and, in turn, increases final answer accuracies.
arXiv Detail & Related papers (2023-08-01T10:31:36Z)
Towards Explainable Metaheuristic: Mining Surrogate Fitness Models for Importance of Variables [69.02115180674885]
We use four benchmark problems to train a surrogate model and investigate the learning of the search space by the surrogate model. We show that the surrogate model picks out key characteristics of the problem as it is trained on population data from each generation.
arXiv Detail & Related papers (2022-05-31T09:16:18Z)
A Mutual Information Maximization Approach for the Spurious Solution Problem in Weakly Supervised Question Answering [60.768146126094955]
Weakly supervised question answering usually has only the final answers as supervision signals. There may exist many spurious solutions that coincidentally derive the correct answer, but training on such solutions can hurt model performance. We propose to explicitly exploit such semantic correlations by maximizing the mutual information between question-answer pairs and predicted solutions.
arXiv Detail & Related papers (2021-06-14T05:47:41Z)
Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems. Modeling the automatic primary response (APR) of generators within the SCOPF problem results in complex large-scale mixed-integer programs. This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.