Related papers: Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

URL: http://arxiv.org/abs/2312.06585v4
Date: Thu, 18 Apr 2024 03:12:09 GMT
Title: Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
Authors: Avi Singh, John D. Co-Reyes, Rishabh Agarwal, Ankesh Anand, Piyush Patil, Xavier Garcia, Peter J. Liu, James Harrison, Jaehoon Lee, Kelvin Xu, Aaron Parisi, Abhishek Kumar, Alex Alemi, Alex Rizkowsky, Azade Nova, Ben Adlam, Bernd Bohnet, Gamaleldin Elsayed, Hanie Sedghi, Igor Mordatch, Isabelle Simpson, Izzeddin Gur, Jasper Snoek, Jeffrey Pennington, Jiri Hron, Kathleen Kenealy, Kevin Swersky, Kshiteej Mahajan, Laura Culp, Lechao Xiao, Maxwell L. Bileschi, Noah Constant, Roman Novak, Rosanne Liu, Tris Warkentin, Yundi Qian, Yamini Bansal, Ethan Dyer, Behnam Neyshabur, Jascha Sohl-Dickstein, Noah Fiedel,
Abstract summary: Fine-tuning language models(LMs) on human-generated data remains a prevalent practice. We investigate whether we can go beyond human data on tasks where we have access to scalar feedback. We find that ReST$EM$ scales favorably with model size and significantly surpasses fine-tuning only on human data.
Score: 115.501751261878
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Fine-tuning language models~(LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity and diversity of high-quality human data. In this paper, we explore whether we can go beyond human data on tasks where we have access to scalar feedback, for example, on math problems where one can verify correctness. To do so, we investigate a simple self-training method based on expectation-maximization, which we call ReST$^{EM}$, where we (1) generate samples from the model and filter them using binary feedback, (2) fine-tune the model on these samples, and (3) repeat this process a few times. Testing on advanced MATH reasoning and APPS coding benchmarks using PaLM-2 models, we find that ReST$^{EM}$ scales favorably with model size and significantly surpasses fine-tuning only on human data. Overall, our findings suggest self-training with feedback can substantially reduce dependence on human-generated data.

Related papers

Evolution without Large Models: Training Language Model with Task Principles [52.44569608690695]
A common training approach for language models involves using a large-scale language model to expand a human-provided dataset.<n>This method significantly reduces training costs by eliminating the need for extensive human data annotation.<n>However, it still faces challenges such as high carbon emissions during data augmentation and the risk of data leakage.
arXiv Detail & Related papers (2025-07-08T13:52:45Z)
Human Guided Learning of Transparent Regression Models [4.592493651895646]
We present a human-in-the-loop (HIL) approach to permutation regression. The model is a gradient boosted regression model that incorporates simple human-understandable constraints. The approach, HuGuR, lets a human explore the search space of such transparent regression models.
arXiv Detail & Related papers (2025-02-21T23:15:12Z)
Machine-generated text detection prevents language model collapse [17.34282527020344]
We investigate the impact of decoding strategy on model collapse.<n>We train a machine-generated text detector and propose an importance sampling approach to alleviate model collapse.
arXiv Detail & Related papers (2025-02-21T18:22:36Z)
OptMATH: A Scalable Bidirectional Data Synthesis Framework for Optimization Modeling [9.617742955894247]
Lack of high-quality optimization modeling datasets hampers large language models. We propose a scalable framework for synthesizing a high-quality dataset, named OptMATH. We demonstrate that models of various sizes trained on OptMATH achieve superior results on multiple modeling benchmarks.
arXiv Detail & Related papers (2025-02-16T12:38:37Z)
Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data. We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z)
Data Diet: Can Trimming PET/CT Datasets Enhance Lesion Segmentation? [70.38903555729081]
We describe our approach to compete in the autoPET3 datacentric track. We find that in the autoPETIII dataset, a model that is trained on the entire dataset exhibits undesirable characteristics. We counteract this by removing the easiest samples from the training dataset as measured by the model loss before retraining from scratch.
arXiv Detail & Related papers (2024-09-20T14:47:58Z)
Self-Consuming Generative Models with Curated Data Provably Optimize Human Preferences [20.629333587044012]
We study the impact of data curation on iterated retraining of generative models. We prove that, if the data is curated according to a reward model, the expected reward of the iterative retraining procedure is maximized.
arXiv Detail & Related papers (2024-06-12T21:28:28Z)
Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks. Such models tend to be large and require commensurate volumes of training data. It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs. Models trained on such data tend to only perform well on similar data, while underperforming on real world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z)
Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models. This creates a barrier to fusing knowledge across individual models to yield a better single model. We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
Complementary Ensemble Learning [1.90365714903665]
We derive a technique to improve performance of state-of-the-art deep learning models. Specifically, we train auxiliary models which are able to complement state-of-the-art model uncertainty.
arXiv Detail & Related papers (2021-11-09T03:23:05Z)
Probabilistic Modeling for Human Mesh Recovery [73.11532990173441]
This paper focuses on the problem of 3D human reconstruction from 2D evidence. We recast the problem as learning a mapping from the input to a distribution of plausible 3D poses.
arXiv Detail & Related papers (2021-08-26T17:55:11Z)
Improving Question Answering Model Robustness with Synthetic Adversarial Data Generation [41.9785159975426]
State-of-the-art question answering models remain susceptible to a variety of adversarial attacks and are still far from obtaining human-level language understanding. One proposed way forward is dynamic adversarial data collection, in which a human annotator attempts to create examples for which a model-in-the-loop fails. In this work, we investigate several answer selection, question generation, and filtering methods that form a synthetic adversarial data generation pipeline. Models trained on both synthetic and human-generated data outperform models not trained on synthetic adversarial data, and obtain state-of-the-art results on the Adversarial
arXiv Detail & Related papers (2021-04-18T02:00:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.