Code-Optimise: Self-Generated Preference Data for Correctness and Efficiency
- URL: http://arxiv.org/abs/2406.12502v1
- Date: Tue, 18 Jun 2024 11:05:37 GMT
- Title: Code-Optimise: Self-Generated Preference Data for Correctness and Efficiency
- Authors: Leonidas Gee, Milan Gritta, Gerasimos Lampouras, Ignacio Iacobacci
- Abstract summary: We introduce Code-Optimise, a framework that incorporates both correctness (passed, failed) and runtime as learning signals.
Our framework is both lightweight and robust as it dynamically selects solutions to reduce overfitting.
As a byproduct, the average length of the generated solutions is reduced by up to 48% on MBPP and 23% on HumanEval.
- Score: 15.593172556501704
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Code Language Models have been trained to generate accurate solutions, typically with no regard for runtime. On the other hand, previous works that explored execution optimisation have observed corresponding drops in functional correctness. To that end, we introduce Code-Optimise, a framework that incorporates both correctness (passed, failed) and runtime (quick, slow) as learning signals via self-generated preference data. Our framework is both lightweight and robust as it dynamically selects solutions to reduce overfitting while avoiding a reliance on larger models for learning signals. Code-Optimise achieves significant improvements in pass@k while decreasing the competitive baseline runtimes by an additional 6% for in-domain data and up to 3% for out-of-domain data. As a byproduct, the average length of the generated solutions is reduced by up to 48% on MBPP and 23% on HumanEval, resulting in faster and cheaper inference. The generated data and codebase will be open-sourced at www.open-source.link.
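The abstract describes labelling the model's own sampled solutions with correctness (passed, failed) and runtime (quick, slow) and turning them into preference data. Below is a minimal sketch of how such self-generated pairs might be assembled; the `run_tests` callback, the timing loop, and the pairing heuristic are illustrative assumptions rather than the paper's exact procedure.
```python
import time

def label_solutions(solutions, run_tests, n_timing_runs=5):
    """Annotate each sampled solution with a pass/fail flag and, for
    passing solutions, a mean runtime over several repeated runs.

    run_tests(code) -> bool is an assumed callback that executes the
    problem's unit tests in a sandbox.
    """
    labelled = []
    for code in solutions:
        passed = run_tests(code)
        runtime = None
        if passed:
            start = time.perf_counter()
            for _ in range(n_timing_runs):
                run_tests(code)
            runtime = (time.perf_counter() - start) / n_timing_runs
        labelled.append({"code": code, "passed": passed, "runtime": runtime})
    return labelled

def build_preference_pairs(prompt, labelled):
    """Turn labelled solutions into (prompt, chosen, rejected) pairs:
    correct is preferred over failed, and among correct solutions the
    quickest is preferred over the slowest."""
    passed = sorted((s for s in labelled if s["passed"]),
                    key=lambda s: s["runtime"])
    failed = [s for s in labelled if not s["passed"]]
    pairs = []
    if passed and failed:   # correctness signal: passed > failed
        pairs.append((prompt, passed[0]["code"], failed[0]["code"]))
    if len(passed) >= 2:    # runtime signal: quick > slow
        pairs.append((prompt, passed[0]["code"], passed[-1]["code"]))
    return pairs
```
The paper additionally selects solutions dynamically to reduce overfitting; that selection step is omitted from this sketch.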
Related papers
- CodeDPO: Aligning Code Models with Self Generated and Verified Source Code [52.70310361822519]
We propose CodeDPO, a framework that integrates preference learning into code generation to improve two key code preference factors: code correctness and efficiency.
CodeDPO employs a novel dataset construction method, utilizing a self-generation-and-validation mechanism that simultaneously generates and evaluates code and test cases (a mutual-scoring sketch follows this entry).
arXiv Detail & Related papers (2024-10-08T01:36:15Z)
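As a rough illustration of scoring self-generated code and tests against each other by cross-execution, the sketch below applies a PageRank-style mutual update; the update rule, damping constant, and boolean pass matrix are assumptions for illustration, not necessarily CodeDPO's exact algorithm.
```python
def mutual_verification_scores(pass_matrix, iterations=10, damping=0.85):
    """Score candidate codes and tests by cross-execution results.

    pass_matrix[i][j] is True if code i passes test j. A code gains
    credibility by passing credible tests, and a test gains credibility
    by being passed by credible codes.
    """
    n_codes = len(pass_matrix)
    n_tests = len(pass_matrix[0]) if n_codes else 0
    code_scores = [1.0] * n_codes
    test_scores = [1.0] * n_tests
    for _ in range(iterations):
        raw_code = [sum(test_scores[j] for j in range(n_tests) if pass_matrix[i][j])
                    for i in range(n_codes)]
        raw_test = [sum(code_scores[i] for i in range(n_codes) if pass_matrix[i][j])
                    for j in range(n_tests)]
        # Normalise (mean ~1) and damp so scores stay bounded.
        code_total = sum(raw_code) or 1.0
        test_total = sum(raw_test) or 1.0
        code_scores = [(1 - damping) + damping * s * n_codes / code_total
                       for s in raw_code]
        test_scores = [(1 - damping) + damping * s * n_tests / test_total
                       for s in raw_test]
    return code_scores, test_scores
```
High-scoring and low-scoring codes for the same prompt could then serve as chosen and rejected responses for preference learning.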
- Brevity is the soul of wit: Pruning long files for code generation [19.61423412870527]
We find that a simple heuristic--pruning long files--outperforms other methods in compute-limited regimes (a minimal filter is sketched after this entry).
Our method can yield up to a 2x efficiency benefit in training (while matching performance) or a 3.5% absolute performance improvement on HumanEval.
arXiv Detail & Related papers (2024-06-29T13:08:24Z)
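A minimal version of the pruning heuristic is just a length filter over the pretraining corpus; the `content` field and the default threshold below are hypothetical, illustrating the idea rather than the paper's tuned setting.
```python
def prune_long_files(dataset, max_chars=100_000):
    """Drop source files longer than a character threshold before
    training; `dataset` is assumed to be a list of dicts with a
    "content" field holding the file text."""
    kept = [example for example in dataset
            if len(example["content"]) <= max_chars]
    print(f"kept {len(kept)}/{len(dataset)} files")
    return kept
```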
- WPO: Enhancing RLHF with Weighted Preference Optimization [40.07940023654452]
Reinforcement learning from human feedback (RLHF) is a promising solution to align large language models (LLMs) more closely with human values.
Off-policy preference optimization often suffers from a distributional gap between the policy used for data collection and the target policy, leading to suboptimal optimization.
We propose a novel strategy to mitigate this problem by simulating on-policy learning with off-policy preference data (a sketch follows this entry).
arXiv Detail & Related papers (2024-06-17T17:59:13Z)
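The reweighting idea can be sketched as a DPO-style objective whose per-pair loss is weighted by how probable the pair's responses are under the current policy, down-weighting pairs far from the policy's own distribution. The weight definition and normalisation below are assumptions for illustration, not WPO's exact formulation.
```python
import torch
import torch.nn.functional as F

def weighted_preference_loss(policy_chosen_logp, policy_rejected_logp,
                             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO-style loss over a batch of (chosen, rejected) pairs, where
    each argument is a 1-D tensor of summed token log-probabilities."""
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    per_pair = -F.logsigmoid(logits)
    # Simulate on-policy learning: weight each off-policy pair by the
    # (detached) policy probability of its responses.
    weights = (policy_chosen_logp + policy_rejected_logp).detach().exp()
    weights = weights / (weights.sum() + 1e-8)
    return (weights * per_pair).sum()
```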
- Uncertainty-Aware Testing-Time Optimization for 3D Human Pose Estimation [68.75387874066647]
We propose an Uncertainty-Aware testing-time optimization framework for 3D human pose estimation.
Our approach outperforms the previous best result by a large margin of 4.5% on Human3.6M.
arXiv Detail & Related papers (2024-02-04T04:28:02Z)
- VeLO: Training Versatile Learned Optimizers by Scaling Up [67.90237498659397]
We leverage the same scaling approach behind the success of deep learning to learn versatile optimizers.
We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates (a toy version is sketched after this entry).
We open source our learned optimizer, meta-training code, the associated train and test data, and an extensive benchmark suite with baselines at velo-code.io.
arXiv Detail & Related papers (2022-11-17T18:39:07Z)
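A toy version of a learned optimizer is shown below: a small MLP ingests per-parameter features (here just gradient and momentum) and outputs an update. The feature set, architecture, and step size are illustrative and far simpler than VeLO's; in practice the MLP's own weights would be meta-trained across many tasks, whereas here they are randomly initialised.
```python
import torch
import torch.nn as nn

class TinyLearnedOptimizer(nn.Module):
    """A small network mapping per-parameter features to updates."""

    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    @torch.no_grad()
    def step(self, params, grads, momenta, beta=0.9, lr=1e-3):
        """Apply one update to lists of parameter/gradient tensors."""
        new_params, new_momenta = [], []
        for p, g, m in zip(params, grads, momenta):
            m = beta * m + (1 - beta) * g                    # running momentum
            feats = torch.stack([g.flatten(), m.flatten()], dim=-1)
            update = self.net(feats).squeeze(-1).view_as(p)  # per-parameter step
            new_params.append(p - lr * update)
            new_momenta.append(m)
        return new_params, new_momenta
```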
- Fast Optimization of Weighted Sparse Decision Trees for use in Optimal Treatment Regimes and Optimal Policy Design [16.512942230284576]
We present three algorithms for efficient sparse weighted decision tree optimization.
The first approach directly optimizes the weighted loss function; however, it tends to be computationally inefficient for large datasets.
The second approach, which scales more efficiently, transforms the weights to integer values and uses data duplication to turn the weighted decision tree optimization problem into an unweighted (but larger) counterpart.
The third algorithm, which scales to much larger datasets, uses a randomized procedure that samples each data point with a probability proportional to its weight (the latter two reductions are sketched after this entry).
arXiv Detail & Related papers (2022-10-13T08:16:03Z)
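The second and third approaches both reduce the weighted problem to an unweighted one, as sketched below; the rounding scale and sample count are illustrative parameters, not the paper's settings.
```python
import random

def duplicate_by_weight(dataset, weights, scale=10):
    """Second approach: round each weight to an integer multiplicity and
    duplicate the data point that many times, so an unweighted (but
    larger) decision tree problem replaces the weighted one."""
    expanded = []
    for point, w in zip(dataset, weights):
        expanded.extend([point] * max(1, round(w * scale)))
    return expanded

def sample_by_weight(dataset, weights, n_samples):
    """Third approach: draw each data point with probability proportional
    to its weight, yielding an unweighted dataset whose size is fixed,
    which scales to much larger inputs."""
    return random.choices(dataset, weights=weights, k=n_samples)
```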
- Highly Parallel Autoregressive Entity Linking with Discriminative Correction [51.947280241185]
We propose a very efficient approach that parallelizes autoregressive linking across all potential mentions.
Our model is >70 times faster and more accurate than the previous generative method.
arXiv Detail & Related papers (2021-09-08T17:28:26Z)
- Learning to Optimize: A Primer and A Benchmark [94.29436694770953]
Learning to optimize (L2O) is an emerging approach that leverages machine learning to develop optimization methods.
This article is poised to be the first comprehensive survey and benchmark of L2O for continuous optimization.
arXiv Detail & Related papers (2021-03-23T20:46:20Z)
- The Right Tool for the Job: Matching Model and Instance Complexities [62.95183777679024]
As NLP models become larger, executing a trained model requires significant computational resources, incurring monetary and environmental costs.
We propose a modification to contextual representation fine-tuning which, during inference, allows for an early (and fast) "exit" from neural network calculations for simple instances and a late (and accurate) exit for harder ones (sketched after this entry).
We test our proposed modification on five different datasets in two tasks: three text classification datasets and two natural language inference benchmarks.
arXiv Detail & Related papers (2020-04-16T04:28:08Z)
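A common way to realise such an early exit, sketched below under assumptions that may differ from the paper's design, is to attach a classifier head after every encoder layer and stop at the first head whose softmax confidence clears a threshold, so easy instances use fewer layers.
```python
import torch
import torch.nn as nn

class EarlyExitEncoder(nn.Module):
    """Encoder with one classifier head per layer; inference exits at
    the first sufficiently confident head (assumes batch size 1)."""

    def __init__(self, layers, hidden_dim, num_classes, threshold=0.9):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, num_classes) for _ in layers])
        self.threshold = threshold

    @torch.no_grad()
    def forward(self, x):  # x: (1, seq_len, hidden_dim)
        for layer, head in zip(self.layers, self.heads):
            x = layer(x)
            probs = head(x.mean(dim=1)).softmax(dim=-1)  # pool over tokens
            confidence, prediction = probs.max(dim=-1)
            if confidence.item() >= self.threshold:      # easy: exit early
                return prediction
        return prediction  # hard instance: fell through all layers
```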