Code-Optimise: Self-Generated Preference Data for Correctness and Efficiency
- URL: http://arxiv.org/abs/2406.12502v2
- Date: Wed, 05 Feb 2025 12:29:01 GMT
- Title: Code-Optimise: Self-Generated Preference Data for Correctness and Efficiency
- Authors: Leonidas Gee, Milan Gritta, Gerasimos Lampouras, Ignacio Iacobacci
- Abstract summary: We introduce Code-Optimise, a framework that incorporates both correctness (passed, failed) and runtime as learning signals. Our framework is both lightweight and robust as it dynamically selects solutions to reduce overfitting. As a by-product, the average length of the generated solutions is reduced by up to 48% on MBPP and 23% on HumanEval.
- Score: 15.593172556501704
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Code Language Models have been trained to generate accurate solutions, typically with no regard for runtime. On the other hand, previous works that explored execution optimisation have observed corresponding drops in functional correctness. To that end, we introduce Code-Optimise, a framework that incorporates both correctness (passed, failed) and runtime (quick, slow) as learning signals via self-generated preference data. Our framework is both lightweight and robust as it dynamically selects solutions to reduce overfitting while avoiding a reliance on larger models for learning signals. Code-Optimise achieves significant improvements in pass@k while decreasing the competitive baseline runtimes by an additional 6% for in-domain data and up to 3% for out-of-domain data. As a by-product, the average length of the generated solutions is reduced by up to 48% on MBPP and 23% on HumanEval, resulting in faster and cheaper inference. The generated data and codebase are open-sourced at https://github.com/huawei-noah/HEBO/tree/Code_Optimise.
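The labelling step described in the abstract lends itself to a short sketch. A minimal, illustrative version of building (preferred, rejected) pairs from self-generated solutions is shown below; the `Candidate` container, the median quick/slow split, and the pairing rules are assumptions, not the paper's exact procedure.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    code: str
    passed: bool     # did the solution pass the unit tests?
    runtime: float   # measured execution time in seconds

def preference_pairs(candidates):
    """Sketch: passed beats failed; among passed, quick beats slow."""
    passed = sorted((c for c in candidates if c.passed), key=lambda c: c.runtime)
    failed = [c for c in candidates if not c.passed]
    # Correctness signal: any passing solution is preferred over any failing one.
    pairs = [(w.code, l.code) for w in passed for l in failed]
    # Runtime signal: the faster half of passing solutions beats the slower half.
    if len(passed) >= 2:
        mid = len(passed) // 2
        quick, slow = passed[:mid], passed[mid:]
        pairs += [(w.code, l.code) for w in quick for l in slow]
    return pairs
```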
Related papers
- Learning to Solve and Verify: A Self-Play Framework for Code and Test Generation [69.62857948698436]
Recent advances in large language models (LLMs) have improved their performance on coding benchmarks.
However, improvement is plateauing due to the exhaustion of readily available high-quality data.
We propose Sol-Ver, a self-play solver-verifier framework that jointly improves a single model's code and test generation capacity.
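A hypothetical round of such self-play might look like the sketch below; every callable (`solve`, `verify`, `execute`, `finetune`) is an assumed stand-in for the paper's actual components.

```python
def self_play_round(model, problems, solve, verify, execute, finetune):
    """One illustrative solver-verifier round: the same model writes a
    solution and a test suite per problem; pairs that agree under
    execution become training data for the next round."""
    accepted = []
    for p in problems:
        code = solve(model, p)    # model acting as solver
        tests = verify(model, p)  # the same model acting as verifier
        if execute(code, tests):  # True iff the solution passes the tests
            accepted.append((p, code, tests))
    return finetune(model, accepted)
```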
arXiv Detail & Related papers (2025-02-20T18:32:19Z) - Thinking Before Running! Efficient Code Generation with Thorough Exploration and Optimal Refinement [35.991531332335654]
We introduce ThinkCoder, a framework that combines thorough exploration with optimal refinement.
The exploration phase diversifies the solution space by searching for potential solutions, followed by a refinement phase that enhances precision.
This approach allows us to select the best solution through careful consideration before taking action, avoiding excessive trial and error.
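One plausible shape for this explore-then-refine loop is sketched here; the sampler, refiner, and scorer are all assumed interfaces rather than ThinkCoder's actual API.

```python
def think_then_code(model, problem, propose, refine, score, k=8, steps=2):
    """Illustrative loop: sample k diverse drafts, keep the best-scoring
    one, then apply a few refinement passes that must improve the score."""
    drafts = [propose(model, problem) for _ in range(k)]  # exploration
    best = max(drafts, key=lambda d: score(problem, d))   # selection
    for _ in range(steps):                                # refinement
        candidate = refine(model, problem, best)
        if score(problem, candidate) > score(problem, best):
            best = candidate
    return best
```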
arXiv Detail & Related papers (2024-12-30T07:02:15Z) - CodeLutra: Boosting LLM Code Generation via Preference-Guided Refinement [32.46078765471136]
Large Language Models (LLMs) have revolutionized code generation but require significant resources and often over-generalize.
We introduce CodeLutra, a framework that leverages both correct and incorrect code attempts.
By learning from both successes and mistakes, CodeLutra provides a scalable and efficient path to high-quality code generation.
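Learning from both successes and mistakes naturally fits a preference objective. The sketch below is the standard DPO loss as one could apply it to (correct, incorrect) attempt pairs; CodeLutra's exact objective may differ.

```python
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO objective on (correct, incorrect) attempt pairs:
    raise the policy's margin for correct code relative to a frozen
    reference model. Inputs are sequence log-probability tensors."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -F.logsigmoid(margin).mean()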
arXiv Detail & Related papers (2024-11-07T21:51:07Z) - SwiftCoder: Enhancing Code Generation in Large Language Models through Efficiency-Aware Fine-tuning [17.355845751737423]
Current methods primarily focus on correctness, often overlooking efficiency.
Our dataset offers a scalable and effective solution for advancing AI-driven code generation.
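One simple way to make fine-tuning data efficiency-aware is to keep only the faster solutions per task, as in the sketch below; the data layout, the `run` callable, and the keep ratio are assumptions, not the paper's pipeline.

```python
import timeit

def efficiency_filter(solutions, run, keep_ratio=0.5):
    """Keep the faster half of the candidate solutions for each task.
    `solutions` maps task_id -> list of programs; `run` executes one."""
    kept = {}
    for task, codes in solutions.items():
        timed = sorted(
            (min(timeit.repeat(lambda: run(task, c), number=1, repeat=3)), c)
            for c in codes
        )
        n = max(1, int(len(timed) * keep_ratio))
        kept[task] = [c for _, c in timed[:n]]
    return kept
```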
arXiv Detail & Related papers (2024-10-14T07:05:51Z) - CodeDPO: Aligning Code Models with Self Generated and Verified Source Code [52.70310361822519]
We propose CodeDPO, a framework that integrates preference learning into code generation to improve two key code preference factors: code correctness and efficiency.
CodeDPO employs a novel dataset construction method, utilizing a self-generation-and-validation mechanism that simultaneously generates and evaluates code and test cases.
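One plausible realization of mutual code/test validation is an iterative scoring scheme in which code earns credit from the tests it passes and tests earn credit from the code that passes them; the damping constant, normalization, and data layout below are all assumptions.

```python
def mutual_rank(passes, n_code, n_test, iters=10, d=0.85):
    """Sketch of mutual ranking. `passes[i][j]` is True iff code i
    passes test j. Returns a score per code sample and per test."""
    code_s = [1.0] * n_code
    test_s = [1.0] * n_test
    for _ in range(iters):
        new_code = [(1 - d) + d * sum(test_s[j] for j in range(n_test) if passes[i][j])
                    for i in range(n_code)]
        new_test = [(1 - d) + d * sum(code_s[i] for i in range(n_code) if passes[i][j])
                    for j in range(n_test)]
        # Normalize so scores stay comparable across iterations.
        cz, tz = sum(new_code), sum(new_test)
        code_s = [s / cz for s in new_code]
        test_s = [s / tz for s in new_test]
    return code_s, test_s
```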
arXiv Detail & Related papers (2024-10-08T01:36:15Z) - Brevity is the soul of wit: Pruning long files for code generation [19.61423412870527]
We find that a simple heuristic, pruning long files, outperforms other methods in compute-limited regimes.
Our method can yield up to a 2x efficiency benefit in training (while matching performance) or a 3.5% absolute performance improvement on HumanEval.
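The heuristic itself is a one-liner over the pretraining corpus; the length threshold below is an assumed placeholder, not the paper's tuned value.

```python
def prune_long_files(files, max_lines=1000):
    """Drop unusually long source files (given as strings) from the
    corpus -- the simple pruning heuristic studied in the paper."""
    return [f for f in files if len(f.splitlines()) <= max_lines]
```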
arXiv Detail & Related papers (2024-06-29T13:08:24Z) - WPO: Enhancing RLHF with Weighted Preference Optimization [40.07940023654452]
Reinforcement learning from human feedback (RLHF) is a promising solution to align large language models (LLMs) more closely with human values.
Off-policy preference optimization often suffers from a distributional gap between the policy used for data collection and the target policy, leading to suboptimal optimization.
We propose a novel strategy to mitigate this problem by simulating on-policy learning with off-policy preference data.
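A sketch of this idea is to reweight the preference loss by how likely the current policy is to produce the pair itself, so off-policy pairs the policy would rarely generate contribute less. The exact WPO weighting scheme may differ from this illustration.

```python
import torch
import torch.nn.functional as F

def weighted_dpo_loss(logp_c, logp_r, ref_c, ref_r, beta=0.1):
    """DPO margin reweighted by the policy's own probability of the pair,
    simulating on-policy learning with off-policy data. Inputs should be
    length-normalized sequence log-probs so weights stay in range."""
    margin = beta * ((logp_c - ref_c) - (logp_r - ref_r))
    weight = (logp_c + logp_r).exp().detach()  # scales the loss, no gradient
    return -(weight * F.logsigmoid(margin)).mean()
```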
arXiv Detail & Related papers (2024-06-17T17:59:13Z) - Uncertainty-Aware Testing-Time Optimization for 3D Human Pose Estimation [68.75387874066647]
We propose an Uncertainty-Aware testing-time optimization framework for 3D human pose estimation.
Our approach outperforms the previous best result by a large margin of 4.5% on Human3.6M.
arXiv Detail & Related papers (2024-02-04T04:28:02Z) - VeLO: Training Versatile Learned Optimizers by Scaling Up [67.90237498659397]
We leverage the same scaling approach behind the success of deep learning to learn versatile optimizers.
We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates.
We open source our learned optimizers, meta-training code, the associated train and test data, and an extensive benchmark suite with baselines at velo-code.io.
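A toy version of such a learned optimizer is sketched below: a small MLP ingests per-parameter features (gradient and momentum) and emits updates. VeLO's actual architecture, features, and meta-training are far richer; this is only the shape of the idea.

```python
import torch
import torch.nn as nn

class TinyLearnedOptimizer(nn.Module):
    """Toy learned optimizer: an MLP maps (grad, momentum) -> update."""
    def __init__(self, hidden=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.momentum = {}

    @torch.no_grad()
    def step(self, params, beta=0.9, scale=1e-3):
        for i, p in enumerate(params):
            if p.grad is None:
                continue
            m = self.momentum.get(i, torch.zeros_like(p))
            m = beta * m + (1 - beta) * p.grad  # running gradient average
            self.momentum[i] = m
            feats = torch.stack([p.grad.flatten(), m.flatten()], dim=-1)
            update = self.net(feats).squeeze(-1).view_as(p)
            p -= scale * update  # apply the network's predicted step
```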
arXiv Detail & Related papers (2022-11-17T18:39:07Z) - Fast Optimization of Weighted Sparse Decision Trees for use in Optimal Treatment Regimes and Optimal Policy Design [16.512942230284576]
We present three algorithms for efficient sparse weighted decision tree optimization.
The first approach directly optimizes the weighted loss function; however, it tends to be computationally inefficient for large datasets.
The second approach, which scales more efficiently, transforms the weights into integer values and uses data duplication to turn the weighted decision tree optimization problem into an unweighted (but larger) counterpart.
The third algorithm, which scales to much larger datasets, uses a randomized procedure that samples each data point with a probability proportional to its weight.
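The second and third strategies reduce to short transformations of the dataset, sketched below; the integer resolution and sample size are assumed parameters.

```python
import random

def duplicate_by_weight(data, weights, resolution=10):
    """Second approach: round weights to integers and duplicate points,
    so a standard unweighted tree learner can be reused."""
    out = []
    for x, w in zip(data, weights):
        out += [x] * max(1, round(w * resolution))
    return out

def sample_by_weight(data, weights, k):
    """Third approach: draw k points, each chosen with probability
    proportional to its weight, then fit an unweighted tree."""
    return random.choices(data, weights=weights, k=k)
```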
arXiv Detail & Related papers (2022-10-13T08:16:03Z) - Highly Parallel Autoregressive Entity Linking with Discriminative Correction [51.947280241185]
We propose a very efficient approach that parallelizes autoregressive linking across all potential mentions.
Our model is >70 times faster and more accurate than the previous generative method.
arXiv Detail & Related papers (2021-09-08T17:28:26Z) - Learning to Optimize: A Primer and A Benchmark [94.29436694770953]
Learning to optimize (L2O) is an emerging approach that leverages machine learning to develop optimization methods.
This article is poised to be the first comprehensive survey and benchmark of L2O for continuous optimization.
arXiv Detail & Related papers (2021-03-23T20:46:20Z) - The Right Tool for the Job: Matching Model and Instance Complexities [62.95183777679024]
As NLP models become larger, executing a trained model requires significant computational resources, incurring monetary and environmental costs.
We propose a modification to contextual representation fine-tuning which, during inference, allows for an early (and fast) "exit" from neural network calculations for simple instances.
We test our proposed modification on five different datasets in two tasks: three text classification datasets and two natural language inference benchmarks.
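The mechanism amounts to confidence-based early exiting, sketched below; the module lists, threshold, and batch-of-one setup are assumptions in the spirit of the paper rather than its exact implementation.

```python
import torch

def classify_with_early_exit(layers, heads, x, threshold=0.9):
    """After each layer, a small per-layer classifier predicts; once its
    confidence clears the threshold, computation stops. `layers` and
    `heads` are matched lists of modules; batch size 1 for clarity."""
    h = x
    pred = None
    for layer, head in zip(layers, heads):
        h = layer(h)
        probs = torch.softmax(head(h), dim=-1)
        conf, pred = probs.max(dim=-1)
        if conf.item() >= threshold:  # simple instance: exit early
            break
    return pred  # hard instances fall through to the final layer
```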
arXiv Detail & Related papers (2020-04-16T04:28:08Z)