PPM: Automated Generation of Diverse Programming Problems for
Benchmarking Code Generation Models
- URL: http://arxiv.org/abs/2401.15545v1
- Date: Sun, 28 Jan 2024 02:27:38 GMT
- Title: PPM: Automated Generation of Diverse Programming Problems for
Benchmarking Code Generation Models
- Authors: Simin Chen, Xiaoning Feng, Xiaohong Han, Cong Liu, Wei Yang
- Abstract summary: We propose the idea of programming problem merging (PPM), provide two implementations of this idea, and apply our tool to two widely-used datasets.
The results demonstrate the effectiveness of our tool in generating more challenging, diverse, and natural programming problems.
- Score: 10.491051578439722
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent times, a plethora of Large Code Generation Models (LCGMs) have been
proposed, showcasing significant potential in assisting developers with complex
programming tasks. Benchmarking LCGMs necessitates the creation of a set of
diverse programming problems, and each problem comprises the prompt (including
the task description), canonical solution, and test inputs. The existing
methods for constructing such a problem set can be categorized into two main
types: manual methods and perturbation-based methods. However, manual methods
demand high effort and lack scalability, and they also risk data integrity
because LCGMs may already have seen the benchmark problems during their data
collection; perturbation-based approaches mainly generate semantically
homogeneous problems that share the same canonical solutions, and they
introduce typos that an IDE can easily auto-correct, making them ineffective
and unrealistic. In this work, we propose the idea of programming problem
merging (PPM) and provide two implementations of this idea. We apply our tool
to two widely-used datasets and compare it against nine baseline methods using
eight code generation models. The results demonstrate that, compared to the
baselines, our tool generates more challenging, diverse, and natural
programming problems.
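The merging idea can be pictured with a minimal sketch: chain two benchmark
problems so that the new canonical solution composes the originals. The
Problem fields below mirror HumanEval-style records; merge_problems and its
composition rule are hypothetical illustrations, not the paper's exact
algorithm.

    # Hypothetical sketch of programming problem merging: chain two
    # problems so the merged canonical solution composes the originals.
    from dataclasses import dataclass
    from typing import Any, Callable

    @dataclass
    class Problem:
        prompt: str                      # natural-language task description
        solution: Callable[[Any], Any]   # canonical solution
        test_inputs: list                # inputs for differential testing

    def merge_problems(a: Problem, b: Problem) -> Problem:
        """Compose b after a, yielding a semantically new problem."""
        merged_prompt = (a.prompt.rstrip()
                         + " Then, feeding that result in as input: "
                         + b.prompt)
        merged_solution = lambda x: b.solution(a.solution(x))
        # Reuse a's test inputs; expected outputs are re-derived from
        # the merged canonical solution.
        return Problem(merged_prompt, merged_solution, list(a.test_inputs))

    # Toy usage: two easy problems merged into a harder one.
    p1 = Problem("Sort the list ascending.", sorted, [[3, 1, 2]])
    p2 = Problem("Return the last element.", lambda xs: xs[-1], [[5]])
    p3 = merge_problems(p1, p2)
    assert p3.solution([3, 1, 2]) == 3   # last element after sorting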
Related papers
- An Empirical Study on Self-correcting Large Language Models for Data Science Code Generation [1.335664823620186]
Large Language Models (LLMs) have recently advanced many applications in software engineering tasks.
CoT-SelfEvolve iteratively and automatically refines code through a self-correcting process.
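A generic version of such a self-correcting loop can be sketched as follows;
generate and run_tests are hypothetical stand-ins, not the CoT-SelfEvolve API.

    # Hypothetical self-correcting loop: feed execution errors back into
    # the prompt until the tests pass or the attempt budget runs out.
    def self_correct(generate, run_tests, task: str, max_rounds: int = 3):
        prompt = task
        code = generate(prompt)
        for _ in range(max_rounds):
            ok, error_message = run_tests(code)
            if ok:
                return code
            prompt = (task + "\nPrevious attempt:\n" + code
                      + "\nIt failed with:\n" + error_message
                      + "\nPlease fix the code.")
            code = generate(prompt)
        return code  # best effort after max_rounds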
arXiv Detail & Related papers (2024-08-28T09:19:09Z)
- SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z)
- Estimating Difficulty Levels of Programming Problems with Pre-trained Model [18.92661958433282]
The difficulty level of each programming problem serves as an essential reference for guiding students' adaptive learning.
We formulate the problem of automatically estimating the difficulty level of a programming problem, given its textual description and an example solution in code.
To tackle this problem, we propose coupling two pre-trained models, one for the text modality and the other for the code modality, into a unified model.
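One plausible coupling is a shared regression head over the two encoders'
pooled embeddings; the sketch below uses plain PyTorch with stand-in encoder
modules and is an assumption, not the paper's actual architecture.

    # Hypothetical dual-encoder difficulty estimator (requires PyTorch).
    import torch
    import torch.nn as nn

    class DifficultyEstimator(nn.Module):
        def __init__(self, text_encoder, code_encoder, dim: int = 768):
            super().__init__()
            self.text_encoder = text_encoder   # e.g., a BERT-style model
            self.code_encoder = code_encoder   # e.g., a CodeBERT-style model
            self.head = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                      nn.Linear(dim, 1))

        def forward(self, text_inputs, code_inputs):
            t = self.text_encoder(text_inputs)  # pooled (batch, dim)
            c = self.code_encoder(code_inputs)  # pooled (batch, dim)
            return self.head(torch.cat([t, c], dim=-1))  # difficulty score

    # Smoke test with identity encoders over pre-pooled embeddings.
    model = DifficultyEstimator(nn.Identity(), nn.Identity())
    x = torch.randn(2, 768)
    print(model(x, x).shape)  # torch.Size([2, 1])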
arXiv Detail & Related papers (2024-06-13T05:38:20Z)
- MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time [51.5039731721706]
MindStar is a purely inference-based searching method for large language models.
It formulates reasoning tasks as searching problems and proposes two search ideas to identify the optimal reasoning paths.
It significantly enhances the reasoning abilities of open-source models, such as Llama-2-13B and Mistral-7B, and achieves comparable performance to GPT-3.5 and Grok-1.
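Search over reasoning paths can be illustrated with a simple best-first
expansion; propose_steps, score, and is_complete are hypothetical callbacks,
and this is not MindStar's exact procedure.

    # Illustrative best-first search over partial reasoning paths.
    import heapq

    def search_reasoning(question, propose_steps, score, is_complete,
                         beam: int = 4, max_expansions: int = 50):
        frontier = [(-score([question]), [question])]
        for _ in range(max_expansions):
            if not frontier:
                break
            _, path = heapq.heappop(frontier)
            if is_complete(path):
                return path
            for step in propose_steps(path)[:beam]:
                new_path = path + [step]
                heapq.heappush(frontier, (-score(new_path), new_path))
        return None  # no complete reasoning path found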
arXiv Detail & Related papers (2024-05-25T15:07:33Z)
- Mixture-of-Instructions: Comprehensive Alignment of a Large Language Model through the Mixture of Diverse System Prompting Instructions [7.103987978402038]
We introduce a novel technique termed Mixture-of-Instructions (MoI).
MoI employs a strategy of instruction concatenation combined with diverse system prompts to boost the alignment efficiency of language models.
Our methodology was applied to the open-source Qwen-7B-chat model, culminating in the development of Qwen-SFT-MoI.
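The concatenation strategy can be sketched roughly as below; the system
prompts and packing format are invented for illustration, not the paper's
exact recipe.

    # Hypothetical MoI-style sample builder: pack several instructions
    # into one turn under a randomly chosen system prompt.
    import random

    SYSTEM_PROMPTS = [
        "You are a helpful assistant.",
        "You are a careful coding assistant.",
        "You are a concise math tutor.",
    ]

    def build_moi_sample(instructions):
        system = random.choice(SYSTEM_PROMPTS)
        merged = "\n".join(f"{i+1}. {ins}"
                           for i, ins in enumerate(instructions))
        return {"system": system,
                "user": "Complete all of the following tasks:\n" + merged}

    print(build_moi_sample(["Sort [3,1,2].", "Translate 'hello' to French."]))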
arXiv Detail & Related papers (2024-04-29T03:58:12Z)
- A Thorough Examination of Decoding Methods in the Era of LLMs [72.65956436513241]
Decoding methods play an indispensable role in converting language models from next-token predictors into practical task solvers.
This paper provides a comprehensive and multifaceted analysis of various decoding methods within the context of large language models.
Our findings reveal that decoding method performance is notably task-dependent and influenced by factors such as alignment, model size, and quantization.
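For concreteness, two standard decoding methods of the kind such studies
examine, temperature scaling and nucleus (top-p) sampling, can be implemented
directly over a single logit vector:

    # Reference temperature + top-p sampling for one decoding step.
    import math, random

    def sample_top_p(logits, temperature=0.8, top_p=0.9):
        scaled = [l / temperature for l in logits]
        m = max(scaled)
        probs = [math.exp(l - m) for l in scaled]
        total = sum(probs)
        probs = [p / total for p in probs]
        # Keep the smallest set of tokens whose mass reaches top_p.
        ranked = sorted(range(len(probs)), key=lambda i: -probs[i])
        kept, cum = [], 0.0
        for i in ranked:
            kept.append(i)
            cum += probs[i]
            if cum >= top_p:
                break
        mass = sum(probs[i] for i in kept)
        r, acc = random.random() * mass, 0.0
        for i in kept:
            acc += probs[i]
            if acc >= r:
                return i
        return kept[-1]

    print(sample_top_p([2.0, 1.0, 0.1, -1.0]))  # a sampled token index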
arXiv Detail & Related papers (2024-02-10T11:14:53Z)
- Large Language Models as Analogical Reasoners [155.9617224350088]
Chain-of-thought (CoT) prompting for language models demonstrates impressive performance across reasoning tasks.
We introduce a new prompting approach, analogical prompting, designed to automatically guide the reasoning process of large language models.
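The core prompt pattern, asking the model to self-generate related exemplars
before solving, can be sketched as a template (paraphrased, not the paper's
verbatim prompt):

    # Analogical-prompting-style template: recall relevant examples first,
    # then solve the target problem.
    def analogical_prompt(problem, n_examples=3):
        return (f"Problem: {problem}\n\n"
                f"First, recall {n_examples} relevant and distinct example "
                "problems and solve each of them step by step.\n"
                "Then, using what those examples suggest, solve the "
                "original problem step by step.")

    print(analogical_prompt("How many positive divisors does 360 have?"))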
arXiv Detail & Related papers (2023-10-03T00:57:26Z)
- Fully Autonomous Programming with Large Language Models [0.9558392439655015]
Current approaches to program synthesis with Large Language Models (LLMs) exhibit a "near miss syndrome": generated code comes close to a correct solution but fails the provided tests.
We use OpenAI Codex as the LLM and Program Synthesis Benchmark 2 as a database of problem descriptions and tests for evaluation.
The resulting framework outperforms both conventional usage of Codex without the repair phase and traditional genetic programming approaches.
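The repair phase presupposes a harness that executes candidates against the
benchmark's tests; a minimal, hypothetical version (not the framework's own
code) might look like this:

    # Run one candidate program against (args, expected) test pairs and
    # count passes; low scores mark candidates for the repair phase.
    def run_candidate(code, tests):
        namespace = {}
        try:
            exec(code, namespace)  # assumes a trusted sandbox, for brevity
        except Exception:
            return 0
        solve = namespace.get("solve")
        if solve is None:
            return 0
        passed = 0
        for args, expected in tests:
            try:
                if solve(*args) == expected:
                    passed += 1
            except Exception:
                pass
        return passed

    tests = [((2, 3), 5), ((0, 0), 0)]
    print(run_candidate("def solve(a, b):\n    return a + b", tests))  # 2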
arXiv Detail & Related papers (2023-04-20T16:12:05Z)
- A Two-stage Framework and Reinforcement Learning-based Optimization Algorithms for Complex Scheduling Problems [54.61091936472494]
We develop a two-stage framework in which reinforcement learning (RL) and traditional operations research (OR) algorithms are combined.
The scheduling problem is solved in two stages: a finite Markov decision process (MDP) followed by a mixed-integer programming process.
Results show that the proposed algorithms could stably and efficiently obtain satisfactory scheduling schemes for agile Earth observation satellite scheduling problems.
arXiv Detail & Related papers (2021-03-10T03:16:12Z)
- An Online Method for A Class of Distributionally Robust Optimization with Non-Convex Objectives [54.29001037565384]
We propose a practical online method for solving a class of online distributionally robust optimization (DRO) problems.
Our studies demonstrate important applications in machine learning for improving the robustness of networks.
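In its generic textbook form (the paper's non-convex setting may differ), DRO
minimizes the worst-case expected loss over an uncertainty set of
reweightings of the training distribution:

    % Generic DRO objective; D bounds how far the adversarial weights p
    % may move from the uniform distribution 1/n.
    \[
      \min_{w} \; \max_{p \in \mathcal{U}} \; \sum_{i=1}^{n} p_i\,\ell_i(w),
      \qquad
      \mathcal{U} = \Big\{ p \ge 0 : \textstyle\sum_{i} p_i = 1,\;
      D\big(p \,\big\|\, \mathbf{1}/n\big) \le \rho \Big\}
    \]

Here \ell_i(w) is the loss of model w on example i.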
arXiv Detail & Related papers (2020-06-17T20:19:25Z)