How Diversely Can Language Models Solve Problems? Exploring the Algorithmic Diversity of Model-Generated Code
- URL: http://arxiv.org/abs/2503.00691v2
- Date: Fri, 07 Mar 2025 05:38:47 GMT
- Title: How Diversely Can Language Models Solve Problems? Exploring the Algorithmic Diversity of Model-Generated Code
- Authors: Seonghyeon Lee, Heejae Chon, Joonwon Jang, Dongha Lee, Hwanjo Yu
- Abstract summary: Language models (LMs) have exhibited impressive abilities in generating code from natural language requirements. We highlight the diversity of code generated by LMs as a critical criterion for evaluating their code generation capabilities.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Language models (LMs) have exhibited impressive abilities in generating code from natural language requirements. In this work, we highlight the diversity of code generated by LMs as a critical criterion for evaluating their code generation capabilities. Studies assessing the diversity of generated code are scarce, and its importance for code LMs has been overlooked. We therefore propose a systematic approach to evaluating code diversity, introducing various metrics based on inter-code similarity. Specifically, we introduce code clustering methods that leverage LMs' capabilities in code understanding and reasoning, resulting in a set of metrics that represent the number of distinct algorithms among model-generated solutions. We extensively investigate the properties of model-generated solutions by contrasting them with human-written ones and by quantifying the impact of various factors on code diversity: model size, temperature, instruction tuning, and problem complexity. Our analysis demonstrates that model-generated solutions exhibit low algorithmic diversity, a property that has been neglected by the research community. Moreover, we explore methods to increase code diversity by combining solutions from different models and by raising sampling temperatures. Our findings highlight that code diversity can be enhanced with the help of heterogeneous models and by setting the temperature beyond 1.0, a range that has not been fully explored because of its degradation of functional correctness. To facilitate this research direction, we publicly share our code and datasets through open-source repositories.
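As a concrete illustration of the inter-code-similarity idea, the sketch below clusters sampled solutions and reports the cluster count as a proxy for the number of distinct algorithms. This is a minimal toy version, not the paper's method: the authors use LM-based code clustering, whereas this sketch assumes solutions arrive as plain strings and substitutes a character-level similarity threshold for the learned similarity.

```python
from difflib import SequenceMatcher


def inter_code_similarity(code_a: str, code_b: str) -> float:
    """Character-level similarity in [0, 1]; a crude stand-in for the
    paper's inter-code similarity measures."""
    return SequenceMatcher(None, code_a, code_b).ratio()


def count_algorithm_clusters(solutions: list[str], threshold: float = 0.8) -> int:
    """Greedy clustering: a solution joins an existing cluster if it is
    sufficiently similar to that cluster's representative; otherwise it
    starts a new cluster. The cluster count is a rough proxy for the
    number of distinct algorithms in the sample set."""
    representatives: list[str] = []
    for sol in solutions:
        if not any(inter_code_similarity(sol, rep) >= threshold for rep in representatives):
            representatives.append(sol)
    return len(representatives)


# Three sampled solutions: the first two differ only by a variable name,
# while the third implements the same function with an explicit loop.
samples = [
    "def total(xs):\n    return sum(xs)",
    "def total(ys):\n    return sum(ys)",
    "def total(xs):\n    acc = 0\n    for x in xs:\n        acc += x\n    return acc",
]
print(count_algorithm_clusters(samples))  # -> 2
```

Under this framing, the paper's temperature finding corresponds to regenerating the sample set with a decoding temperature above 1.0 and checking whether the cluster count rises faster than functional correctness falls.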
Related papers
- SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z) - Is Functional Correctness Enough to Evaluate Code Language Models? Exploring Diversity of Generated Codes [17.95094238686012]
Language models (LMs) have exhibited impressive abilities in generating code from natural language requirements.
We highlight the diversity of code generated by LMs as a critical criterion for evaluating their code generation capabilities.
We propose a systematic approach to evaluate the diversity of generated code, utilizing various metrics for inter-code similarity as well as functional correctness.
arXiv Detail & Related papers (2024-08-24T07:40:22Z) - What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complex than canonical solutions.
We develop a taxonomy of bugs for incorrect code that spans three categories and 12 sub-categories, and analyze the root causes of common bug types.
We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback.
arXiv Detail & Related papers (2024-07-08T17:27:17Z) - AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data [64.69872638349922]
We present AlchemistCoder, a series of Code LLMs with enhanced code generation and generalization capabilities fine-tuned on multi-source data.
We propose incorporating the data construction process into the fine-tuning data as code comprehension tasks, including instruction evolution, data filtering, and code review.
arXiv Detail & Related papers (2024-05-29T16:57:33Z) - Improving Diversity of Commonsense Generation by Large Language Models via In-Context Learning [28.654890118684957]
Generative Commonsense Reasoning (GCR) requires a model to reason about a situation using commonsense knowledge.
The diversity of the generations is equally important because it reflects the model's ability to use a range of commonsense knowledge facts.
We propose a simple method that diversifies the LLM generations, while preserving their quality.
arXiv Detail & Related papers (2024-04-25T17:52:39Z) - Creative and Correct: Requesting Diverse Code Solutions from AI Foundation Models [8.40868688916685]
In software engineering tasks, diversity is key to exploring design spaces and fostering creativity.
Our study systematically investigates the trade-off between diversity and correctness through experiments with HumanEval tasks.
We identify combinations of parameters and strategies that strike an optimal balance between diversity and correctness.
arXiv Detail & Related papers (2024-03-20T02:51:46Z) - A Thorough Examination of Decoding Methods in the Era of LLMs [72.65956436513241]
Decoding methods play an indispensable role in converting language models from next-token predictors into practical task solvers.
This paper provides a comprehensive and multifaceted analysis of various decoding methods within the context of large language models.
Our findings reveal that decoding method performance is notably task-dependent and influenced by factors such as alignment, model size, and quantization.
arXiv Detail & Related papers (2024-02-10T11:14:53Z) - ACES: Generating Diverse Programming Puzzles with Autotelic Generative Models [20.039580079339537]
Autotelic CodE Search (ACES) jointly optimizes for the diversity and difficulty of generated problems.
We represent problems in a space of semantic descriptors describing the programming skills required to solve them.
ACES iteratively prompts a large language model to generate difficult problems achieving a diversity of target semantic descriptors.
arXiv Detail & Related papers (2023-10-15T14:57:14Z) - MinT: Boosting Generalization in Mathematical Reasoning via Multi-View Fine-Tuning [53.90744622542961]
Reasoning in mathematical domains remains a significant challenge for small language models (LMs).
We introduce a new method that exploits existing mathematical problem datasets with diverse annotation styles.
Experimental results show that our strategy enables a LLaMA-7B model to outperform prior approaches.
arXiv Detail & Related papers (2023-07-16T05:41:53Z) - VOLTA: Improving Generative Diversity by Variational Mutual Information Maximizing Autoencoder [38.35049378875308]
We introduce VOLTA, a framework that elevates generative diversity by bridging Transformer with VAE.
We perform comprehensive experiments with two types of Transformers on six datasets to show that our approach can significantly improve generative diversity while maintaining generative quality.
arXiv Detail & Related papers (2023-07-03T08:45:42Z) - Expressivity of Parameterized and Data-driven Representations in Quality Diversity Search [111.06379262544911]
We compare the output diversity of a quality diversity evolutionary search performed in two different search spaces.
A learned model is better at interpolating between known data points than at extrapolating or expanding towards unseen examples.
arXiv Detail & Related papers (2021-05-10T10:27:43Z)