From Bias To Improved Prompts: A Case Study of Bias Mitigation of Clone Detection Models
- URL: http://arxiv.org/abs/2505.05679v1
- Date: Thu, 08 May 2025 22:38:10 GMT
- Title: From Bias To Improved Prompts: A Case Study of Bias Mitigation of Clone Detection Models
- Authors: QiHong Chen, Lianghao Jiang, Iftekhar Ahmed
- Abstract summary: We assess the suitability of Generative Large Language Models for clone code detection. A known issue with LLMs is their susceptibility to prompt bias, where the performance of these models fluctuates based on the input prompt provided. Our analysis identifies eight distinct categories of prompt bias, and our devised approach leveraging these biases yields a significant improvement of up to 10.81% in the F1 score.
- Score: 5.874997638802244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The issue of clone code has persisted in software engineering, primarily because developers often copy and paste code segments. This common practice has elevated the importance of clone code detection, garnering attention from both software engineering researchers and industry professionals. Their collective concern arises from the potential negative impacts that clone code can have on software quality. The emergence of powerful Generative Large Language Models (LLMs) like ChatGPT has exacerbated the clone code problem. These advanced models possess code generation capabilities that can inadvertently create code clones. As a result, the need to detect clone code has become more critical than ever before. In this study, we assess the suitability of LLMs for clone code detection. Our results demonstrate that the PaLM model achieved a high F1 score of 89.30 on the Avatar dataset and 86.41 on the PoolC dataset. A known issue with LLMs is their susceptibility to prompt bias, where the performance of these models fluctuates based on the input prompt provided. In our research, we delve deeper into the reasons behind these fluctuations and propose a framework to mitigate prompt bias for clone detection. Our analysis identifies eight distinct categories of prompt bias, and our devised approach leveraging these biases yields a significant improvement of up to 10.81% in the F1 score. These findings underscore the substantial impact of prompt bias on the performance of LLMs and highlight the potential for leveraging model errors to alleviate this bias.
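To make the setup concrete, here is a minimal sketch of prompt-based clone detection with a crude mitigation of prompt sensitivity via majority voting over paraphrased prompts. The `query_llm` helper is a hypothetical stand-in for any chat-completion API, and the prompt wordings are illustrative; the paper's actual prompts and mitigation framework are not reproduced here.

```python
# Minimal sketch: prompt-based clone detection with majority voting over
# paraphrased prompts. `query_llm` is a hypothetical stand-in for any
# chat-completion API; the prompt wordings are illustrative, not the paper's.
from typing import Callable

PROMPT_VARIANTS = [
    "Do these two code snippets implement the same functionality? Answer 'yes' or 'no'.",
    "Are the following programs semantic clones of each other? Answer 'yes' or 'no'.",
    "Would these two snippets produce the same result for the same input? Answer 'yes' or 'no'.",
]

def detect_clone(query_llm: Callable[[str], str], code_a: str, code_b: str) -> bool:
    """Majority-vote a clone/no-clone verdict over several prompt phrasings."""
    votes = 0
    for instruction in PROMPT_VARIANTS:
        prompt = f"{instruction}\n\nSnippet A:\n{code_a}\n\nSnippet B:\n{code_b}"
        if query_llm(prompt).strip().lower().startswith("yes"):
            votes += 1
    return votes > len(PROMPT_VARIANTS) / 2
```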
Related papers
- DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation [68.19756761027351]
Diffusion large language models (dLLMs) are compelling alternatives to autoregressive (AR) models. We investigate their denoising processes and reinforcement learning methods. Our work provides deeper insight into the machinery of dLLM generation and offers an effective, diffusion-native RL training framework.
arXiv Detail & Related papers (2025-06-25T17:35:47Z)
- Learning to Solve and Verify: A Self-Play Framework for Code and Test Generation [69.62857948698436]
Recent advances in large language models (LLMs) have improved their performance on coding benchmarks. However, improvement is plateauing due to the exhaustion of readily available high-quality data. We propose Sol-Ver, a self-play solver-verifier framework that jointly improves a single model's code and test generation capacity.
arXiv Detail & Related papers (2025-02-20T18:32:19Z)
- Focused-DPO: Enhancing Code Generation Through Focused Preference Optimization on Error-Prone Points [51.40935517552926]
We introduce Focused-DPO, a framework that enhances code generation by directing preference optimization towards critical error-prone areas. By focusing on error-prone points, Focused-DPO advances the accuracy and functionality of model-generated code.
arXiv Detail & Related papers (2025-02-17T06:16:02Z)
- $\mathbb{USCD}$: Improving Code Generation of LLMs by Uncertainty-Aware Selective Contrastive Decoding [64.00025564372095]
Large language models (LLMs) have shown remarkable capabilities in code generation.
The effects of hallucinations (e.g., output noise) make it challenging for LLMs to generate high-quality code in one pass.
We propose a simple and effective uncertainty-aware selective contrastive decoding (USCD) mechanism.
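For background, the sketch below shows one step of plain contrastive decoding, the general mechanism that USCD builds on: expert-model token scores are penalized by a noisier distribution, restricted to tokens the expert itself finds plausible. The uncertainty-aware selection that distinguishes USCD is not reproduced here.

```python
# Sketch of one step of plain contrastive decoding (background for USCD; the
# paper's uncertainty-aware selection is not shown). Scores from an "expert"
# model are penalized by a noisier "amateur" distribution.
import numpy as np

def contrastive_decode_step(expert_logp: np.ndarray,
                            amateur_logp: np.ndarray,
                            alpha: float = 0.1,
                            lam: float = 0.5) -> int:
    """Pick the next token id from both models' per-token log-probabilities."""
    # Plausibility constraint: keep only tokens the expert rates within a
    # factor `alpha` of its most likely token.
    plausible = expert_logp >= expert_logp.max() + np.log(alpha)
    scores = expert_logp - lam * amateur_logp
    scores[~plausible] = -np.inf
    return int(np.argmax(scores))
```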
arXiv Detail & Related papers (2024-09-09T02:07:41Z)
- Assessing the Code Clone Detection Capability of Large Language Models [0.0]
The evaluation involves testing the models on a variety of code pairs of different clone types and levels of similarity.
Findings indicate that GPT-4 consistently surpasses GPT-3.5 across all clone types.
arXiv Detail & Related papers (2024-07-02T16:20:44Z)
- Validating LLM-Generated Programs with Metamorphic Prompt Testing [8.785973653167112]
Large Language Models (LLMs) are increasingly integrated into the software development lifecycle.
This paper proposes a novel solution called metamorphic prompt testing to address these challenges.
Our evaluation on HumanEval shows that metamorphic prompt testing is able to detect 75 percent of the erroneous programs generated by GPT-4, with a false positive rate of 8.6 percent.
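The core idea can be reconstructed as follows: generate programs from paraphrased versions of the same prompt and flag the result as suspicious when the variants disagree behaviorally. The `generate_program` and `paraphrase` helpers below are hypothetical stand-ins; the paper's actual procedure may differ.

```python
# Illustrative reconstruction of metamorphic prompt testing: programs generated
# from paraphrased prompts should behave identically on sample inputs.
# `generate_program` and `paraphrase` are hypothetical stand-ins.
from typing import Callable, List

def metamorphic_check(generate_program: Callable[[str], Callable],
                      paraphrase: Callable[[str], List[str]],
                      prompt: str,
                      sample_inputs: list) -> bool:
    """Return True if all paraphrase variants agree with the reference program."""
    reference = generate_program(prompt)
    for variant in paraphrase(prompt):
        candidate = generate_program(variant)
        if any(reference(x) != candidate(x) for x in sample_inputs):
            return False  # behavioral disagreement: likely an erroneous program
    return True
```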
arXiv Detail & Related papers (2024-06-11T00:40:17Z)
- Does Your Neural Code Completion Model Use My Code? A Membership Inference Approach [66.51005288743153]
We investigate the legal and ethical issues of current neural code completion models.
We tailor a membership inference approach (termed CodeMI) that was originally crafted for classification tasks.
We evaluate the effectiveness of this adapted approach across a diverse array of neural code completion models.
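For intuition, the snippet below shows a classic loss-thresholding membership-inference baseline: samples on which the model is unusually confident are flagged as likely training data. This is a textbook baseline, not the CodeMI method itself.

```python
# Classic loss-thresholding membership-inference baseline (for intuition only;
# this is NOT the paper's CodeMI approach). A low mean negative log-likelihood
# suggests the model may have seen the sample during training.
from typing import List

def likely_member(token_logprobs: List[float], threshold: float = 1.5) -> bool:
    """Flag a code sample whose mean NLL under the model falls below threshold."""
    nll = -sum(token_logprobs) / max(len(token_logprobs), 1)
    return nll < threshold
```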
arXiv Detail & Related papers (2024-04-22T15:54:53Z)
- Bias Testing and Mitigation in LLM-based Code Generation [27.997232692723767]
This paper presents a novel bias testing framework specifically designed for code generation tasks. We conduct an empirical study on the biases in code generated by five widely studied LLMs. We study five bias mitigation prompt strategies that are commonly used in current code generation scenarios.
arXiv Detail & Related papers (2023-09-03T07:14:49Z)
- Towards Understanding the Capability of Large Language Models on Code Clone Detection: A Survey [40.99060616674878]
Large language models (LLMs) possess diverse code-related knowledge, making them versatile for various software engineering challenges.
This paper provides the first comprehensive evaluation of LLMs for clone detection, covering different clone types, languages, and prompts.
We find advanced LLMs excel in detecting complex semantic clones, surpassing existing methods.
arXiv Detail & Related papers (2023-08-02T14:56:01Z)
- CONCORD: Clone-aware Contrastive Learning for Source Code [64.51161487524436]
Self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks.
We argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning.
In particular, we propose CONCORD, a self-supervised, contrastive learning strategy to place benign clones closer in the representation space while moving deviants further apart.
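For intuition, the sketch below gives a generic InfoNCE-style contrastive objective over code embeddings, pulling an anchor toward a benign clone and away from deviants; CONCORD's exact loss and training pipeline are not reproduced here.

```python
# Generic InfoNCE-style contrastive loss over code embeddings: the anchor is
# pulled toward its benign clone and pushed away from deviant variants.
# This is a sketch for intuition, not CONCORD's exact objective.
import numpy as np

def contrastive_loss(anchor: np.ndarray, clone: np.ndarray,
                     deviants: np.ndarray, temperature: float = 0.07) -> float:
    """InfoNCE with one positive (clone) and a batch of negatives (deviants)."""
    def cos(a, b):  # cosine similarity between two embedding vectors
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    pos = np.exp(cos(anchor, clone) / temperature)
    neg = sum(np.exp(cos(anchor, d) / temperature) for d in deviants)
    return float(-np.log(pos / (pos + neg)))
```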
arXiv Detail & Related papers (2023-06-05T20:39:08Z)
- Evaluation of Contrastive Learning with Various Code Representations for Code Clone Detection [3.699097874146491]
We evaluate contrastive learning for detecting semantic clones of code snippets.
We use CodeTransformator to create a dataset that mimics plagiarised code based on competitive programming solutions.
The results of our evaluation show that the proposed models' performance varies across tasks; however, the graph-based models generally outperform the others.
arXiv Detail & Related papers (2022-06-17T12:25:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.