Pitfalls in Language Models for Code Intelligence: A Taxonomy and Survey
- URL: http://arxiv.org/abs/2310.17903v1
- Date: Fri, 27 Oct 2023 05:32:57 GMT
- Title: Pitfalls in Language Models for Code Intelligence: A Taxonomy and Survey
- Authors: Xinyu She, Yue Liu, Yanjie Zhao, Yiling He, Li Li, Chakkrit
Tantithamthavorn, Zhan Qin, Haoyu Wang
- Abstract summary: Modern language models (LMs) have been successfully employed in source code generation and understanding.
Despite their great potential, language models for code intelligence (LM4Code) are susceptible to potential pitfalls.
- Score: 21.01561950216472
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern language models (LMs) have been successfully employed in source code
generation and understanding, leading to a significant increase in research
focused on learning-based code intelligence, such as automated bug repair, and
test case generation. Despite their great potential, language models for code
intelligence (LM4Code) are susceptible to potential pitfalls, which hinder
realistic performance and further impact their reliability and applicability in
real-world deployment. Such challenges drive the need for a comprehensive
understanding - not just identifying these issues but delving into their
possible implications and existing solutions to build more reliable language
models tailored to code intelligence. Based on a well-defined systematic
research approach, we conducted an extensive literature review to uncover the
pitfalls inherent in LM4Code. Finally, 67 primary studies from top-tier venues
have been identified. After carefully examining these studies, we designed a
taxonomy of pitfalls in LM4Code research and conducted a systematic study to
summarize the issues, implications, current solutions, and challenges of
different pitfalls for LM4Code systems. We developed a comprehensive
classification scheme that dissects pitfalls across four crucial aspects: data
collection and labeling, system design and learning, performance evaluation,
and deployment and maintenance. Through this study, we aim to provide a roadmap
for researchers and practitioners, facilitating their understanding and
utilization of LM4Code in reliable and trustworthy ways.
Related papers
- Assessing Large Language Models for Automated Feedback Generation in Learning Programming Problem Solving [0.0]
Large Language Models (LLMs) have emerged as potential tools to automate feedback generation.
This study evaluates the performance of four LLMs on a benchmark dataset of 45 student solutions.
arXiv Detail & Related papers (2025-03-18T18:31:36Z) - Automated Refactoring of Non-Idiomatic Python Code: A Differentiated Replication with LLMs [54.309127753635366]
We present the results of a replication study in which we investigate GPT-4 effectiveness in recommending and suggesting idiomatic actions.
Our findings underscore the potential of LLMs to achieve tasks where, in the past, implementing recommenders based on complex code analyses was required.
arXiv Detail & Related papers (2025-01-28T15:41:54Z) - Large Language Models for Code Generation: The Practitioners Perspective [4.946128083535776]
Large Language Models (LLMs) have emerged as coding assistants, capable of generating source code from natural language prompts.
We propose and develop a multi-model unified platform to generate and execute code based on natural language prompts.
We conducted a survey with 60 software practitioners from 11 countries across four continents to evaluate the usability, performance, strengths, and limitations of each model.
arXiv Detail & Related papers (2025-01-28T14:52:16Z) - Language Models for Code Optimization: Survey, Challenges and Future Directions [7.928856221466083]
Language models (LMs) built upon deep neural networks (DNNs) have recently demonstrated breakthrough effectiveness in software engineering tasks.
This study aims to provide actionable insights and references for both researchers and practitioners in this rapidly evolving field.
arXiv Detail & Related papers (2025-01-02T14:20:36Z) - SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z) - Large Language Models for Secure Code Assessment: A Multi-Language Empirical Study [1.9116784879310031]
We show that GPT-4o achieves the highest vulnerability detection and CWE classification scores using a few-shot setting.
We develop a library called CODEGUARDIAN integrated with VSCode which enables developers to perform LLM-assisted real-time vulnerability analysis.
arXiv Detail & Related papers (2024-08-12T18:10:11Z) - Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses.
Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives.
The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
arXiv Detail & Related papers (2024-07-23T15:31:26Z) - What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complicated as compared to canonical solutions.
We develop a taxonomy of bugs for incorrect codes that includes three categories and 12 sub-categories, and analyze the root cause for common bug types.
We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback.
arXiv Detail & Related papers (2024-07-08T17:27:17Z) - Coding for Intelligence from the Perspective of Category [66.14012258680992]
Coding targets compressing and reconstructing data, and intelligence.
Recent trends demonstrate the potential homogeneity of these two fields.
We propose a novel problem of Coding for Intelligence from the category theory view.
arXiv Detail & Related papers (2024-07-01T07:05:44Z) - An Empirical Study of Automated Vulnerability Localization with Large Language Models [21.84971967029474]
Large Language Models (LLMs) have shown potential in various domains, yet their effectiveness in vulnerability localization remains underexplored.
Our investigation encompasses 10+ leading LLMs suitable for code analysis, including ChatGPT and various open-source models.
We explore the efficacy of these LLMs using 4 distinct paradigms: zero-shot learning, one-shot learning, discriminative fine-tuning, and generative fine-tuning.
arXiv Detail & Related papers (2024-03-30T08:42:10Z) - Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study [72.24266814625685]
We explore the performance of large language models (LLMs) across the entire software development lifecycle with DevEval.
DevEval features four programming languages, multiple domains, high-quality data collection, and carefully designed and verified metrics for each task.
Empirical studies show that current LLMs, including GPT-4, fail to solve the challenges presented within DevEval.
arXiv Detail & Related papers (2024-03-13T15:13:44Z) - Robustness, Security, Privacy, Explainability, Efficiency, and Usability
of Large Language Models for Code [9.343299833972253]
Large language models for code (LLM4Code) demonstrate strong performance (e.g., high accuracy) in processing source code.
This paper thoroughly examines 146 relevant studies to identify seven important properties beyond accuracy, including, security, privacy, explainability, efficiency, and robustness.
We discuss the current state-of-the-art methods and trends, identify gaps in existing research, and present promising directions for future study.
arXiv Detail & Related papers (2024-03-12T10:43:26Z) - L2CEval: Evaluating Language-to-Code Generation Capabilities of Large
Language Models [102.00201523306986]
We present L2CEval, a systematic evaluation of the language-to-code generation capabilities of large language models (LLMs)
We analyze the factors that potentially affect their performance, such as model size, pretraining data, instruction tuning, and different prompting methods.
In addition to assessing model performance, we measure confidence calibration for the models and conduct human evaluations of the output programs.
arXiv Detail & Related papers (2023-09-29T17:57:00Z) - A Survey on Automated Software Vulnerability Detection Using Machine
Learning and Deep Learning [19.163031235081565]
Machine Learning (ML) and Deep Learning (DL) based models for detecting vulnerabilities in source code have been presented in recent years.
It may be difficult to discover gaps in existing research and potential for future improvement without a comprehensive survey.
This work address that gap by presenting a systematic survey to characterize various features of ML/DL-based source code level software vulnerability detection approaches.
arXiv Detail & Related papers (2023-06-20T16:51:59Z) - CodeLMSec Benchmark: Systematically Evaluating and Finding Security
Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks.
Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities.
This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.