Comparative Analysis of Large Language Models for Context-Aware Code Completion using SAFIM Framework
- URL: http://arxiv.org/abs/2502.15243v1
- Date: Fri, 21 Feb 2025 06:32:31 GMT
- Title: Comparative Analysis of Large Language Models for Context-Aware Code Completion using SAFIM Framework
- Authors: Hang Zhang, Yanxin Shen, Lun Wang, Chuanqi Shi, Shaoshuai Du, Yiyi Tao, Yixian Shen
- Abstract summary: Large Language Models (LLMs) have revolutionized code completion, transforming it into a more intelligent and context-aware feature. This study evaluates the performance of several chat-based LLMs, including Gemini 1.5 Flash, Gemini 1.5 Pro, GPT-4o, GPT-4o-mini, and GPT-4 Turbo.
- Score: 5.312946761836463
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The advent of Large Language Models (LLMs) has revolutionized code completion, transforming it into a more intelligent and context-aware feature in modern integrated development environments. These advancements have significantly enhanced developers' ability to write efficient and error-free code. This study evaluates the performance of several chat-based LLMs, including Gemini 1.5 Flash, Gemini 1.5 Pro, GPT-4o, GPT-4o-mini, and GPT-4 Turbo, using the Syntax-Aware Fill-in-the-Middle (SAFIM) dataset. This benchmark is specifically designed to assess models' capabilities in syntax-sensitive code generation. Performance metrics, such as cosine similarity with ground-truth completions and latency, were employed to measure both accuracy and efficiency. The findings reveal substantial differences in the models' code completion abilities, offering valuable insights into their respective strengths and weaknesses. This work provides a comparative analysis that underscores the trade-offs between accuracy and speed, establishing a benchmark for future advancements in LLM-based code completion.
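The two metrics named in the abstract (accuracy via cosine similarity with the ground-truth completion, efficiency via latency) can be pictured with a short sketch. The Python example below is a minimal illustration only: the embedding model, the `generate_completion` callable, and all names are assumptions, not the paper's exact setup.

```python
# Minimal sketch of the paper's two reported metrics: accuracy via cosine
# similarity with the ground-truth completion, and efficiency via wall-clock
# latency. The embedding model and the `generate_completion` callable are
# illustrative assumptions, not the paper's exact setup.
import time

import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def evaluate_completion(generate_completion, prompt: str, ground_truth: str):
    start = time.perf_counter()
    completion = generate_completion(prompt)       # call to a chat-based LLM
    latency = time.perf_counter() - start          # efficiency metric (seconds)

    emb_pred, emb_gold = embedder.encode([completion, ground_truth])
    accuracy = cosine_similarity(emb_pred, emb_gold)  # accuracy proxy
    return accuracy, latency
```

Per-model results would then presumably be the mean similarity and mean latency aggregated over the SAFIM examples.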
Related papers
- CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation [24.090719826360342]
We introduce CodeIF, the first benchmark designed to assess the abilities of Large Language Models (LLMs) to adhere to task-oriented instructions within code generation scenarios.
We conduct extensive experiments with LLMs, analyzing their strengths and limitations in meeting the demands of these tasks.
arXiv Detail & Related papers (2025-02-26T14:19:49Z) - Guided Code Generation with LLMs: A Multi-Agent Framework for Complex Code Tasks [1.9198713957364215]
Large Language Models (LLMs) have shown remarkable capabilities in code generation tasks. However, they face significant limitations in handling complex, long-context programming challenges. This paper introduces a novel agentic framework for guided code generation.
arXiv Detail & Related papers (2025-01-11T19:21:53Z) - Prompting and Fine-tuning Large Language Models for Automated Code Review Comment Generation [5.6001617185032595]
Large language models pretrained on both programming and natural language data tend to perform well in code-oriented tasks.
We fine-tune open-source Large Language Models (LLMs) in a parameter-efficient, quantized low-rank fashion on consumer-grade hardware to improve review comment generation.
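The parameter-efficient, quantized low-rank setup described here corresponds to a QLoRA-style configuration. The sketch below shows what such a setup might look like with the Hugging Face transformers/peft stack; the base model name and hyperparameters are illustrative assumptions, not the paper's actual choices.

```python
# Hedged sketch of QLoRA-style fine-tuning: a 4-bit quantized base model
# wrapped with trainable LoRA adapters. Model name and hyperparameters are
# illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # quantize base weights to 4 bits
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf",             # assumed open-source code LLM
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,  # low-rank adapter hyperparameters
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()           # only the adapters are trainable
```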
arXiv Detail & Related papers (2024-11-15T12:01:38Z) - COAST: Enhancing the Code Debugging Ability of LLMs through Communicative Agent Based Data Synthesis [29.667170755786508]
We introduce EVAL, a benchmark for evaluating the code debugging abilities of Large Language Models. We propose the COmmunicative Agent-based data SynThesis (COAST) framework, which employs a multi-agent system to generate high-quality training data. Results demonstrate that COAST-generated data outperform human-curated and GPT-4-generated data.
arXiv Detail & Related papers (2024-08-09T11:35:44Z) - M2CVD: Enhancing Vulnerability Semantic through Multi-Model Collaboration for Code Vulnerability Detection [52.4455893010468]
Large Language Models (LLMs) have strong capabilities in code comprehension, but fine-tuning costs and semantic alignment issues limit their project-specific optimization.
Code models such as CodeBERT are easy to fine-tune, but it is often difficult for them to learn vulnerability semantics from complex code.
This paper introduces the Multi-Model Collaborative Vulnerability Detection approach (M2CVD) to improve the detection accuracy of code models.
arXiv Detail & Related papers (2024-06-10T00:05:49Z) - Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study [72.24266814625685]
We explore the performance of large language models (LLMs) across the entire software development lifecycle with DevEval. DevEval features four programming languages, multiple domains, high-quality data collection, and carefully designed and verified metrics for each task. Empirical studies show that current LLMs, including GPT-4, fail to solve the challenges presented within DevEval.
arXiv Detail & Related papers (2024-03-13T15:13:44Z) - Evaluation of LLMs on Syntax-Aware Code Fill-in-the-Middle Tasks [12.629516072317331]
Syntax-Aware Fill-In-the-Middle (SAFIM) is a new benchmark for evaluating Large Language Models (LLMs) on the code Fill-in-the-Middle (FIM) task.
This benchmark focuses on syntax-aware completions of program structures such as code blocks and conditional expressions.
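The fill-in-the-middle task SAFIM evaluates can be pictured as a prefix/suffix pair with a masked span the model must reconstruct. The sketch below is illustrative only; the field names and sentinel tokens are assumptions, since the exact prompt layout differs between model families.

```python
# Illustrative sketch of a fill-in-the-middle (FIM) example in the style SAFIM
# evaluates: the model sees the code before and after a masked span and must
# generate the span itself. Field names and sentinel tokens are assumptions.
from dataclasses import dataclass


@dataclass
class FIMExample:
    prefix: str        # code before the masked span
    suffix: str        # code after the masked span
    ground_truth: str  # the span the model should reconstruct


example = FIMExample(
    prefix="def clamp(x, lo, hi):\n    if x < lo:\n        return lo\n    ",
    suffix="\n    return x",
    ground_truth="elif x > hi:\n        return hi",
)

# One common prompt layout (prefix-suffix-middle); the actual sentinel tokens
# vary by model family.
prompt = f"<PRE>{example.prefix}<SUF>{example.suffix}<MID>"
```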
arXiv Detail & Related papers (2024-03-07T05:05:56Z) - PPTC-R benchmark: Towards Evaluating the Robustness of Large Language Models for PowerPoint Task Completion [96.47420221442397]
We construct adversarial user instructions by attacking user instructions at sentence, semantic, and multi-language levels.
We test 3 closed-source and 4 open-source LLMs using a benchmark that incorporates robustness settings.
We find that GPT-4 exhibits the highest performance and strong robustness in our benchmark.
arXiv Detail & Related papers (2024-03-06T15:33:32Z) - Code Needs Comments: Enhancing Code LLMs with Comment Augmentation [91.52444946362547]
We introduce a novel data augmentation method that generates comments for existing code, coupled with a data filtering strategy that filters out code data poorly correlated with natural language.
We conducted experiments on three code-focused Large Language Models and observed consistent improvements in performance on two widely-used programming skill benchmarks.
arXiv Detail & Related papers (2024-02-20T13:56:38Z) - Speculative Contrastive Decoding [55.378200871224074]
Large language models (LLMs) exhibit exceptional performance in language tasks, yet their auto-regressive inference is limited by high computational requirements and is sub-optimal due to exposure bias.
Inspired by speculative decoding and contrastive decoding, we introduce Speculative Contrastive Decoding (SCD), a straightforward yet powerful decoding approach.
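As a rough illustration of the contrastive half of this idea, expert (large-model) token scores can be penalized by an amateur (small-model) score over a plausibility-restricted vocabulary. The sketch below is a generic contrastive-decoding scoring function under assumed hyperparameters, not the paper's exact formulation, and it omits the speculative drafting-and-verification loop entirely.

```python
# Hedged sketch of contrastive token scoring: the expert model's log-probs are
# penalized by the amateur model's, restricted to tokens the expert considers
# plausible. alpha and beta are assumed hyperparameters.
import numpy as np


def contrastive_scores(expert_logprobs: np.ndarray,
                       amateur_logprobs: np.ndarray,
                       alpha: float = 0.1,
                       beta: float = 0.5) -> np.ndarray:
    # Plausibility constraint: keep only tokens whose expert probability is
    # within a factor alpha of the expert's best token.
    cutoff = np.log(alpha) + expert_logprobs.max()
    scores = expert_logprobs - beta * amateur_logprobs
    scores[expert_logprobs < cutoff] = -np.inf
    return scores
```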
arXiv Detail & Related papers (2023-11-15T14:15:30Z) - MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration [98.18244218156492]
Large Language Models (LLMs) have significantly advanced natural language processing. As their applications expand into multi-agent environments, there arises a need for a comprehensive evaluation framework. This work introduces a novel competition-based benchmark framework to assess LLMs within multi-agent settings.
arXiv Detail & Related papers (2023-11-14T21:46:27Z) - CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation [94.59630161324013]
We propose CoAnnotating, a novel paradigm for Human-LLM co-annotation of unstructured texts at scale.
Our empirical study shows CoAnnotating to be an effective means of allocating work across different datasets, with up to 21% performance improvement over a random baseline.
arXiv Detail & Related papers (2023-10-24T08:56:49Z)
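The uncertainty-guided allocation idea above can be sketched as routing items on which the LLM's repeated annotations disagree to human annotators. The example below is a hedged illustration; the entropy estimate, the threshold, and all names are assumptions rather than CoAnnotating's actual procedure.

```python
# Hedged sketch of uncertainty-guided work allocation: items where the LLM's
# repeated annotations disagree (high entropy) go to human annotators, the
# rest stay with the LLM. Threshold and uncertainty estimate are assumptions.
from collections import Counter
from math import log


def label_entropy(labels: list[str]) -> float:
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log(c / total) for c in counts.values())


def allocate(llm_labels_per_item: dict[str, list[str]], threshold: float = 0.5):
    to_human, to_llm = [], []
    for item_id, labels in llm_labels_per_item.items():
        (to_human if label_entropy(labels) > threshold else to_llm).append(item_id)
    return to_human, to_llm
```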
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.