Related papers: Large Language Models for Code Summarization

A Multi-Language Perspective on the Robustness of LLM Code Generation [2.580765958706854]
We conduct a comprehensive comparative analysis to assess the robustness of several prominent code generation models. We introduce perturbations in four key areas of the prompt: DocString, function name, syntax, and format. This work presents our experimental findings, shedding light on the performance of code generation models in various scenarios.
arXiv Detail & Related papers (2025-04-27T05:00:21Z)
Analysis on LLMs Performance for Code Summarization [0.0]
Large Language Models (LLMs) have significantly advanced the field of code summarization. This study aims to perform a comparative analysis of several open-source LLMs, namely LLaMA-3, Phi-3, Mistral, and Gemma.
arXiv Detail & Related papers (2024-12-22T17:09:34Z)
Curriculum Learning for Small Code Language Models [0.09999629695552192]
This paper explores the potential of curriculum learning in enhancing the performance of code language models. We demonstrate that a well-designed curriculum learning approach significantly improves the accuracy of small decoder-only code language models.
arXiv Detail & Related papers (2024-07-14T13:32:24Z)
CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation [58.84212778960507]
We propose CodeGRAG, a Graphical Retrieval Augmented Code Generation framework to enhance the performance of LLMs. CodeGRAG builds a graphical view of code blocks based on their control flow and data flow to fill the gap between programming languages and natural language. Various experiments and ablations are done on four datasets, covering both C++ and Python, to validate the hard meta-graph prompt, the soft prompting technique, and the effectiveness of the objectives for the pretrained GNN expert.
arXiv Detail & Related papers (2024-05-03T02:48:55Z)
Do Machines and Humans Focus on Similar Code? Exploring Explainability of Large Language Models in Code Summarization [10.201463330812167]
We report negative results from our investigation of explainability of language models in code summarization through the lens of human comprehension. We employ a state-of-the-art model-agnostic, black-box, perturbation-based approach, SHAP, to identify which code tokens influence the generation of summaries. Our study highlights an inability to align human focus with SHAP-based model focus measures.
arXiv Detail & Related papers (2024-02-22T00:01:02Z)
Language Agnostic Code Embeddings [61.84835551549612]
We focus on the cross-lingual capabilities of code embeddings across different programming languages. Code embeddings comprise two distinct components: one deeply tied to the nuances and syntax of a specific language, and the other remaining agnostic to these details. We show that when we isolate and eliminate this language-specific component, we witness significant improvements in downstream code retrieval tasks.
arXiv Detail & Related papers (2023-10-25T17:34:52Z)
Exploring Large Language Models for Code Explanation [3.2570216147409514]
Large Language Models (LLMs) have made remarkable strides in Natural Language Processing. This study specifically delves into the task of generating natural-language summaries for code snippets, using various LLMs.
arXiv Detail & Related papers (2023-10-25T14:38:40Z)
L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models [102.00201523306986]
We present L2CEval, a systematic evaluation of the language-to-code generation capabilities of large language models (LLMs). We analyze the factors that potentially affect their performance, such as model size, pretraining data, instruction tuning, and different prompting methods. In addition to assessing model performance, we measure confidence calibration for the models and conduct human evaluations of the output programs.
arXiv Detail & Related papers (2023-09-29T17:57:00Z)
Code Execution with Pre-trained Language Models [88.04688617516827]
Most pre-trained models for code intelligence ignore the execution trace and only rely on source code and syntactic structures. We develop a mutation-based data augmentation technique to create a large-scale and realistic Python dataset and task for code execution. We then present CodeExecutor, a Transformer model that leverages code execution pre-training and curriculum learning to enhance its semantic comprehension.
arXiv Detail & Related papers (2023-05-08T10:00:05Z)
Python Code Generation by Asking Clarification Questions [57.63906360576212]
In this work, we introduce a novel and more realistic setup for this task. We hypothesize that the under-specification of a natural language description can be resolved by asking clarification questions. We collect and introduce a new dataset named CodeClarQA containing pairs of natural language descriptions and code with created synthetic clarification questions and answers.
arXiv Detail & Related papers (2022-12-19T22:08:36Z)
Multi-lingual Evaluation of Code Generation Models [82.7357812992118]
We present new benchmarks for evaluating code generation models: MBXP, Multilingual HumanEval, and MathQA-X. These datasets cover over 10 programming languages. We are able to assess the performance of code generation models in a multi-lingual fashion.
arXiv Detail & Related papers (2022-10-26T17:17:06Z)
Automatic Generation of Programming Exercises and Code Explanations with Large Language Models [4.947560475228859]
OpenAI Codex is a recent large language model from the GPT-3 family for translating code into natural language. We explore the natural language generation capabilities of Codex in two different phases of the life of a programming exercise. We find the majority of this automatically generated content both novel and sensible, and in many cases ready to use as is.
arXiv Detail & Related papers (2022-06-03T11:00:43Z)
Can Machines Read Coding Manuals Yet? -- A Benchmark for Building Better Language Models for Code Understanding [3.98345038769576]
We derive a set of benchmarks that assess code understanding based on tasks such as predicting the best answer to a question in a forum post. We evaluate the performance of current state-of-the-art language models on these tasks and show that fine-tuning yields a significant improvement on each task.
arXiv Detail & Related papers (2021-09-15T17:42:44Z)

This list is automatically generated from the titles and abstracts of the papers on this site.