Related papers: Analyzing the Performance of Large Language Models on Code Summarization

Analyzing the Performance of Large Language Models on Code Summarization

URL: http://arxiv.org/abs/2404.08018v1
Date: Wed, 10 Apr 2024 22:42:18 GMT
Title: Analyzing the Performance of Large Language Models on Code Summarization
Authors: Rajarshi Haldar, Julia Hockenmaier,
Abstract summary: Large language models (LLMs) such as Llama 2 perform very well on tasks that involve both natural language and source code. We show that for the task of code summarization, the performance of these models often depends on the amount of (subword) token overlap between the code and the corresponding natural language descriptions in the dataset.
Score: 4.6785446727033335
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) such as Llama 2 perform very well on tasks that involve both natural language and source code, particularly code summarization and code generation. We show that for the task of code summarization, the performance of these models on individual examples often depends on the amount of (subword) token overlap between the code and the corresponding reference natural language descriptions in the dataset. This token overlap arises because the reference descriptions in standard datasets (corresponding to docstrings in large code bases) are often highly similar to the names of the functions they describe. We also show that this token overlap occurs largely in the function names of the code and compare the relative performance of these models after removing function names versus removing code structure. We also show that using multiple evaluation metrics like BLEU and BERTScore gives us very little additional insight since these metrics are highly correlated with each other.

Related papers

Functional Consistency of LLM Code Embeddings: A Self-Evolving Data Synthesis Framework for Benchmarking [23.980033692974278]
Embedding models have demonstrated strong performance in tasks like clustering, retrieval, and feature extraction while offering computational advantages over generative models and cross-encoders.<n>We propose a novel data synthesis framework called Functionality-Oriented Code Self-Evolution to construct diverse and challenging benchmarks.<n>Our framework generates four unique variations from a single code instance, providing a broader spectrum of code examples that better reflect functional differences.
arXiv Detail & Related papers (2025-08-27T04:17:02Z)
IFEvalCode: Controlled Code Generation [69.28317223249358]
The paper introduces forward and backward constraints generation to improve the instruction-following capabilities of Code LLMs.<n>The authors present IFEvalCode, a multilingual benchmark comprising 1.6K test samples across seven programming languages.
arXiv Detail & Related papers (2025-07-30T08:08:48Z)
On the Effect of Token Merging on Pre-trained Models for Code [11.029842116504726]
We investigate the effect of merging the hidden representations of subtokens that belong to the same semantic unit.<n>We propose two strategies: one based on averaging the representations and another that leverages a learning-based approach.<n>Results show that these strategies can reduce the number of floating-point operations by $1%$ to $19%$.
arXiv Detail & Related papers (2025-07-19T00:48:20Z)
EpiCoder: Encompassing Diversity and Complexity in Code Generation [66.43738008739555]
Existing methods for code generation use code snippets as seed data.<n>We introduce a novel feature tree-based synthesis framework, which revolves around hierarchical code features.<n>Our framework provides precise control over the complexity of the generated code, enabling functionalities that range from function-level operations to multi-file scenarios.
arXiv Detail & Related papers (2025-01-08T18:58:15Z)
SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code Summarization [51.67317895094664]
This paper studies file-level code summarization, which can assist programmers in understanding and maintaining large source code projects. We propose SparseCoder, an identifier-aware sparse transformer for effectively handling long code sequences.
arXiv Detail & Related papers (2024-01-26T09:23:27Z)
Exploring Large Language Models for Code Explanation [3.2570216147409514]
Large Language Models (LLMs) have made remarkable strides in Natural Language Processing. This study specifically delves into the task of generating natural-language summaries for code snippets, using various LLMs.
arXiv Detail & Related papers (2023-10-25T14:38:40Z)
LongCoder: A Long-Range Pre-trained Language Model for Code Completion [56.813974784131624]
LongCoder employs a sliding window mechanism for self-attention and introduces two types of globally accessible tokens. Bridge tokens are inserted throughout the input sequence to aggregate local information and facilitate global interaction. memory tokens are included to highlight important statements that may be invoked later and need to be memorized.
arXiv Detail & Related papers (2023-06-26T17:59:24Z)
CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models [77.45934004406283]
We systematically study decompounding, the task of splitting compound words into their constituents. We introduce a dataset of 255k compound and non-compound words across 56 diverse languages obtained from Wiktionary. We introduce a novel methodology to train dedicated models for decompounding.
arXiv Detail & Related papers (2023-05-23T16:32:27Z)
LAMNER: Code Comment Generation Using Character Language Model and Named Entity Recognition [0.7894331610810762]
We present LAnguage Model and Named Entity Recognition (LAMNER) LAMNER is a code comment generator capable of encoding code constructs effectively and capturing the structural property of a code token. We evaluate the generated comments from LAMNER and other baselines on a popular Java dataset with four commonly used metrics.
arXiv Detail & Related papers (2022-04-05T20:53:06Z)
Multilingual Autoregressive Entity Linking [49.35994386221958]
mGENRE is a sequence-to-sequence system for the Multilingual Entity Linking problem. For a mention in a given language, mGENRE predicts the name of the target entity left-to-right, token-by-token. We show the efficacy of our approach through extensive evaluation including experiments on three popular MEL benchmarks.
arXiv Detail & Related papers (2021-03-23T13:25:55Z)
Deep Graph Matching and Searching for Semantic Code Retrieval [76.51445515611469]
We propose an end-to-end deep graph matching and searching model based on graph neural networks. We first represent both natural language query texts and programming language code snippets with the unified graph-structured data. In particular, DGMS not only captures more structural information for individual query texts or code snippets but also learns the fine-grained similarity between them.
arXiv Detail & Related papers (2020-10-24T14:16:50Z)
GraphCodeBERT: Pre-training Code Representations with Data Flow [97.00641522327699]
We present GraphCodeBERT, a pre-trained model for programming language that considers the inherent structure of code. We use data flow in the pre-training stage, which is a semantic-level structure of code that encodes the relation of "where-the-value-comes-from" between variables. We evaluate our model on four tasks, including code search, clone detection, code translation, and code refinement.
arXiv Detail & Related papers (2020-09-17T15:25:56Z)
Leveraging Code Generation to Improve Code Retrieval and Summarization via Dual Learning [18.354352985591305]
Code summarization generates brief natural language description given a source code snippet, while code retrieval fetches relevant source code given a natural language query. Recent studies have combined these two tasks to improve their performance. We propose a novel end-to-end model for the two tasks by introducing an additional code generation task.
arXiv Detail & Related papers (2020-02-24T12:26:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.