Issue Retrieval and Verification Enhanced Supplementary Code Comment Generation
- URL: http://arxiv.org/abs/2506.14649v1
- Date: Tue, 17 Jun 2025 15:42:25 GMT
- Title: Issue Retrieval and Verification Enhanced Supplementary Code Comment Generation
- Authors: Yanzhen Zou, Xianlin Zhao, Xinglu Pan, Bing Xie
- Abstract summary: We propose IsComment, an issue-based LLM retrieval and verification approach for generating supplementary code comments. We first identify five main types of code supplementary information that issue reports can provide through code-comment-issue analysis. To reduce hallucinations, we filter out those candidate comments that are irrelevant to the code or unverifiable by the issue report.
- Score: 1.434589731679756
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Issue reports have been recognized to contain rich information for retrieval-augmented code comment generation. However, minimizing hallucinations in the generated comments remains a significant challenge. In this paper, we propose IsComment, an issue-based LLM retrieval and verification approach for generating a method's design rationale, usage directives, and other supplementary code comments. We first identify five main types of code supplementary information that issue reports can provide through code-comment-issue analysis. Next, we retrieve issue sentences containing these types of supplementary information and generate candidate code comments. To reduce hallucinations, we filter out those candidate comments that are irrelevant to the code or unverifiable by the issue report, making the code comment generation results more reliable. Our experiments indicate that, compared with using LLMs alone, IsComment increases the coverage of manual supplementary comments from 33.6% to 72.2% for ChatGPT, from 35.8% to 88.4% for GPT-4o, and from 35.0% to 86.2% for DeepSeek-V3. Compared with existing work, IsComment can generate richer and more useful supplementary code comments for program understanding, as quantitatively evaluated through the MESIA metric on methods both with and without manual code comments.
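The abstract describes a retrieve-generate-verify pipeline, which can be sketched in a few lines of Python. This is a minimal illustration only: the five type labels, the `retrieve_issue_sentences` helper, and the `llm` callable are hypothetical stand-ins, and the paper's actual prompts and verification criteria are not reproduced here.

```python
SUPPLEMENTARY_TYPES = [          # assumed labels for the five information types
    "design rationale", "usage directive", "corner case",
    "performance note", "known limitation",
]

def generate_supplementary_comments(method_code, issue_reports,
                                    llm, retrieve_issue_sentences):
    """Retrieve issue sentences, draft candidate comments, then verify them."""
    candidates = []
    for info_type in SUPPLEMENTARY_TYPES:
        # Step 1: retrieve issue sentences that may carry this information type.
        for sent in retrieve_issue_sentences(issue_reports, method_code, info_type):
            # Step 2: draft a candidate comment grounded in the issue sentence.
            draft = llm(f"Write a {info_type} comment for this method, based "
                        f"only on the issue sentence.\nMethod:\n{method_code}\n"
                        f"Issue sentence: {sent}")
            candidates.append((draft, sent))

    verified = []
    for draft, sent in candidates:
        # Step 3: filter out drafts that are irrelevant to the code or
        # unverifiable from the issue report, to reduce hallucination.
        relevant = llm(f"Is this comment relevant to the method? Answer yes or no.\n"
                       f"Method:\n{method_code}\nComment: {draft}")
        supported = llm(f"Can this comment be verified from the issue sentence? "
                        f"Answer yes or no.\nIssue sentence: {sent}\nComment: {draft}")
        if relevant.strip().lower().startswith("yes") and \
           supported.strip().lower().startswith("yes"):
            verified.append(draft)
    return verified
```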
Related papers
- VERINA: Benchmarking Verifiable Code Generation [47.9771074559674]
Large language models (LLMs) are increasingly integrated in software development. Verifiable code generation offers a promising path to address this limitation. Current benchmarks often lack support for end-to-end verifiable code generation.
arXiv Detail & Related papers (2025-05-29T06:12:52Z)
- Prompting and Fine-tuning Large Language Models for Automated Code Review Comment Generation [5.6001617185032595]
Large language models pretrained on both programming and natural language data tend to perform well in code-oriented tasks.
We fine-tune open-source large language models (LLMs) in a parameter-efficient, quantized low-rank fashion on consumer-grade hardware to improve review comment generation.
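A parameter-efficient, quantized low-rank setup of this kind is commonly built with `transformers` and `peft`. The sketch below shows one plausible QLoRA-style configuration; the model name, rank, and target modules are illustrative assumptions, not the paper's exact setup.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,                       # 4-bit weights fit consumer GPUs
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf",             # assumed base model
    quantization_config=bnb, device_map="auto",
)
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,  # assumed hyperparameters
    target_modules=["q_proj", "v_proj"],     # adapt only a few matrices
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()           # typically well under 1% of weights
```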
arXiv Detail & Related papers (2024-11-15T12:01:38Z)
- Impact of LLM-based Review Comment Generation in Practice: A Mixed Open-/Closed-source User Study [13.650356901064807]
This user study was performed in two organizations, Mozilla and Ubisoft.
We observed that 8.1% and 7.2% of LLM-generated comments were accepted by reviewers at Mozilla and Ubisoft, respectively.
Refactoring-related comments are more likely to be accepted than Functional comments.
arXiv Detail & Related papers (2024-11-11T16:12:11Z)
- Understanding Code Understandability Improvements in Code Reviews [79.16476505761582]
We analyzed 2,401 code review comments from Java open-source projects on GitHub.
83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted.
arXiv Detail & Related papers (2024-10-29T12:21:23Z)
- AUTOGENICS: Automated Generation of Context-Aware Inline Comments for Code Snippets on Programming Q&A Sites Using LLM [1.971759811837406]
Inline comments in source code facilitate easy comprehension, reusability, and enhanced readability.
Code snippets in answers on Q&A sites like Stack Overflow (SO) often lack comments because answerers volunteer their time and frequently skip explanations due to time constraints.
Given these challenges, we introduced AUTOGENICS, a tool designed to integrate with SO that uses large language models to generate effective inline comments for code snippets in SO answers.
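The core idea, prompting an LLM with the question context and the raw snippet, can be sketched as follows. The prompt wording and the `llm` callable are illustrative assumptions; the tool's actual prompting strategy and SO integration are not reproduced here.

```python
def add_inline_comments(question_text: str, snippet: str, llm) -> str:
    """Ask an LLM to interleave inline comments into a Q&A code snippet."""
    prompt = (
        "You are annotating a Stack Overflow answer.\n"
        f"Question context:\n{question_text}\n\n"
        "Insert concise inline comments into the snippet below. "
        "Do not change the code itself.\n\n"
        f"{snippet}"
    )
    return llm(prompt)  # expected to return the snippet with comments added
```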
arXiv Detail & Related papers (2024-08-27T21:21:13Z)
- COMCAT: Leveraging Human Judgment to Improve Automatic Documentation and Summarization [4.1491806566512235]
COMCAT is an approach to automate comment generation by augmenting Large Language Models with expertise-guided context.
We develop the COMCAT pipeline to comment C/C++ files by (1) automatically identifying suitable locations in which to place comments, (2) predicting the most helpful type of comment for each location, and (3) generating a comment based on the selected location and comment type.
In a human subject evaluation, we demonstrate that COMCAT-generated comments significantly improve developer code comprehension across three indicative software engineering tasks by up to 12% for 87% of participants.
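The three-stage pipeline described above maps naturally onto a small driver loop. In this sketch, the helper callables, the location interface (`loc.start`/`loc.end`), and the comment-type list are all assumptions for illustration, not COMCAT's actual components.

```python
COMMENT_TYPES = ["summary", "usage", "rationale"]  # assumed taxonomy

def comment_file(source: str, find_locations, predict_type, llm):
    """Locate comment sites, pick a type for each, then generate text."""
    annotated = []
    for loc in find_locations(source):                     # (1) where to comment
        ctype = predict_type(source, loc, COMMENT_TYPES)   # (2) what kind
        text = llm(                                        # (3) generate it
            f"Write a {ctype} comment for the C/C++ code at this location:\n"
            f"{source[loc.start:loc.end]}"
        )
        annotated.append((loc, ctype, text))
    return annotated
```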
arXiv Detail & Related papers (2024-07-18T16:26:31Z)
- CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification [73.66920648926161]
We introduce the concept of code hallucinations and propose a classification method for code hallucination based on execution verification. We present a dynamic detection algorithm called CodeHalu designed to detect and quantify code hallucinations. We also introduce the CodeHaluEval benchmark, which includes 8,883 samples from 699 tasks, to systematically and quantitatively evaluate code hallucinations.
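Execution-based verification can be illustrated with a simplified checker that runs generated code against test cases and flags mismatches. This is only a sketch of the general idea: the paper's hallucination taxonomy and sandboxing are not reproduced, the `solution` entry-point name is an assumption, and `exec` here is safe only for trusted toy inputs.

```python
def detect_hallucination(generated_code: str, test_cases) -> bool:
    """Return True if execution contradicts any expected output."""
    namespace = {}
    try:
        exec(generated_code, namespace)      # define the candidate function
        fn = namespace["solution"]           # assumed entry-point name
    except Exception:
        return True                          # code does not even run
    for args, expected in test_cases:
        try:
            if fn(*args) != expected:
                return True                  # runs but gives wrong answers
        except Exception:
            return True                      # crashes on valid input
    return False

# Example: a generated solution that "looks" plausible but is hallucinated.
buggy = "def solution(x):\n    return x * 2  # asked for x squared"
print(detect_hallucination(buggy, [((3,), 9)]))  # True
```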
arXiv Detail & Related papers (2024-04-30T23:56:38Z)
- Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective [85.48043537327258]
We propose MANGO (comMents As Natural loGic pivOts), including a comment contrastive training strategy and a corresponding logical comment decoding strategy.
Results indicate that MANGO significantly improves the code pass rate over strong baselines.
The logical comment decoding strategy is also notably more robust than Chain-of-Thought prompting.
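Logical comment decoding of this flavor amounts to steering generation so the model writes its plan as code comments before the code itself. The trigger wording below is an assumption; MANGO's actual decoding strategy and contrastive training are not reproduced.

```python
def mango_style_prompt(problem: str) -> str:
    """Prefix a code LLM prompt with a comment trigger (illustrative only)."""
    return (
        f'"""{problem}"""\n'
        "# Let's think step by step and write the logic as comments first.\n"
    )

# A completion sampled from this prefix tends to begin with plan comments,
# which then act as natural-logic pivots conditioning the code that follows.
```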
arXiv Detail & Related papers (2024-04-11T08:30:46Z)
- Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs [65.2379940117181]
We introduce code prompting, a chain of prompts that transforms a natural language problem into code.
We find that code prompting yields a substantial performance boost for multiple LLMs.
Our analysis of GPT-3.5 reveals that the code formatting of the input problem is essential for the performance improvement.
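To make the idea concrete, here is one invented example of turning a natural-language conditional-reasoning problem into code, the kind of transformation code prompting performs; the rule itself is made up for illustration.

```python
# NL problem: "A person may renew online if their license is not expired
# and they are at least 18, unless the license was suspended."
def can_renew_online(expired: bool, age: int, suspended: bool) -> bool:
    if suspended:
        return False        # the 'unless' clause overrides everything
    return (not expired) and age >= 18

print(can_renew_online(expired=False, age=20, suspended=False))  # True
```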
arXiv Detail & Related papers (2024-01-18T15:32:24Z)
- Deep Just-In-Time Inconsistency Detection Between Comments and Source Code [51.00904399653609]
In this paper, we aim to detect whether a comment becomes inconsistent as a result of changes to the corresponding body of code.
We develop a deep-learning approach that learns to correlate a comment with code changes.
We show the usefulness of our approach by combining it with a comment update model to build a more comprehensive automatic comment maintenance system.
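Schematically, just-in-time detection pairs the existing comment with a representation of the code edit and classifies whether the edit makes the comment stale. In this sketch, the `classifier` is a placeholder for the paper's learned deep model; only the overall data flow is shown.

```python
import difflib

def detect_inconsistency(comment: str, old_code: str, new_code: str,
                         classifier) -> bool:
    """Pair the comment with a diff of the edit and classify staleness."""
    # Represent the change as a unified diff, the signal the model
    # correlates with the comment text.
    diff = "\n".join(difflib.unified_diff(old_code.splitlines(),
                                          new_code.splitlines(), lineterm=""))
    return classifier(comment, diff)  # True => comment is now inconsistent
```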
arXiv Detail & Related papers (2020-10-04T16:49:28Z)