Optimizing Knowledge Utilization for Multi-Intent Comment Generation with Large Language Models
- URL: http://arxiv.org/abs/2510.25195v1
- Date: Wed, 29 Oct 2025 06:02:23 GMT
- Title: Optimizing Knowledge Utilization for Multi-Intent Comment Generation with Large Language Models
- Authors: Shuochuan Li, Zan Wang, Xiaoning Du, Zhuo Wu, Jiuqiao Yu, Junjie Chen
- Abstract summary: Code comment generation aims to produce a generic overview of a code snippet, helping developers understand and maintain code. This highlights the necessity of multi-intent comment generation. We propose a framework named KUMIC for multi-intent comment generation.
- Score: 12.546021344852305
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Code comment generation aims to produce a generic overview of a code snippet, helping developers understand and maintain code. However, generic summaries alone are insufficient to meet the diverse needs of practitioners; for example, developers expect the implementation insights to be presented in an untangled manner, while users seek clear usage instructions. This highlights the necessity of multi-intent comment generation. With the widespread adoption of Large Language Models (LLMs) for code-related tasks, these models have been leveraged to tackle the challenge of multi-intent comment generation. Despite their successes, state-of-the-art LLM-based approaches often struggle to construct correct relationships among intents, code, and comments within a smaller number of demonstration examples. To mitigate this issue, we propose a framework named KUMIC for multi-intent comment generation. Built upon in-context learning, KUMIC leverages Chain-of-Thought (CoT) to optimize knowledge utilization for LLMs to generate intent-specific comments. Specifically, KUMIC first designs a retrieval mechanism to obtain similar demonstration examples, which exhibit high code-comment consistency. Then, KUMIC leverages CoT to guide LLMs to focus on statements facilitating the derivation of code comments aligned with specific intents. In this context, KUMIC constructs a mapping knowledge chain, linking code to intent-specific statements to comments, which enables LLMs to follow similar reasoning steps when generating the desired comments. We conduct extensive experiments to evaluate KUMIC, and the results demonstrate that KUMIC outperforms state-of-the-art baselines by 14.49%, 22.41%, 20.72%, and 12.94% in terms of BLEU, METEOR, ROUGE-L, and SBERT, respectively.
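As a rough illustration of the pipeline the abstract describes, the sketch below retrieves similar demonstration examples and assembles a chain-of-thought prompt whose demonstrations spell out the mapping chain from code to intent-specific statements to a comment. The retrieval metric, prompt wording, and `query_llm` call are assumptions made for illustration; they are not KUMIC's actual implementation.

```python
# Minimal sketch of a KUMIC-style retrieval + CoT prompt builder.
# Illustrative only; the real framework's retrieval mechanism and prompt
# design are not reproduced here.
from difflib import SequenceMatcher

def retrieve_demos(query_code, corpus, k=3):
    """Pick the k demonstration examples whose code is most similar to the query.
    KUMIC additionally filters for high code-comment consistency; plain lexical
    similarity is used here as a placeholder."""
    scored = sorted(
        corpus,
        key=lambda ex: SequenceMatcher(None, query_code, ex["code"]).ratio(),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query_code, intent, demos):
    """Assemble an in-context-learning prompt whose demonstrations show the
    chain: code -> intent-specific statements -> comment.
    Each demo is assumed to be a dict with 'code', 'key_statements', 'comment'."""
    parts = [f"Generate a '{intent}' comment for the given code.\n"]
    for ex in demos:
        parts.append(
            "Code:\n" + ex["code"] + "\n"
            "Key statements for this intent:\n" + ex["key_statements"] + "\n"
            "Comment:\n" + ex["comment"] + "\n"
        )
    parts.append("Code:\n" + query_code + "\nKey statements for this intent:\n")
    return "\n".join(parts)

# Usage (query_llm is a hypothetical call to an LLM backend):
# demos = retrieve_demos(my_code, demo_corpus, k=3)
# comment = query_llm(build_prompt(my_code, "usage", demos))
```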
Related papers
- CodeSimpleQA: Scaling Factuality in Code Large Language Models [55.705748501461294]
We present CodeSimpleQA, a comprehensive benchmark designed to evaluate the factual accuracy of code LLMs in answering code-related questions. We also create CodeSimpleQA-Instruct, a large-scale instruction corpus with 66M samples, and develop a post-training framework combining supervised fine-tuning and reinforcement learning.
arXiv Detail & Related papers (2025-12-22T14:27:17Z)
- Operationalizing Large Language Models with Design-Aware Contexts for Code Comment Generation [6.19791892410626]
This study focuses on the feasibility of design documents as a context for LLMs to generate more useful comments. Design documents are often used by maintainers to understand code when comments do not suffice.
arXiv Detail & Related papers (2025-10-25T15:44:22Z)
- On LLM-Assisted Generation of Smart Contracts from Business Processes [0.08192907805418582]
Large language models (LLMs) have changed the reality of how software is produced. We present an exploratory study to investigate the use of LLMs for generating smart contract code from business process descriptions. Our results show that LLM performance falls short of the perfect reliability required for smart contract development.
arXiv Detail & Related papers (2025-07-30T20:39:45Z)
- IFEvalCode: Controlled Code Generation [69.28317223249358]
The paper introduces forward and backward constraints generation to improve the instruction-following capabilities of Code LLMs. The authors present IFEvalCode, a multilingual benchmark comprising 1.6K test samples across seven programming languages.
arXiv Detail & Related papers (2025-07-30T08:08:48Z)
- Is Compression Really Linear with Code Intelligence? [60.123628177110206]
Format Annealing is a lightweight, transparent training methodology designed to assess the intrinsic capabilities of pre-trained models equitably. Our empirical results reveal a fundamental logarithmic relationship between measured code intelligence and bits-per-character (BPC). Our work provides a more nuanced understanding of compression's role in developing code intelligence and contributes a robust evaluation framework in the code domain.
arXiv Detail & Related papers (2025-05-16T16:59:14Z)
- Type-Constrained Code Generation with Language Models [51.03439021895432]
We introduce a type-constrained decoding approach that leverages type systems to guide code generation. For this purpose, we develop novel prefix automata and a search over inhabitable types, forming a sound approach to enforce well-typedness on LLM-generated code. Our approach reduces compilation errors by more than half and significantly increases functional correctness in code synthesis, translation, and repair tasks.
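As a loose illustration of constrained decoding in general (not the paper's prefix automata or inhabitable-type search), the snippet below masks out candidate tokens whose extended output a checker rejects; `lm_step` and `is_valid_prefix` are hypothetical callbacks supplied by the caller.

```python
# Generic constrained-decoding loop (illustrative sketch only).
def constrained_decode(lm_step, is_valid_prefix, max_len=256):
    """lm_step(prefix) -> list of (token, logprob) candidates;
    is_valid_prefix(text) -> bool, True if text can still be completed validly."""
    out = ""
    for _ in range(max_len):
        candidates = lm_step(out)
        # Keep only tokens whose extended prefix the checker still accepts.
        allowed = [(t, lp) for t, lp in candidates if is_valid_prefix(out + t)]
        if not allowed:
            break
        token, _ = max(allowed, key=lambda x: x[1])
        if token == "<eos>":
            break
        out += token
    return out
```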
arXiv Detail & Related papers (2025-04-12T15:03:00Z)
- Pragmatic Reasoning improves LLM Code Generation [35.78260347663757]
We propose CodeRSA, a novel code candidate reranking mechanism built upon the Rational Speech Act (RSA) framework. We evaluate CodeRSA using one of the latest Large Language Models on a popular code generation dataset.
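The summary does not spell out the reranking rule; a common RSA-style instantiation scores each candidate program by combining how likely the model was to generate it from the description with how well the description can be recovered from it. The sketch below assumes hypothetical log-probability callbacks `p_code_given_desc` and `p_desc_given_code` and is not CodeRSA's actual formulation.

```python
# RSA-style candidate reranking (illustrative sketch, not the paper's method).
def rsa_rerank(description, candidates, p_code_given_desc, p_desc_given_code, alpha=1.0):
    """Rerank candidate programs by a speaker score (log P(code | description))
    plus a weighted listener score (log P(description | code)).
    Both scoring functions are hypothetical callbacks backed by an LLM."""
    def score(code):
        return p_code_given_desc(code, description) + alpha * p_desc_given_code(description, code)
    return sorted(candidates, key=score, reverse=True)
```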
arXiv Detail & Related papers (2025-02-20T12:44:26Z)
- Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective [85.48043537327258]
We propose MANGO (comMents As Natural loGic pivOts), including a comment contrastive training strategy and a corresponding logical comment decoding strategy.
Results indicate that MANGO significantly improves the code pass rate based on the strong baselines.
The robustness of the logical comment decoding strategy is notably higher than that of Chain-of-Thought prompting.
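One plausible reading of "logical comment decoding" is a comment-first pipeline: elicit a logic-oriented comment plan, then condition code generation on it. The sketch below is an assumption-laden illustration with a hypothetical `query_llm` callback, not MANGO's actual procedure.

```python
# Comment-pivot generation (illustrative sketch only).
def comment_pivot_generate(problem, query_llm):
    """Two-stage generation: first elicit a step-by-step comment describing the
    solution logic, then generate code conditioned on that comment."""
    comment = query_llm(
        "Write a concise comment describing, step by step, the logic needed to solve:\n"
        + problem
    )
    code = query_llm(
        "Implement the following plan as code, keeping the plan as inline comments.\n"
        "Plan:\n" + comment + "\nProblem:\n" + problem
    )
    return comment, code
```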
arXiv Detail & Related papers (2024-04-11T08:30:46Z)
- InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models [56.723509505549536]
InfiBench is the first large-scale freeform question-answering (QA) benchmark for code to our knowledge.
It comprises 234 carefully selected high-quality Stack Overflow questions that span across 15 programming languages.
We conduct a systematic evaluation for over 100 latest code LLMs on InfiBench, leading to a series of novel and insightful findings.
arXiv Detail & Related papers (2024-03-11T02:06:30Z)
- If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code).
Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z)
- Large Language Models are Few-Shot Summarizers: Multi-Intent Comment Generation via In-Context Learning [34.006227676170504]
This study investigates the feasibility of utilizing large language models (LLMs) to generate comments that can fulfill developers' diverse intents.
Experiments on two large-scale datasets demonstrate the rationale of our insights.
arXiv Detail & Related papers (2023-04-22T12:26:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.