Code Comment Inconsistency Detection with BERT and Longformer
- URL: http://arxiv.org/abs/2207.14444v1
- Date: Fri, 29 Jul 2022 02:43:51 GMT
- Title: Code Comment Inconsistency Detection with BERT and Longformer
- Authors: Theo Steiner and Rui Zhang
- Abstract summary: Comments, or natural language descriptions of source code, are standard practice among software developers.
When the code is modified without an accompanying correction to the comment, an inconsistency between the comment and code can arise.
We propose two models to detect such inconsistencies in a natural language inference (NLI) context.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Comments, or natural language descriptions of source code, are standard
practice among software developers. By communicating important aspects of the
code such as functionality and usage, comments help with software project
maintenance. However, when the code is modified without an accompanying
correction to the comment, an inconsistency between the comment and code can
arise, which opens up the possibility for developer confusion and bugs. In this
paper, we propose two models based on BERT (Devlin et al., 2019) and Longformer
(Beltagy et al., 2020) to detect such inconsistencies in a natural language
inference (NLI) context. Through an evaluation on a previously established
corpus of comment-method pairs both during and after code changes, we
demonstrate that our models outperform multiple baselines and yield comparable
results to the state-of-the-art models that exclude linguistic and lexical
features. We further discuss ideas for future research in using pretrained
language models for both inconsistency detection and automatic comment
updating.
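To make the NLI framing above concrete, the sketch below treats the comment as the premise and the method code as the hypothesis, packing them into a single sequence-pair input for a BERT classifier. The checkpoint name, label convention, and example pair are illustrative assumptions rather than the authors' exact configuration; the Longformer variant would follow the same pattern while accommodating methods longer than BERT's 512-token limit.
```python
# Hedged sketch of NLI-style comment-code inconsistency detection.
# Assumptions (not from the paper): bert-base-uncased checkpoint,
# label 0 = consistent, label 1 = inconsistent.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

comment = "Returns the user's age in years."                  # premise
code = "def get_age(self):\n    return self.age_in_months"    # hypothesis

# Sequence-pair encoding: [CLS] comment [SEP] code [SEP].
inputs = tokenizer(comment, code, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print("inconsistent" if logits.argmax(dim=-1).item() == 1 else "consistent")
```
The freshly initialized classification head would need fine-tuning on labeled comment-method pairs before its predictions mean anything; for long methods, a checkpoint such as allenai/longformer-base-4096 can be substituted through the same API.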
Related papers
- Code Documentation and Analysis to Secure Software Development
CoDAT is a tool designed to maintain consistency between the various levels of code documentation.
It is implemented in IntelliJ IDEA.
We use a large language model to check the semantic consistency between a fragment of code and the comments that describe it.
arXiv Detail & Related papers (2024-07-16T17:25:44Z)
- Beyond Functional Correctness: Investigating Coding Style Inconsistencies in Large Language Models
Large language models (LLMs) have brought a paradigm shift to the field of code generation.
We empirically analyze the differences in coding style between the code generated by Code LLMs and the code written by human developers.
arXiv Detail & Related papers (2024-06-29T14:56:11Z)
- When simplicity meets effectiveness: Detecting code comments coherence with word embeddings and LSTM
Code comments play a crucial role in software development, as they provide programmers with practical information.
Developers tend to leave comments unchanged after updating the code, resulting in a discrepancy between the two artifacts.
It is crucial to identify whether, given a code snippet, its corresponding comment is coherent and reflects the intent behind the code.
arXiv Detail & Related papers (2024-05-25T15:21:27Z)
- AdaCCD: Adaptive Semantic Contrasts Discovery Based Cross Lingual Adaptation for Code Clone Detection
AdaCCD is a novel cross-lingual adaptation method that can detect cloned code in a new language without annotations in that language.
We evaluate the cross-lingual adaptation results of AdaCCD by constructing a multilingual code clone detection benchmark consisting of 5 programming languages.
arXiv Detail & Related papers (2023-11-13T12:20:48Z)
- L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models
We present L2CEval, a systematic evaluation of the language-to-code generation capabilities of large language models (LLMs).
We analyze the factors that potentially affect their performance, such as model size, pretraining data, instruction tuning, and different prompting methods.
In addition to assessing model performance, we measure confidence calibration for the models and conduct human evaluations of the output programs.
arXiv Detail & Related papers (2023-09-29T17:57:00Z)
- ReACC: A Retrieval-Augmented Code Completion Framework
We propose a retrieval-augmented code completion framework that leverages both lexical copying and retrieval of semantically similar code.
We evaluate our approach on the code completion task in the Python and Java programming languages, achieving state-of-the-art performance on the CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
- Deep Just-In-Time Inconsistency Detection Between Comments and Source Code
In this paper, we aim to detect whether a comment becomes inconsistent as a result of changes to the corresponding body of code.
We develop a deep-learning approach that learns to correlate a comment with code changes.
We show the usefulness of our approach by combining it with a comment update model to build a more comprehensive automatic comment maintenance system.
arXiv Detail & Related papers (2020-10-04T16:49:28Z)
- Code to Comment "Translation": Data, Metrics, Baselining & Evaluation
We analyze several recent code-comment datasets for this task.
We compare them with WMT19, a standard dataset frequently used to train state-of-the-art natural language translators.
We find some interesting differences between the code-comment data and the WMT19 natural language data.
arXiv Detail & Related papers (2020-10-03T18:57:26Z)
- Learning to Update Natural Language Comments Based on Code Changes
We formulate the novel task of automatically updating an existing natural language comment based on changes in the body of code it accompanies.
We propose an approach that learns to correlate changes across two distinct language representations, generating a sequence of edits that are applied to the existing comment to reflect the source code modifications.
arXiv Detail & Related papers (2020-04-25T15:37:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.