Related papers: Formal Methods Meets Readability: Auto-Documenting JML Java Code

Formal Methods Meets Readability: Auto-Documenting JML Java Code

URL: http://arxiv.org/abs/2506.09230v1
Date: Tue, 10 Jun 2025 20:39:29 GMT
Title: Formal Methods Meets Readability: Auto-Documenting JML Java Code
Authors: Juan Carlos Recio Abad, Ruben Saborido, Francisco Chicano,
Abstract summary: This paper investigates whether formal specifications using Java Modeling Language (JML) can enhance the quality of Large Language Model (LLM)-generated Javadocs.<n>We present a systematic comparison of documentation generated from JML-annotated and non-annotated Java classes.<n>Our findings demonstrate that JML significantly improves class-level documentation, with more moderate gains at the method level.
Score: 1.0650780147044159
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper investigates whether formal specifications using Java Modeling Language (JML) can enhance the quality of Large Language Model (LLM)-generated Javadocs. While LLMs excel at producing documentation from code alone, we hypothesize that incorporating formally verified invariants yields more complete and accurate results. We present a systematic comparison of documentation generated from JML-annotated and non-annotated Java classes, evaluating quality through both automated metrics and expert analysis. Our findings demonstrate that JML significantly improves class-level documentation completeness, with more moderate gains at the method level. Formal specifications prove particularly effective in capturing complex class invariants and design contracts that are frequently overlooked in code-only documentation. A threshold effect emerges, where the benefits of JML become more pronounced for classes with richer sets of invariants. While JML enhances specification coverage, its impact on core descriptive quality is limited, suggesting that formal specifications primarily ensure comprehensive coverage rather than fundamentally altering implementation descriptions. These results offer actionable insights for software teams adopting formal methods in documentation workflows, highlighting scenarios where JML provides clear advantages. The study contributes to AI-assisted software documentation research by demonstrating how formal methods and LLMs can synergistically improve documentation quality.

Related papers

Automated and Context-Aware Code Documentation Leveraging Advanced LLMs [0.0]
Existing automated approaches primarily focused on code summarization, leaving a gap in template-based documentation generation.<n>We develop a novel, context-aware dataset for Javadoc generation that includes critical structural and semantic information from modern Javas.<n>We evaluate five open-source LLMs (including LLaMA-3.1, Gemma Phi-3, Mistral, Qwen-2.5) using zero-shot, few-shot, and fine-tuned setups and provide a comparative analysis of their performance.
arXiv Detail & Related papers (2025-09-16T06:27:09Z)
Evaluating Large Language Models for Functional and Maintainable Code in Industrial Settings: A Case Study at ASML [3.5515013986822073]
We present a case study conducted in collaboration with the leveling department at A.<n>We investigate the performance of LLMs in generating functional, maintainable code within a closed, highly specialized software environment.<n>The findings reveal that prompting techniques and model size have a significant impact on output quality.
arXiv Detail & Related papers (2025-09-15T19:39:26Z)
On LLM-Assisted Generation of Smart Contracts from Business Processes [0.08192907805418582]
Large language models (LLMs) have changed the reality of how software is produced.<n>We present an exploratory study to investigate the use of LLMs for generating smart contract code from business process descriptions.<n>Our results show that LLM performance falls short of the perfect reliability required for smart contract development.
arXiv Detail & Related papers (2025-07-30T20:39:45Z)
On the Effectiveness of LLM-as-a-judge for Code Generation and Summarization [54.965787768076254]
Large Language Models have been recently exploited as judges for complex natural language processing tasks, such as Q&A.<n>We study the effectiveness of LLMs-as-a-judge for two code-related tasks, namely code generation and code summarization.
arXiv Detail & Related papers (2025-07-22T13:40:26Z)
Beyond Isolated Dots: Benchmarking Structured Table Construction as Deep Knowledge Extraction [80.88654868264645]
Arranged and Organized Extraction Benchmark designed to evaluate ability of large language models to comprehend fragmented documents.<n>AOE includes 11 carefully crafted tasks across three diverse domains, requiring models to generate context-specific schema tailored to varied input queries.<n>Results show that even the most advanced models struggled significantly.
arXiv Detail & Related papers (2025-07-22T06:37:51Z)
SPARQL Query Generation with LLMs: Measuring the Impact of Training Data Memorization and Knowledge Injection [81.78173888579941]
Large Language Models (LLMs) are considered a well-suited method to increase the quality of the question-answering functionality.<n>LLMs are trained on web data, where researchers have no control over whether the benchmark or the knowledge graph was already included in the training data.<n>This paper introduces a novel method that evaluates the quality of LLMs by generating a SPARQL query from a natural-language question.
arXiv Detail & Related papers (2025-07-18T12:28:08Z)
Behavioral Augmentation of UML Class Diagrams: An Empirical Study of Large Language Models for Method Generation [0.0]
This study evaluates nine large language models (LLMs) in augmenting a methodless diagram (21 classes, 17 relationships) using 21 structured waste-management use cases.<n>A total of 90 diagrams (3,373 methods) were assessed across six iterations.
arXiv Detail & Related papers (2025-06-01T02:33:40Z)
Type-Constrained Code Generation with Language Models [51.03439021895432]
We introduce a type-constrained decoding approach that leverages type systems to guide code generation.<n>For this purpose, we develop novel prefix automata and a search over inhabitable types, forming a sound approach to enforce well-typedness on LLM-generated code.<n>Our approach reduces compilation errors by more than half and significantly increases functional correctness in code synthesis, translation, and repair tasks.
arXiv Detail & Related papers (2025-04-12T15:03:00Z)
Next Steps in LLM-Supported Java Verification [0.8057006406834466]
Large Language Models (LLMs) are not only a suitable tool for code generation but also capable of generating annotation-based code specifications.<n>This paper provides early results on how this rigorous toolset can be used to reliably elicit correct specification annotations from an unreliable LLM.
arXiv Detail & Related papers (2025-02-03T17:55:50Z)
ClassEval-T: Evaluating Large Language Models in Class-Level Code Translation [19.69195067838796]
We construct a class-level code translation benchmark, ClassEval-T, and make the first attempt to extensively assess recent LLMs' performance on class-level code translation.<n>It cost us 360 person-hours to accomplish the manual migration to Java and C++ with complete code samples and associated test suites.<n> Experimental results demonstrate a remarkable performance drop compared with the most widely studied method-level code translation benchmark.
arXiv Detail & Related papers (2024-11-09T11:13:14Z)
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models [90.14693869269519]
MoE LLMs can achieve higher performance with fewer parameters, but it is still hard to deploy them due to their immense parameter sizes. This paper mainly aims to enhance the deployment efficiency of MoE LLMs by introducing plug-and-play expert-level sparsification techniques.
arXiv Detail & Related papers (2024-02-22T18:56:07Z)
AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations [52.43593893122206]
Alignedcot is an in-context learning technique for invoking Large Language Models. It achieves consistent and correct step-wise prompts in zero-shot scenarios. We conduct experiments on mathematical reasoning and commonsense reasoning.
arXiv Detail & Related papers (2023-11-22T17:24:21Z)
Exploring Large Language Models for Code Explanation [3.2570216147409514]
Large Language Models (LLMs) have made remarkable strides in Natural Language Processing. This study specifically delves into the task of generating natural-language summaries for code snippets, using various LLMs.
arXiv Detail & Related papers (2023-10-25T14:38:40Z)
BLESS: Benchmarking Large Language Models on Sentence Simplification [55.461555829492866]
We present BLESS, a performance benchmark of the most recent state-of-the-art large language models (LLMs) on the task of text simplification (TS) We assess a total of 44 models, differing in size, architecture, pre-training methods, and accessibility, on three test sets from different domains (Wikipedia, news, and medical) under a few-shot setting. Our evaluation indicates that the best LLMs, despite not being trained on TS, perform comparably with state-of-the-art TS baselines.
arXiv Detail & Related papers (2023-10-24T12:18:17Z)
Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting [68.19544657508509]
Large language models (LLMs) are adopted as a fundamental component of language technologies. We find that several widely used open-source LLMs are extremely sensitive to subtle changes in prompt format in few-shot settings. We propose an algorithm that rapidly evaluates a sampled set of plausible prompt formats for a given task, and reports the interval of expected performance without accessing model weights.
arXiv Detail & Related papers (2023-10-17T15:03:30Z)
How Effective are Large Language Models in Generating Software Specifications? [14.170320751508502]
Large Language Models (LLMs) have been successfully applied to numerous Software Engineering (SE) tasks.<n>We conduct the first empirical study to evaluate the capabilities of LLMs for generating software specifications from software comments or documentation.
arXiv Detail & Related papers (2023-06-06T00:28:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.