Related papers: SLD-Spec: Enhancement LLM-assisted Specification Generation for Complex Loop Functions via Program Slicing and Logical Deletion

SLD-Spec: Enhancement LLM-assisted Specification Generation for Complex Loop Functions via Program Slicing and Logical Deletion

URL: http://arxiv.org/abs/2509.09917v1
Date: Fri, 12 Sep 2025 01:40:27 GMT
Title: SLD-Spec: Enhancement LLM-assisted Specification Generation for Complex Loop Functions via Program Slicing and Logical Deletion
Authors: Zehan Chen, Long Zhang, Zhiwei Zhang, JingJing Zhang, Ruoyu Zhou, Yulong Shen, JianFeng Ma, Lin Yang,
Abstract summary: SLD-Spec is an LLM-assisted specification generation method tailored for programs with complex loop constructs.<n>We show that SLD-Spec successfully verifies five more programs than the state-of-the-art AutoSpec and reduces runtime by 23.73%.
Score: 29.231420590756954
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Automatically generating formal specifications from program code can greatly enhance the efficiency of program verification and enable end-to-end automation from requirements to reliable software. However, existing LLM-based approaches often struggle with programs that include complex loop structures, leading to irrelevant specifications. Moreover, the rigorous proof obligations and design constraints imposed by verification tools can further result in incomplete and ambiguous specifications. To address these challenges, we propose SLD-Spec, an LLM-assisted specification generation method tailored for programs with complex loop constructs. SLD-Spec introduces two novel phases into the traditional specification generation framework: (1) A slicing phase, which decomposes each function into code fragments containing independent loop structures, thereby reducing the complexity of specification generation; and (2) A logical deletion phase, which applies LLM-based reasoning to filter out incorrect candidate specifications--especially those not easily identified by verification tool--while retaining valid ones. Experimental results show that on the simple dataset, SLD-Spec successfully verifies five more programs than the state-of-the-art AutoSpec and reduces runtime by 23.73%. To address the limitations of existing research, we manually construct a dataset comprising four categories of complex loop programs. On this dataset, SLD-Spec significantly improves the correctness, relevance, and completeness of generated specifications compared to baseline methods, enabling 95.1% of assertions and 90.91% of programs to pass verification. Ablation studies further reveal that logical deletion is critical for enhancing specification correctness and relevance, while program slicing contributes significantly to specification completeness. Our code and data are publicly available.

Related papers

Are LLMs Reliable Code Reviewers? Systematic Overcorrection in Requirement Conformance Judgement [8.059802912761919]
We uncover a systematic failure of large language models (LLMs) in matching code to natural language requirements.<n>More detailed prompt design, particularly with those requiring explanations and proposed corrections, leads to higher misjudgment rates.<n>We propose a Fix-guided Verification Filter that treats the model proposed fix as executable counterfactual evidence.
arXiv Detail & Related papers (2026-02-28T08:35:25Z)
Beyond Basic Specifications? A Systematic Study of Logical Constructs in LLM-based Specification Generation [29.231420590756954]
Large language models (LLMs) for the automatic generation of program specifications has emerged as a promising avenue for enhancing verification efficiency.<n>We propose incorporating logical constructs into existing LLM-based specification generation framework.<n>We conduct an empirical study aimed at exploring the impact of various types of syntactic constructs on specification generation framework.
arXiv Detail & Related papers (2026-01-31T13:19:40Z)
Self-Correction Distillation for Structured Data Question Answering [50.98882432829651]
Small-scale language models (LLMs) are prone to errors in generating structured queries.<n>We propose a self-correction distillation (SCD) method to improve the structured data QA ability of small-scale LLMs.
arXiv Detail & Related papers (2025-11-11T09:01:51Z)
Aligning Requirement for Large Language Model's Code Generation [9.205909320363247]
Specine is a novel specification alignment technique for large language models (LLMs) code generation.<n>Its key idea is to identify misaligned input specifications, lift LLM-perceived specifications, and align them to enhance the code generation performance of LLMs.<n>For example, Specine outperforms the most effective baseline, achieving an average improvement of 29.60% across all subjects in terms of Pass@1.
arXiv Detail & Related papers (2025-09-01T09:56:13Z)
Data Dependency-Aware Code Generation from Enhanced UML Sequence Diagrams [54.528185120850274]
We propose a novel step-by-step code generation framework named API2Dep.<n>First, we introduce an enhanced Unified Modeling Language (UML) API diagram tailored for service-oriented architectures.<n>Second, recognizing the critical role of data flow, we introduce a dedicated data dependency inference task.
arXiv Detail & Related papers (2025-08-05T12:28:23Z)
Grammar-Guided Evolutionary Search for Discrete Prompt Optimisation [63.97051732013936]
We propose an evolutionary search approach to automated discrete prompt optimisation consisting of two phases.<n>In the first phase, grammar-guided genetic programming is invoked to synthesise prompt-creating programmes.<n>In the second phase, local search is applied to explore the neighbourhoods of best-performing programmes.
arXiv Detail & Related papers (2025-07-14T14:34:15Z)
Specification-Guided Repair of Arithmetic Errors in Dafny Programs using LLMs [79.74676890436174]
We present an APR tool for Dafny that uses formal specifications as oracles for fault localization and repair.<n>We localize faults through a series of steps, which include using Hoare logic to determine the state of each statement within the program.<n>Our tool achieves 89.6% fault localization coverage and GPT-4o mini yields the highest repair success rate of 74.18%.
arXiv Detail & Related papers (2025-07-04T15:36:12Z)
LLM2: Let Large Language Models Harness System 2 Reasoning [65.89293674479907]
Large language models (LLMs) have exhibited impressive capabilities across a myriad of tasks, yet they occasionally yield undesirable outputs.<n>We introduce LLM2, a novel framework that combines an LLM with a process-based verifier.<n>LLMs2 is responsible for generating plausible candidates, while the verifier provides timely process-based feedback to distinguish desirable and undesirable outputs.
arXiv Detail & Related papers (2024-12-29T06:32:36Z)
SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts. We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM. We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions [72.56339136017759]
We introduce BigCodeBench, a benchmark that challenges Large Language Models (LLMs) to invoke multiple function calls as tools from 139 libraries and 7 domains for 1,140 fine-grained tasks.<n>Our evaluation shows that LLMs are not yet capable of following complex instructions to use function calls precisely, with scores up to 60%, significantly lower than the human performance of 97%.<n>We propose a natural-language-oriented variant of BigCodeBench, BigCodeBench-Instruct, that automatically transforms the original docstrings into short instructions only with essential information.
arXiv Detail & Related papers (2024-06-22T15:52:04Z)
Enchanting Program Specification Synthesis by Large Language Models using Static Analysis and Program Verification [15.686651364655958]
AutoSpec is an automated approach to synthesize specifications for automated program verification. It overcomes the shortcomings of existing work in specification versatility, synthesizing satisfiable and adequate specifications for full proof. It can be successfully applied to verify the programs in a real-world X509-parser project.
arXiv Detail & Related papers (2024-03-31T18:15:49Z)
SpecGen: Automated Generation of Formal Program Specifications via Large Language Models [20.36964281778921]
SpecGen is a novel technique for formal program specification generation based on Large Language Models.<n>We evaluate SpecGen on two datasets, including the SV-COMP 279 benchmark and a manually constructed dataset.
arXiv Detail & Related papers (2024-01-16T20:13:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.