Synthesizing Precise Protocol Specs from Natural Language for Effective Test Generation
- URL: http://arxiv.org/abs/2511.17977v1
- Date: Sat, 22 Nov 2025 08:39:52 GMT
- Title: Synthesizing Precise Protocol Specs from Natural Language for Effective Test Generation
- Authors: Kuangxiangzi Liu, Dhiman Chakraborty, Alexander Liggesmeyer, Andreas Zeller,
- Abstract summary: AutoSPEC recovers on average 92.8% of client and 80.2% of server message types, and 81.5% message acceptance across diverse real-world systems.<n>Prototype successfully demonstrated the feasibility of our approach on five widely used.<n>internet-based protocols.
- Score: 42.582977261473324
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Safety- and security-critical systems have to be thoroughly tested against their specifications. The state of practice is to have _natural language_ specifications, from which test cases are derived manually - a process that is slow, error-prone, and difficult to scale. _Formal_ specifications, on the other hand, are well-suited for automated test generation, but are tedious to write and maintain. In this work, we propose a two-stage pipeline that uses large language models (LLMs) to bridge the gap: First, we extract _protocol elements_ from natural-language specifications; second, leveraging a protocol implementation, we synthesize and refine a formal _protocol specification_ from these elements, which we can then use to massively test further implementations. We see this two-stage approach to be superior to end-to-end LLM-based test generation, as 1. it produces an _inspectable specification_ that preserves traceability to the original text; 2. the generation of actual test cases _no longer requires an LLM_; 3. the resulting formal specs are _human-readable_, and can be reviewed, version-controlled, and incrementally refined; and 4. over time, we can build a _corpus_ of natural-language-to-formal-specification mappings that can be used to further train and refine LLMs for more automatic translations. Our prototype, AUTOSPEC, successfully demonstrated the feasibility of our approach on five widely used _internet protocols_ (SMTP, POP3, IMAP, FTP, and ManageSieve) by applying its methods on their _RFC specifications_ written in natural-language, and the recent _I/O grammar_ formalism for protocol specification and fuzzing. In its evaluation, AUTOSPEC recovers on average 92.8% of client and 80.2% of server message types, and achieves 81.5% message acceptance across diverse, real-world systems.
Related papers
- Doc2Spec: Synthesizing Formal Programming Specifications from Natural Language via Grammar Induction [3.13621719351502]
Doc2Spec is a framework that automatically induces a specification grammar from natural-language rules.<n>It generates formal specifications guided by the induced grammar.<n>The grammar captures essential domain knowledge, constrains the specification space, and enforces consistent representations.
arXiv Detail & Related papers (2026-01-30T02:58:27Z) - FASTRIC: Prompt Specification Language for Verifiable LLM Interactions [3.8073142980732997]
Large Language Models (LLMs) execute complex multi-turn interaction protocols but lack formal specifications to verify execution against designer intent.<n>We introduce FASTRIC, a Prompt Specification Language that makes implicit Finite State Machines (FSMs) explicit in natural language prompts.
arXiv Detail & Related papers (2025-12-22T01:19:50Z) - Protocol Testing with I/O Grammars [45.68497486907946]
We propose a novel approach to protocol testing that combines input generation and output checking in a single framework.<n>We demonstrate that I/O grammars can specify advanced protocol features correctly and completely, while also enabling output validation of the programs under test.
arXiv Detail & Related papers (2025-09-24T16:41:04Z) - IFEvalCode: Controlled Code Generation [69.28317223249358]
The paper introduces forward and backward constraints generation to improve the instruction-following capabilities of Code LLMs.<n>The authors present IFEvalCode, a multilingual benchmark comprising 1.6K test samples across seven programming languages.
arXiv Detail & Related papers (2025-07-30T08:08:48Z) - iPanda: An LLM-based Agent for Automated Conformance Testing of Communication Protocols [11.749977502129898]
Large Language Models (LLMs) have demonstrated impressive text comprehension and code generation abilities.<n>We propose iPanda, the first framework that leverages LLMs to automate protocol conformance testing.<n>Experiments on various protocols show that iPanda significantly outperforms pure LLM-based approaches.
arXiv Detail & Related papers (2025-07-01T02:27:44Z) - ModelForge: Using GenAI to Improve the Development of Security Protocols [1.9241821314180376]
We introduce ModelForge, a novel tool that automates the translation of protocol specifications.<n>By leveraging advances in Natural Language Processing (NLP) and Generative AI (GenAI), ModelForge processes protocol specifications and generates a CPSA protocol definition.
arXiv Detail & Related papers (2025-06-08T06:27:09Z) - Validating Network Protocol Parsers with Traceable RFC Document Interpretation [11.081773172066766]
oracle and traceability problems determine when a protocol implementation can be considered buggy.<n>This work considers both and provides an effective solution using recent advances in large language models (LLMs)<n>We have extensively evaluated our approach using nine network protocols and their implementations written in C, Python, and Go.
arXiv Detail & Related papers (2025-04-25T03:39:19Z) - Large Language Models for Validating Network Protocol Parsers [8.007994733372675]
Protocol standards are typically written in natural language, whereas implementations are in source code.<n>We propose PARVAL, a framework built on large language models (LLMs)<n>It transforms both protocol standards and their implementations into a unified intermediate representation, referred to as format specifications.<n>It successfully identifies inconsistencies between the implementation and its RFC standard, achieving a low false positive rate of 5.6%.
arXiv Detail & Related papers (2025-04-18T07:09:56Z) - Native Language Identification with Large Language Models [60.80452362519818]
We show that GPT models are proficient at NLI classification, with GPT-4 setting a new performance record of 91.7% on the benchmark11 test set in a zero-shot setting.
We also show that unlike previous fully-supervised settings, LLMs can perform NLI without being limited to a set of known classes.
arXiv Detail & Related papers (2023-12-13T00:52:15Z) - Leveraging Large Language Models to Improve REST API Testing [51.284096009803406]
RESTGPT takes as input an API specification, extracts machine-interpretable rules, and generates example parameter values from natural-language descriptions in the specification.
Our evaluations indicate that RESTGPT outperforms existing techniques in both rule extraction and value generation.
arXiv Detail & Related papers (2023-12-01T19:53:23Z) - Automatic Extraction of Rules Governing Morphological Agreement [103.78033184221373]
We develop an automated framework for extracting a first-pass grammatical specification from raw text.
We focus on extracting rules describing agreement, a morphosyntactic phenomenon at the core of the grammars of many of the world's languages.
We apply our framework to all languages included in the Universal Dependencies project, with promising results.
arXiv Detail & Related papers (2020-10-02T18:31:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.