Large Language Models for Validating Network Protocol Parsers
- URL: http://arxiv.org/abs/2504.13515v1
- Date: Fri, 18 Apr 2025 07:09:56 GMT
- Title: Large Language Models for Validating Network Protocol Parsers
- Authors: Mingwei Zheng, Danning Xie, Xiangyu Zhang
- Abstract summary: Protocol standards are typically written in natural language, whereas implementations are in source code. We propose PARVAL, a framework built on large language models (LLMs). It transforms both protocol standards and their implementations into a unified intermediate representation, referred to as format specifications. It successfully identifies inconsistencies between the implementation and its RFC standard, achieving a low false positive rate of 5.6%.
- Score: 8.007994733372675
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Network protocol parsers are essential for enabling correct and secure communication between devices. Bugs in these parsers can introduce critical vulnerabilities, including memory corruption, information leakage, and denial-of-service attacks. An intuitive way to assess parser correctness is to compare the implementation with its official protocol standard. However, this comparison is challenging because protocol standards are typically written in natural language, whereas implementations are in source code. Existing methods like model checking, fuzzing, and differential testing have been used to find parsing bugs, but they either require significant manual effort or ignore the protocol standards, limiting their ability to detect semantic violations. To enable more automated validation of parser implementations against protocol standards, we propose PARVAL, a multi-agent framework built on large language models (LLMs). PARVAL leverages the capabilities of LLMs to understand both natural language and code. It transforms both protocol standards and their implementations into a unified intermediate representation, referred to as format specifications, and performs a differential comparison to uncover inconsistencies. We evaluate PARVAL on the Bidirectional Forwarding Detection (BFD) protocol. Our experiments demonstrate that PARVAL successfully identifies inconsistencies between the implementation and its RFC standard, achieving a low false positive rate of 5.6%. PARVAL uncovers seven unique bugs, including five previously unknown issues.
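To make the differential-comparison idea concrete, below is a minimal Python sketch of diffing two format specifications. The `FieldSpec` structure, the simplified BFD (RFC 5880) fields, and their constraints are illustrative assumptions, not PARVAL's actual intermediate representation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FieldSpec:
    """One field of a format specification: name, bit width, and a constraint."""
    name: str
    bits: int
    constraint: str  # human-readable predicate, e.g. "value >= 1"

def diff_specs(rfc: list[FieldSpec], impl: list[FieldSpec]) -> list[str]:
    """Report fields whose width or constraint disagrees between the two specs."""
    impl_by_name = {f.name: f for f in impl}
    findings = []
    for f in rfc:
        g = impl_by_name.get(f.name)
        if g is None:
            findings.append(f"missing field: {f.name}")
        elif (f.bits, f.constraint) != (g.bits, g.constraint):
            findings.append(f"mismatch in {f.name}: RFC ({f.bits} bits, {f.constraint}) "
                            f"vs impl ({g.bits} bits, {g.constraint})")
    return findings

# Simplified BFD control-packet fields (RFC 5880); constraints are illustrative.
rfc_spec = [
    FieldSpec("Version", 3, "value == 1"),
    FieldSpec("Detect Mult", 8, "value >= 1"),
    FieldSpec("Length", 8, "value >= 24"),
]
impl_spec = [
    FieldSpec("Version", 3, "value == 1"),
    FieldSpec("Detect Mult", 8, "unchecked"),  # a parser that forgot to validate
    FieldSpec("Length", 8, "value >= 24"),
]

for finding in diff_specs(rfc_spec, impl_spec):
    print(finding)  # -> mismatch in Detect Mult: ...
```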
Related papers
- Validating Network Protocol Parsers with Traceable RFC Document Interpretation [11.081773172066766]
The oracle and traceability problems determine when a protocol implementation can be considered buggy.
This work considers both and provides an effective solution using recent advances in large language models (LLMs).
We have extensively evaluated our approach using nine network protocols and their implementations written in C, Python, and Go.
arXiv Detail & Related papers (2025-04-25T03:39:19Z)
- DT-SIM: Property-Based Testing for MPC Security [2.0308771704846245]
Property-based testing is effective for detecting security bugs in secure protocols.
We specifically target Secure Multi-Party Computation (MPC).
We devise a test that can detect various flaws in a bit-level implementation of an MPC protocol.
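As a flavor of the technique (not DT-SIM's actual statistical test harness), here is a minimal property-based test of a bit-level XOR secret-sharing primitive using the hypothesis library; the sharing scheme and the tested property are illustrative assumptions.

```python
import secrets
from hypothesis import given, strategies as st

def share(secret: int, bits: int = 32) -> tuple[int, int]:
    """XOR secret sharing: split a value into two random-looking shares."""
    r = secrets.randbits(bits)
    return r, secret ^ r

def reconstruct(s0: int, s1: int) -> int:
    return s0 ^ s1

@given(st.integers(min_value=0, max_value=2**32 - 1))
def test_share_roundtrip(secret):
    # Correctness property: reconstructing the shares always yields the secret.
    s0, s1 = share(secret)
    assert reconstruct(s0, s1) == secret

test_share_roundtrip()  # hypothesis drives this over many generated inputs
```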
arXiv Detail & Related papers (2024-03-08T02:02:24Z)
- Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval [139.21955930418815]
Cross-modal Retrieval methods build similarity relations between vision and language modalities by jointly learning a common representation space.
However, the predictions are often unreliable due to aleatoric uncertainty, which is induced by low-quality data, e.g., corrupt images, fast-paced videos, and non-detailed texts.
We propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from inherent data ambiguity.
arXiv Detail & Related papers (2023-09-29T09:41:19Z)
- Bicoptor 2.0: Addressing Challenges in Probabilistic Truncation for Enhanced Privacy-Preserving Machine Learning [6.733212399517445]
This paper focuses on analyzing the problems and proposing solutions for the probabilistic truncation protocol in existing PPML works.
In terms of accuracy, we reveal that precision selections recommended in some of the existing works are incorrect.
We propose a solution and a precision selection guideline for future works.
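For background on why truncation is probabilistic at all, below is a sketch of the classic two-party local truncation of additive shares (in the style of SecureML), not Bicoptor 2.0's protocol; the ring and precision parameters are illustrative.

```python
import random

K, F = 32, 8  # ring Z_{2^K}; F fractional bits of the fixed-point encoding

def share(x):
    """Additive secret sharing of x in Z_{2^K}."""
    s0 = random.randrange(2**K)
    return s0, (x - s0) % 2**K

def local_truncate(s0, s1):
    """Each party truncates its own share with no interaction."""
    t0 = s0 >> F
    t1 = (2**K - ((2**K - s1) >> F)) % 2**K
    return t0, t1

off_by_one = big_error = 0
TRIALS = 50_000
for _ in range(TRIALS):
    x = random.randrange(2**16)          # small positive fixed-point value
    t0, t1 = local_truncate(*share(x))
    err = (t0 + t1 - (x >> F)) % 2**K
    if err in (1, 2**K - 1):
        off_by_one += 1                  # lost carry: happens often
    elif err != 0:
        big_error += 1                   # share wrap-around: rare
print(f"off by one: {off_by_one/TRIALS:.2%}, large error: {big_error/TRIALS:.3%}")
```

Running it shows the result is off by one roughly half the time and catastrophically wrong only with tiny probability, which is exactly the trade-off whose precision implications the paper analyzes.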
arXiv Detail & Related papers (2023-09-10T01:43:40Z)
- Invalidator: Automated Patch Correctness Assessment via Semantic and Syntactic Reasoning [6.269370220586248]
In this paper, we propose a novel technique to automatically assess the correctness of APR-generated patches via semantic and syntactic reasoning.
We have conducted experiments on a dataset of 885 patches generated on real-world programs in Defects4J.
Experimental results show that INVALIDATOR correctly classifies 79% of overfitting patches, detecting 23% more overfitting patches than the best baseline.
arXiv Detail & Related papers (2023-01-03T14:16:32Z)
- Probing for targeted syntactic knowledge through grammatical error detection [13.653209309144593]
We propose grammatical error detection as a diagnostic probe to evaluate pre-trained English language models.
We leverage public annotated training data from both English second language learners and Wikipedia edits.
We find that masked language models linearly encode information relevant to the detection of subject-verb agreement (SVA) errors, while autoregressive models perform on par with our baseline.
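As an illustration of the linear-probe methodology (not the paper's setup, which trains on annotated learner and Wikipedia-edit data), here is a minimal sketch that fits a logistic-regression probe on frozen BERT embeddings over a toy hand-written SVA dataset.

```python
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

# Toy SVA dataset (label 1 = agreement error); illustrative only.
sentences = [
    ("The dogs bark at night.", 0), ("The dogs barks at night.", 1),
    ("She writes code daily.", 0), ("She write code daily.", 1),
    ("The children play outside.", 0), ("The children plays outside.", 1),
    ("He runs every morning.", 0), ("He run every morning.", 1),
]

def embed(text):
    """Frozen mean-pooled BERT embedding of a sentence."""
    with torch.no_grad():
        out = model(**tok(text, return_tensors="pt"))
    return out.last_hidden_state.mean(dim=1).squeeze(0).numpy()

X = [embed(s) for s, _ in sentences]
y = [label for _, label in sentences]
probe = LogisticRegression(max_iter=1000).fit(X, y)  # the linear probe

# With so little data the output is only illustrative of the method.
print(probe.predict([embed("The cats chases the mouse.")]))
```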
arXiv Detail & Related papers (2022-10-28T16:01:25Z)
- BenchCLAMP: A Benchmark for Evaluating Language Models on Syntactic and Semantic Parsing [55.058258437125524]
We introduce BenchCLAMP, a Benchmark to evaluate Constrained LAnguage Model Parsing.
We benchmark eight language models, including two GPT-3 variants available only through an API.
Our experiments show that encoder-decoder pretrained language models can achieve similar performance or surpass state-of-the-art methods for syntactic and semantic parsing when the model output is constrained to be valid.
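To illustrate what "constrained to be valid" means in decoding terms, here is a toy sketch in which ungrammatical next tokens are masked at each greedy step; the stand-in scorer and mini-grammar are hypothetical, not BenchCLAMP's machinery.

```python
def valid_next(token, opens):
    """Toy grammar check: '(' is always legal; everything else only inside parens."""
    return token == "(" or opens > 0

def toy_scores(step):
    """Stand-in for LM next-token log-probs at each step (hypothetical values)."""
    table = [
        {")": -0.1, "(": -0.2},      # top choice ')' is ungrammatical here
        {"p": -0.1, ")": -0.8},
        {")": -0.1, "q": -0.9},
        {"<eos>": -0.05, "(": -0.9},
    ]
    return table[min(step, len(table) - 1)]

def constrained_greedy_decode(max_len=10):
    out, opens = [], 0
    for step in range(max_len):
        scores = toy_scores(step)
        # Mask tokens that would leave the grammar's language.
        legal = {t: s for t, s in scores.items()
                 if (t == "<eos>" and opens == 0 and out)
                 or (t != "<eos>" and valid_next(t, opens))}
        best = max(legal, key=legal.get)
        if best == "<eos>":
            break
        out.append(best)
        opens += best == "("
        opens -= best == ")"
    return " ".join(out)

print(constrained_greedy_decode())  # -> "( p )": balanced despite the scorer
```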
arXiv Detail & Related papers (2022-06-21T18:34:11Z)
- Improving Pre-trained Language Models with Syntactic Dependency Prediction Task for Chinese Semantic Error Recognition [52.55136323341319]
Existing Chinese text error detection mainly focuses on spelling and simple grammatical errors.
Chinese semantic errors are understudied and so complex that even humans cannot easily recognize them.
arXiv Detail & Related papers (2022-04-15T13:55:32Z)
- Zero-Shot Cross-lingual Semantic Parsing [56.95036511882921]
We study cross-lingual semantic parsing as a zero-shot problem without parallel data for 7 test languages.
We propose a multi-task encoder-decoder model to transfer parsing knowledge to additional languages using only English-logical-form paired data.
Our system frames zero-shot parsing as a latent-space alignment problem and finds that pre-trained models can be improved to generate logical forms with minimal cross-lingual transfer penalty.
arXiv Detail & Related papers (2021-04-15T16:08:43Z)
- On the Robustness of Language Encoders against Grammatical Errors [66.05648604987479]
We collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data.
Results confirm that the performance of all tested models is affected but the degree of impact varies.
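As a sketch of simulating such errors on clean text (the paper mines real errors from non-native-speaker corpora rather than using a fixed confusion list like this one), consider:

```python
import random
import re

# Illustrative confusion pairs; real attacks draw these from learner corpora.
CONFUSIONS = {"has": "have", "have": "has", "is": "are", "are": "is",
              "was": "were", "does": "do", "a": "the"}

def inject_errors(sentence, rate=1.0, seed=0):
    """Adversarially perturb a clean sentence with grammar-error substitutions."""
    rng = random.Random(seed)
    out = []
    for word in sentence.split():
        bare = re.sub(r"\W", "", word.lower())  # strip punctuation for lookup
        if bare in CONFUSIONS and rng.random() < rate:
            out.append(word.replace(bare, CONFUSIONS[bare]))  # lowercase matches only
        else:
            out.append(word)
    return " ".join(out)

clean = "The results show that the system is robust."
print(inject_errors(clean))  # -> "The results show that the system are robust."
```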
arXiv Detail & Related papers (2020-05-12T11:01:44Z)
- Towards Instance-Level Parser Selection for Cross-Lingual Transfer of Dependency Parsers [59.345145623931636]
We argue for a novel cross-lingual transfer paradigm: instance-level parser selection (ILPS).
We present a proof-of-concept study focused on instance-level selection in the framework of delexicalized transfer.
arXiv Detail & Related papers (2020-04-16T13:18:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.