Related papers: An LLM Agent for Functional Bug Detection in Network Protocols

An LLM Agent for Functional Bug Detection in Network Protocols

URL: http://arxiv.org/abs/2506.00714v1
Date: Sat, 31 May 2025 21:13:19 GMT
Title: An LLM Agent for Functional Bug Detection in Network Protocols
Authors: Mingwei Zheng, Chengpeng Wang, Xuwei Liu, Jinyao Guo, Shiwei Feng, Xiangyu Zhang,
Abstract summary: RFCScan is an autonomous agent that leverages large language models (LLMs) to detect functional bugs.<n>Inspired by the human auditing procedure, RFCScan comprises two key components: an indexing agent and a detection agent.<n> RFCScan identifies 47 functional bugs with 81.9% precision, of which 20 bugs have been confirmed or fixed by developers.
Score: 7.803606403370593
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Functional correctness is critical for ensuring the reliability and security of network protocol implementations. Functional bugs, instances where implementations diverge from behaviors specified in RFC documents, can lead to severe consequences, including faulty routing, authentication bypasses, and service disruptions. Detecting these bugs requires deep semantic analysis across specification documents and source code, a task beyond the capabilities of traditional static analysis tools. This paper introduces RFCScan, an autonomous agent that leverages large language models (LLMs) to detect functional bugs by checking conformance between network protocol implementations and their RFC specifications. Inspired by the human auditing procedure, RFCScan comprises two key components: an indexing agent and a detection agent. The former hierarchically summarizes protocol code semantics, generating semantic indexes that enable the detection agent to narrow down the scanning scope. The latter employs demand-driven retrieval to iteratively collect additional relevant data structures and functions, eventually identifying potential inconsistencies with the RFC specifications effectively. We evaluate RFCScan across six real-world network protocol implementations. RFCScan identifies 47 functional bugs with 81.9% precision, of which 20 bugs have been confirmed or fixed by developers.

Related papers

Decompiling Smart Contracts with a Large Language Model [51.49197239479266]
Despite Etherscan's 78,047,845 smart contracts deployed on (as of May 26, 2025), a mere 767,520 ( 1%) are open source.<n>This opacity necessitates the automated semantic analysis of on-chain smart contract bytecode.<n>We introduce a pioneering decompilation pipeline that transforms bytecode into human-readable and semantically faithful Solidity code.
arXiv Detail & Related papers (2025-06-24T13:42:59Z)
CANTXSec: A Deterministic Intrusion Detection and Prevention System for CAN Bus Monitoring ECU Activations [53.036288487863786]
We propose CANTXSec, the first deterministic Intrusion Detection and Prevention system based on physical ECU activations.<n>It detects and prevents classical attacks in the CAN bus, while detecting advanced attacks that have been less investigated in the literature.<n>We prove the effectiveness of our solution on a physical testbed, where we achieve 100% detection accuracy in both classes of attacks while preventing 100% of FIAs.
arXiv Detail & Related papers (2025-05-14T13:37:07Z)
Validating Network Protocol Parsers with Traceable RFC Document Interpretation [11.081773172066766]
oracle and traceability problems determine when a protocol implementation can be considered buggy.<n>This work considers both and provides an effective solution using recent advances in large language models (LLMs)<n>We have extensively evaluated our approach using nine network protocols and their implementations written in C, Python, and Go.
arXiv Detail & Related papers (2025-04-25T03:39:19Z)
Large Language Models for Validating Network Protocol Parsers [8.007994733372675]
Protocol standards are typically written in natural language, whereas implementations are in source code.<n>We propose PARVAL, a framework built on large language models (LLMs)<n>It transforms both protocol standards and their implementations into a unified intermediate representation, referred to as format specifications.<n>It successfully identifies inconsistencies between the implementation and its RFC standard, achieving a low false positive rate of 5.6%.
arXiv Detail & Related papers (2025-04-18T07:09:56Z)
Benchmarking LLMs and LLM-based Agents in Practical Vulnerability Detection for Code Repositories [8.583591493627276]
We introduce JitVul, a vulnerability detection benchmark linking each function to its vulnerability-introducing and fixing commits.<n>We show that ReAct Agents, leveraging thought-action-observation and interprocedural context, perform better than LLMs in distinguishing vulnerable from benign code.
arXiv Detail & Related papers (2025-03-05T15:22:24Z)
AutoRestTest: A Tool for Automated REST API Testing Using LLMs and MARL [46.65963514391019]
AutoRestTest is a novel tool that integrates the Semantic Property Dependency Graph (SPDG) with Multi-Agent Reinforcement Learning (MARL) and large language models (LLMs) for effective REST API testing.
arXiv Detail & Related papers (2025-01-15T05:54:33Z)
CryptoFormalEval: Integrating LLMs and Formal Verification for Automated Cryptographic Protocol Vulnerability Detection [41.94295877935867]
We introduce a benchmark to assess the ability of Large Language Models to autonomously identify vulnerabilities in new cryptographic protocols. We created a dataset of novel, flawed, communication protocols and designed a method to automatically verify the vulnerabilities found by the AI agents.
arXiv Detail & Related papers (2024-11-20T14:16:55Z)
Interactive Tools Substantially Assist LM Agents in Finding Security Vulnerabilities [46.34031902647788]
We present EnIGMA, an LM agent for autonomously solving Capture The Flag (CTF) challenges.<n>We introduce new tools and interfaces to improve the agent's ability to find and exploit security vulnerabilities.<n> Empirical analysis on 390 CTF challenges demonstrate that these new tools and interfaces substantially improve our agent's performance.
arXiv Detail & Related papers (2024-09-24T15:06:01Z)
Renormalized Connection for Scale-preferred Object Detection in Satellite Imagery [51.83786195178233]
We design a Knowledge Discovery Network (KDN) to implement the renormalization group theory in terms of efficient feature extraction. Renormalized connection (RC) on the KDN enables synergistic focusing'' of multi-scale features. RCs extend the multi-level feature's divide-and-conquer'' mechanism of the FPN-based detectors to a wide range of scale-preferred tasks.
arXiv Detail & Related papers (2024-09-09T13:56:22Z)
DT-SIM: Property-Based Testing for MPC Security [2.0308771704846245]
Property-based testing is effective for detecting security bugs in secure protocols. We specifically target Secure Multi-Party Computation (MPC) We devise a test that can detect various flaws in a bit-level implementation of an MPC protocol.
arXiv Detail & Related papers (2024-03-08T02:02:24Z)
Exploring Sequence Feature Alignment for Domain Adaptive Detection Transformers [141.70707071815653]
We propose a novel Sequence Feature Alignment (SFA) method that is specially designed for the adaptation of detection transformers. SFA consists of a domain query-based feature alignment (DQFA) module and a token-wise feature alignment (TDA) module. Experiments on three challenging benchmarks show that SFA outperforms state-of-the-art domain adaptive object detection methods.
arXiv Detail & Related papers (2021-07-27T07:17:12Z)
Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming [97.40955121478716]
We propose a first-order dual SDP algorithm that requires memory only linear in the total number of network activations. We significantly improve L-inf verified robust accuracy from 1% to 88% and 6% to 40% respectively. We also demonstrate tight verification of a quadratic stability specification for the decoder of a variational autoencoder.
arXiv Detail & Related papers (2020-10-22T12:32:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.