Related papers: Extracting Protocol Format as State Machine via Controlled Static Loop Analysis

Extracting Protocol Format as State Machine via Controlled Static Loop Analysis

URL: http://arxiv.org/abs/2305.13483v4
Date: Mon, 1 Jul 2024 11:43:28 GMT
Title: Extracting Protocol Format as State Machine via Controlled Static Loop Analysis
Authors: Qingkai Shi, Xiangzhe Xu, Xiangyu Zhang,
Abstract summary: This work focuses on a class of protocols whose formats are described via constraint-enhanced regular expressions and parsed using finite-state machines. Our new technique extracts a state machine by regarding each loop as a state and the dependency between loop iterations as state transitions. The evaluation results show that we can infer a state machine and, thus, the message formats, in five minutes with over 90% precision and recall.
Score: 14.201174164060994
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reverse engineering of protocol message formats is critical for many security applications. Mainstream techniques use dynamic analysis and inherit its low-coverage problem -- the inferred message formats only reflect the features of their inputs. To achieve high coverage, we choose to use static analysis to infer message formats from the implementation of protocol parsers. In this work, we focus on a class of extremely challenging protocols whose formats are described via constraint-enhanced regular expressions and parsed using finite-state machines. Such state machines are often implemented as complicated parsing loops, which are inherently difficult to analyze via conventional static analysis. Our new technique extracts a state machine by regarding each loop iteration as a state and the dependency between loop iterations as state transitions. To achieve high, i.e., path-sensitive, precision but avoid path explosion, the analysis is controlled to merge as many paths as possible based on carefully-designed rules. The evaluation results show that we can infer a state machine and, thus, the message formats, in five minutes with over 90% precision and recall, far better than state of the art. We also applied the state machines to enhance protocol fuzzers, which are improved by 20% to 230% in terms of coverage and detect ten more zero-days compared to baselines.

Related papers

Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling [90.86991492288487]
evaluating constraint on every token can be prohibitively expensive. LCD can distort the global distribution over strings, sampling tokens based only on local information. We show that our approach is superior to state-of-the-art baselines.
arXiv Detail & Related papers (2025-04-07T18:30:18Z)
PPT: Pretraining with Pseudo-Labeled Trajectories for Motion Forecasting [90.47748423913369]
State-of-the-art motion forecasting models rely on large curated datasets with manually annotated or heavily post-processed trajectories. PWT is a simple and scalable alternative that uses unprocessed and diverse trajectories automatically generated from off-the-shelf 3D detectors and tracking. It achieves strong performance across standard benchmarks particularly in low-data regimes, and in cross-domain, end-to-end and multi-class settings.
arXiv Detail & Related papers (2024-12-09T13:48:15Z)
Scaling Symbolic Execution to Large Software Systems [0.0]
Symbolic execution is a popular static analysis technique used both in program verification and in bug detection software. We focus on an error finding framework called the Clang Static Analyzer, and the infrastructure built around it named CodeChecker.
arXiv Detail & Related papers (2024-08-04T02:54:58Z)
Bisimulation Learning [55.859538562698496]
We compute finite bisimulations of state transition systems with large, possibly infinite state space. Our technique yields faster verification results than alternative state-of-the-art tools in practice.
arXiv Detail & Related papers (2024-05-24T17:11:27Z)
Sparse Autoencoders Enable Scalable and Reliable Circuit Identification in Language Models [0.0]
This paper introduces an efficient and robust method for discovering interpretable circuits in large language models. We propose training sparse autoencoders on carefully designed positive and negative examples. Our findings highlight the promise of discrete sparse autoencoders for scalable and efficient mechanistic interpretability.
arXiv Detail & Related papers (2024-05-21T06:26:10Z)
Inferring State Machine from the Protocol Implementation via Large Language Model [18.942047454890847]
We propose an innovative state machine inference approach powered by Large Language Models (LLMs) Our evaluation across six protocol implementations demonstrates the method's high efficacy, achieving an accuracy rate exceeding 90%. Our proposed method not only marks a significant step forward in accurate state machine inference but also opens new avenues for improving the security and reliability of protocol implementations.
arXiv Detail & Related papers (2024-05-01T08:46:36Z)
Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely textithidden transfer, which decodes multiple successive tokens simultaneously in a single forward pass. In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z)
DT-SIM: Property-Based Testing for MPC Security [2.0308771704846245]
Property-based testing is effective for detecting security bugs in secure protocols. We specifically target Secure Multi-Party Computation (MPC) We devise a test that can detect various flaws in a bit-level implementation of an MPC protocol.
arXiv Detail & Related papers (2024-03-08T02:02:24Z)
Data post-processing for the one-way heterodyne protocol under composable finite-size security [62.997667081978825]
We study the performance of a practical continuous-variable (CV) quantum key distribution protocol. We focus on the Gaussian-modulated coherent-state protocol with heterodyne detection in a high signal-to-noise ratio regime. This allows us to study the performance for practical implementations of the protocol and optimize the parameters connected to the steps above.
arXiv Detail & Related papers (2022-05-20T12:37:09Z)
Composably secure data processing for Gaussian-modulated continuous variable quantum key distribution [58.720142291102135]
Continuous-variable quantum key distribution (QKD) employs the quadratures of a bosonic mode to establish a secret key between two remote parties. We consider a protocol with homodyne detection in the general setting of composable finite-size security. In particular, we analyze the high signal-to-noise regime which requires the use of high-rate (non-binary) low-density parity check codes.
arXiv Detail & Related papers (2021-03-30T18:02:55Z)
Optimizing the Decoy-State BB84 QKD Protocol Parameters [3.6954802719347413]
The performance of a QKD implementation is determined by the tightness of the underlying security analysis. It is known that optimal protocol parameters, such as the number of decoy states and their intensities, can be found by solving a nonlinear optimization problem. We show an improved performance for the Decoy-State BB84 QKD protocol, demonstrating that the assumptions typically made are too restrictive.
arXiv Detail & Related papers (2020-06-29T12:06:16Z)
End-to-End Object Detection with Transformers [88.06357745922716]
We present a new method that views object detection as a direct set prediction problem. Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components. The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss.
arXiv Detail & Related papers (2020-05-26T17:06:38Z)
Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records. Existing approaches rely on log-specifics or manual rule extraction. We propose NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
arXiv Detail & Related papers (2020-03-17T19:25:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.