Secure Parsing and Serializing with Separation Logic Applied to CBOR, CDDL, and COSE
- URL: http://arxiv.org/abs/2505.17335v2
- Date: Thu, 17 Jul 2025 15:50:49 GMT
- Title: Secure Parsing and Serializing with Separation Logic Applied to CBOR, CDDL, and COSE
- Authors: Tahina Ramananandro, Gabriel Ebner, Guido MartÃnez, Nikhil Swamy,
- Abstract summary: We present PulseParse, a library of verified and serializers for non-malleable binary formats.<n>We use PulseParse at scale by providing the first formalization of CBOR, a binary data format standard.<n>We show how to parse formats with only a constant amount of stack space.
- Score: 1.2419340000421064
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Incorrect handling of security-critical data formats, particularly in low-level languages, are the root cause of many security vulnerabilities. Provably correct parsing and serialization tools that target languages like C can help. Towards this end, we present PulseParse, a library of verified parser and serializer combinators for non-malleable binary formats. Specifications and proofs in PulseParse are in separation logic, offering a more abstract and compositional interface, with full support for data validation, parsing, and serialization. PulseParse also supports a class of recursive formats -- with a focus on security and handling adversarial inputs, we show how to parse such formats with only a constant amount of stack space. We use PulseParse at scale by providing the first formalization of CBOR, a recursive, binary data format standard, with growing adoption in various industrial standards. We prove that the deterministic fragment of CBOR is non-malleable and provide EverCBOR, a verified library in both C and Rust to validate, parse, and serialize CBOR objects implemented using PulseParse. Next, we provide the first formalization of CDDL, a schema definition language for CBOR. We identify well-formedness conditions on CDDL definitions that ensure that they yield unambiguous, non-malleable formats, and implement EverCDDL, a tool that checks that a CDDL definition is well-formed, and then produces verified parsers and serializers for it. To evaluate our work, we use EverCDDL to generate verified parsers and serializers for various security-critical applications. Notably, we build a formally verified implementation of COSE signing, a standard for cryptographically signed objects. We also use our toolchain to generate verified code for other standards specified in CDDL, including DICE Protection Environment, a secure boot protocol standard.
Related papers
- IFEvalCode: Controlled Code Generation [69.28317223249358]
The paper introduces forward and backward constraints generation to improve the instruction-following capabilities of Code LLMs.<n>The authors present IFEvalCode, a multilingual benchmark comprising 1.6K test samples across seven programming languages.
arXiv Detail & Related papers (2025-07-30T08:08:48Z) - Decompiling Smart Contracts with a Large Language Model [51.49197239479266]
Despite Etherscan's 78,047,845 smart contracts deployed on (as of May 26, 2025), a mere 767,520 ( 1%) are open source.<n>This opacity necessitates the automated semantic analysis of on-chain smart contract bytecode.<n>We introduce a pioneering decompilation pipeline that transforms bytecode into human-readable and semantically faithful Solidity code.
arXiv Detail & Related papers (2025-06-24T13:42:59Z) - Large Language Models for Validating Network Protocol Parsers [8.007994733372675]
Protocol standards are typically written in natural language, whereas implementations are in source code.<n>We propose PARVAL, a framework built on large language models (LLMs)<n>It transforms both protocol standards and their implementations into a unified intermediate representation, referred to as format specifications.<n>It successfully identifies inconsistencies between the implementation and its RFC standard, achieving a low false positive rate of 5.6%.
arXiv Detail & Related papers (2025-04-18T07:09:56Z) - Type-Constrained Code Generation with Language Models [51.03439021895432]
We introduce a type-constrained decoding approach that leverages type systems to guide code generation.<n>For this purpose, we develop novel prefix automata and a search over inhabitable types, forming a sound approach to enforce well-typedness on LLM-generated code.<n>Our approach reduces compilation errors by more than half and significantly increases functional correctness in code synthesis, translation, and repair tasks.
arXiv Detail & Related papers (2025-04-12T15:03:00Z) - Verified invertible lexer using regular expressions and DFAs [0.0]
In this project, we explore the concept of invertibility applied to serialisation and lexing frameworks.<n>While serialisation is the process of taking a data structure and writing it to a bit array, lexing is the process of reading a stream of characters and splitting them into tokens.
arXiv Detail & Related papers (2024-12-18T08:03:17Z) - Formally Verifiable Generated ASN.1/ACN Encoders and Decoders: A Case Study [0.0]
ASN.1 is a language for describing data structures widely employed in ground and space telecommunications.<n>ACN can be used along ASN.1 to describe complex binary formats and legacy protocols.<n>We show how to port an ASN.1/ACN code generator to generate Scala code.<n>We then enhance the generator to emit not only the executable code but also strong enough pre-conditions, post-conditions, and lemmas for inductive proofs.
arXiv Detail & Related papers (2024-12-10T06:54:42Z) - SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition [77.28814034644287]
We propose SVTRv2, a CTC model endowed with the ability to handle text irregularities and model linguistic context.<n>We extensively evaluate SVTRv2 in both standard and recent challenging benchmarks.<n> SVTRv2 surpasses most EDTRs across the scenarios in terms of accuracy and inference speed.
arXiv Detail & Related papers (2024-11-24T14:21:35Z) - StrTune: Data Dependence-based Code Slicing for Binary Similarity Detection with Fine-tuned Representation [5.41477941455399]
BCSD can address binary tasks such as malicious code snippets identification and binary patch analysis by comparing code patterns.
Because binaries are compiled with different compilation configurations, existing approaches still face notable limitations when comparing binary similarity.
We propose StrTune, which slices binary code based on data dependence and perform slice-level fine-tuning.
arXiv Detail & Related papers (2024-11-19T12:20:08Z) - 3DGen: AI-Assisted Generation of Provably Correct Binary Format Parsers [5.102523342662388]
3DGen is a framework that makes use of AI agents to transform mixed informal input into format specifications in a language called 3D.
3DGen produces a 3D specification that conforms to a test suite, and which yields safe, efficient, provably correct, parsing code in C.
arXiv Detail & Related papers (2024-04-16T07:53:28Z) - FoC: Figure out the Cryptographic Functions in Stripped Binaries with LLMs [51.898805184427545]
We propose a novel framework called FoC to Figure out the Cryptographic functions in stripped binaries.<n>We first build a binary large language model (FoC-BinLLM) to summarize the semantics of cryptographic functions in natural language.<n>We then build a binary code similarity model (FoC-Sim) upon the FoC-BinLLM to create change-sensitive representations and use it to retrieve similar implementations of unknown cryptographic functions in a database.
arXiv Detail & Related papers (2024-03-27T09:45:33Z) - Blind Evaluation Framework for Fully Homomorphic Encryption and Privacy-Preserving Machine Learning [0.0]
Blind Evaluation Framework (BEF) is a cryptographically secure programming framework.
It enables blind, but correct, execution of programming logic without Interactive Rounds of Decryption and Evaluation (IRDE)
This is the first framework to enable both training and inference of machine learning models with FHE without decryption rounds.
arXiv Detail & Related papers (2023-10-19T20:33:02Z) - Zero-Shot Cross-lingual Semantic Parsing [56.95036511882921]
We study cross-lingual semantic parsing as a zero-shot problem without parallel data for 7 test languages.
We propose a multi-task encoder-decoder model to transfer parsing knowledge to additional languages using only English-Logical form paired data.
Our system frames zero-shot parsing as a latent-space alignment problem and finds that pre-trained models can be improved to generate logical forms with minimal cross-lingual transfer penalty.
arXiv Detail & Related papers (2021-04-15T16:08:43Z) - Towards Instance-Level Parser Selection for Cross-Lingual Transfer of
Dependency Parsers [59.345145623931636]
We argue for a novel cross-lingual transfer paradigm: instance-level selection (ILPS)
We present a proof-of-concept study focused on instance-level selection in the framework of delexicalized transfer.
arXiv Detail & Related papers (2020-04-16T13:18:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.