Formally Verifiable Generated ASN.1/ACN Encoders and Decoders: A Case Study
- URL: http://arxiv.org/abs/2412.07235v1
- Date: Tue, 10 Dec 2024 06:54:42 GMT
- Title: Formally Verifiable Generated ASN.1/ACN Encoders and Decoders: A Case Study
- Authors: Mario Bucev, Samuel Chassot, Simon Felix, Filip Schramka, Viktor Kunčak
- Abstract summary: ASN.1 is a language for describing data structures widely employed in ground and space telecommunications. ACN can be used alongside ASN.1 to describe complex binary formats and legacy protocols. We show how to port an ASN.1/ACN code generator to generate Scala code. We then enhance the generator to emit not only the executable code but also strong enough pre-conditions, post-conditions, and lemmas for inductive proofs.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We propose a verified executable Scala backend for ASN1SCC, a compiler for ASN.1/ACN. ASN.1 is a language for describing data structures widely employed in ground and space telecommunications. ACN can be used alongside ASN.1 to describe complex binary formats and legacy protocols. To avoid the error-prone and time-consuming manual writing of serializers, we show how to port an ASN.1/ACN code generator to generate Scala code. We then enhance the generator to emit not only the executable code but also strong enough pre-conditions, post-conditions, and lemmas for inductive proofs. This allowed us to verify the resulting generated annotated code using Stainless, a program verifier for Scala. The properties we prove include the absence of runtime errors, such as out-of-bounds accesses or divisions by zero. For the base library, we also prove the invertibility of the decoding and encoding functions, showing that decoding yields the encoded value back. Furthermore, our system automatically inserts invertibility proofs for arbitrary records in the generated code, proving over 300'000 verification conditions. We establish key steps towards such proofs for sums and arrays as well.
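To make the round-trip (invertibility) property concrete, here is a minimal Scala sketch of a toy 8-bit encoder/decoder pair; in the Stainless setting the property would be stated as an `ensuring` clause and discharged for all inputs. The names and the toy format are illustrative assumptions, not the API of the ASN1SCC-generated code.

```scala
// Hypothetical sketch: a toy 8-bit encoder/decoder with the round-trip
// property the paper proves for its generated code. Not the ASN1SCC API.
object RoundTripSketch {
  /** Encode an unsigned byte as 8 booleans, most significant bit first. */
  def encodeByte(v: Int): List[Boolean] = {
    require(0 <= v && v <= 255)            // pre-condition, as in Stainless
    (7 to 0 by -1).toList.map(i => ((v >> i) & 1) == 1)
  }

  /** Decode 8 booleans back into the unsigned byte they represent. */
  def decodeByte(bits: List[Boolean]): Int = {
    require(bits.length == 8)              // pre-condition on the bit count
    bits.foldLeft(0)((acc, b) => 2 * acc + (if (b) 1 else 0))
  }

  def main(args: Array[String]): Unit = {
    // Stainless would prove decodeByte(encodeByte(v)) == v for all valid v
    // via an ensuring clause; here we simply check it exhaustively.
    (0 to 255).foreach(v => assert(decodeByte(encodeByte(v)) == v))
    println("decode . encode = id on all 8-bit values")
  }
}
```

The real generated encoders work over a bit stream and need auxiliary lemmas about stream positions; the sketch only shows the shape of the contract.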
Related papers
- IFEvalCode: Controlled Code Generation [69.28317223249358]
The paper introduces forward and backward constraints generation to improve the instruction-following capabilities of Code LLMs. The authors present IFEvalCode, a multilingual benchmark comprising 1.6K test samples across seven programming languages.
arXiv Detail & Related papers (2025-07-30T08:08:48Z)
- Neural Theorem Proving: Generating and Structuring Proofs for Formal Verification [0.4779196219827508]
We introduce a framework that generates whole proofs in a formal language to be used within systems that utilize the power of built-in tactics and off-the-shelf automated theorem provers.
To train the LLM, we employ a 2-stage finetuning process, where we first use SFT-based training to enable the model to generate syntactically correct Isabelle code.
We validate our framework using the miniF2F-test benchmark and the Isabelle proof assistant and design a use case to verify the correctness of the S3 bucket access policy code.
arXiv Detail & Related papers (2025-04-23T18:04:38Z)
- Type-Constrained Code Generation with Language Models [51.03439021895432]
Large language models (LLMs) often produce uncompilable output because their next-token inference procedure does not model formal aspects of code.
We introduce a type-constrained decoding approach that leverages type systems to guide code generation.
Our approach reduces compilation errors by more than half and increases functional correctness in code synthesis, translation, and repair tasks (a toy illustration of constrained decoding follows below).
arXiv Detail & Related papers (2025-04-12T15:03:00Z)
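To give a feel for the mechanism, here is a small, self-contained Scala sketch of constrained decoding. Bracket discipline stands in for a real type system, which is an illustrative simplification: the paper constrains generation against the target language's types, not brackets.

```scala
import scala.util.Random

// Toy sketch of constrained decoding: before sampling each token, mask out
// candidates that would make the prefix impossible to complete. A bracket
// invariant stands in for the type system used in the actual approach.
object ConstrainedDecoding {
  val vocab = List("(", ")", "x")

  // Bracket depth of a prefix, or None if a ")" closes an unopened "(".
  def depth(s: List[String]): Option[Int] =
    s.foldLeft(Option(0)) { (acc, t) =>
      acc.flatMap { d =>
        val d2 = t match { case "(" => d + 1; case ")" => d - 1; case _ => d }
        if (d2 < 0) None else Some(d2)
      }
    }

  // A token is admissible if the extended prefix can still close every
  // open bracket within the remaining length budget.
  def admissible(prefix: List[String], tok: String, budget: Int): Boolean =
    depth(prefix :+ tok).exists(_ <= budget - (prefix.length + 1))

  def main(args: Array[String]): Unit = {
    val rng = new Random(0)
    val budget = 12
    var out = List.empty[String]
    while (out.length < budget) {
      val allowed = vocab.filter(admissible(out, _, budget))
      out = out :+ allowed(rng.nextInt(allowed.length)) // sample a legal token
    }
    assert(depth(out).contains(0))  // masking guarantees a well-formed result
    println(out.mkString)
  }
}
```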
- KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding [49.56049319037421]
KodCode is a synthetic dataset that addresses the persistent challenge of acquiring high-quality, verifiable training data.
It comprises question-solution-test triplets that are systematically validated via a self-verification procedure.
This pipeline yields a large-scale, robust and diverse coding dataset.
arXiv Detail & Related papers (2025-03-04T19:17:36Z)
- Verified invertible lexer using regular expressions and DFAs [0.0]
In this project, we explore the concept of invertibility applied to serialisation and lexing frameworks.
While serialisation takes a data structure and writes it to a bit array, lexing reads a stream of characters and splits it into tokens (a minimal round-trip sketch follows below).
arXiv Detail & Related papers (2024-12-18T08:03:17Z)
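Invertibility here can be phrased as a round trip, just as for serialisation: printing a token list and re-lexing the output should yield the original tokens. Below is a minimal, hypothetical Scala sketch for a two-token language; the language and names are illustrative assumptions, not the paper's framework.

```scala
// Toy sketch of lexer invertibility: lex(printTokens(ts)) == Some(ts).
// The two-token bracket language is an illustrative assumption.
sealed trait Token
case object LParen extends Token
case object RParen extends Token

object InvertibleLexer {
  /** Print a token list back to its source text. */
  def printTokens(ts: List[Token]): String =
    ts.map {
      case LParen => "("
      case RParen => ")"
    }.mkString

  /** Lex a string into tokens, rejecting characters outside the language. */
  def lex(s: String): Option[List[Token]] =
    s.foldRight(Option(List.empty[Token])) { (c, acc) =>
      acc.flatMap { ts =>
        c match {
          case '(' => Some(LParen :: ts)
          case ')' => Some(RParen :: ts)
          case _   => None
        }
      }
    }

  def main(args: Array[String]): Unit = {
    val ts = List(LParen, LParen, RParen, RParen)
    assert(lex(printTokens(ts)) == Some(ts)) // round trip: lex . print = id
    println("lex(printTokens(ts)) recovered the original tokens")
  }
}
```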
- Evaluating the Ability of Large Language Models to Generate Verifiable Specifications in VeriFast [5.019791860882564]
Large language models (LLMs) have shown promise in a number of software engineering activities.
This paper explores the effectiveness of OpenAI's GPT models in generating specifications based on separation logic.
arXiv Detail & Related papers (2024-11-04T17:44:11Z)
- Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs [57.27982780697922]
Large language models have demonstrated exceptional capability in natural language understanding and generation.
However, their generation speed is limited by the inherently sequential nature of their decoding process.
This paper introduces Lexical Unit Decoding, a novel decoding methodology implemented in a data-driven manner.
arXiv Detail & Related papers (2024-05-24T04:35:13Z)
- SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code Summarization [51.67317895094664]
This paper studies file-level code summarization, which can assist programmers in understanding and maintaining large source code projects.
We propose SparseCoder, an identifier-aware sparse transformer for effectively handling long code sequences.
arXiv Detail & Related papers (2024-01-26T09:23:27Z)
- Using LLM such as ChatGPT for Designing and Implementing a RISC Processor: Execution, Challenges and Limitations [11.07566083431614]
The paper reviews the associated steps, such as parsing, tokenization, encoding, the attention mechanism, token sampling, and iteration during code generation.
The generated code for the RISC components is verified through testbenches and hardware implementation on an FPGA board.
arXiv Detail & Related papers (2024-01-18T20:14:10Z)
- VerilogEval: Evaluating Large Language Models for Verilog Code Generation [6.88526119890374]
We present a comprehensive evaluation dataset consisting of 156 problems from the Verilog instructional website HDLBits.
The evaluation set consists of a diverse set of Verilog code generation tasks, ranging from simple combinational circuits to complex finite state machines.
arXiv Detail & Related papers (2023-09-14T09:15:34Z)
- ALGO: Synthesizing Algorithmic Programs with LLM-Generated Oracle Verifiers [60.6418431624873]
Large language models (LLMs) excel at implementing code from functionality descriptions but struggle with algorithmic problems.
We propose ALGO, a framework that synthesizes Algorithmic programs with LLM-Generated Oracles to guide the generation and verify their correctness.
Experiments show that when equipped with ALGO, we achieve an 8x better one-submission pass rate over the Codex model and a 2.6x better one-submission pass rate over CodeT (a toy oracle cross-check is sketched below).
arXiv Detail & Related papers (2023-05-24T00:10:15Z)
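The core idea, cross-checking a fast candidate against a slow but obviously correct oracle, can be sketched in a few lines of Scala. The maximum-subarray pair below is an illustrative assumption, not taken from the paper.

```scala
import scala.util.Random

// Toy sketch of oracle-guided verification: a brute-force reference oracle
// cross-checks a candidate implementation on random inputs.
object OracleCheck {
  /** O(n^2) reference oracle: maximum sum over all non-empty subarrays. */
  def oracle(a: Array[Int]): Int =
    (for (i <- a.indices; j <- i until a.length)
      yield a.slice(i, j + 1).sum).max

  /** Candidate under test: Kadane's O(n) algorithm. */
  def candidate(a: Array[Int]): Int = {
    var best = a(0)
    var cur = a(0)
    for (x <- a.drop(1)) {
      cur = math.max(x, cur + x)
      best = math.max(best, cur)
    }
    best
  }

  def main(args: Array[String]): Unit = {
    val rng = new Random(42)
    val ok = (1 to 200).forall { _ =>
      val a = Array.fill(rng.nextInt(20) + 1)(rng.nextInt(21) - 10)
      candidate(a) == oracle(a)
    }
    println(if (ok) "candidate agrees with the oracle" else "mismatch found")
  }
}
```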
- Neuro-symbolic Zero-Shot Code Cloning with Cross-Language Intermediate Representation [13.881954273779403]
We define a neuro-symbolic approach to the task of finding semantically similar clones for code in a legacy programming language, without training data.
We fine-tune UniXCoder, the best-performing model for cross-programming-language search, for the code cloning task with the SBT IRs of C code pairs available in the CodeNet dataset.
With this fine-tuned UniXCoder, we obtain a performance improvement of 12.85 MAP@2 over the pre-trained UniXCoder model, in a zero-shot setting, on the test split synthesized from the CodeNet dataset.
arXiv Detail & Related papers (2023-04-26T07:41:26Z)
- Python Code Generation by Asking Clarification Questions [57.63906360576212]
In this work, we introduce a novel and more realistic setup for this task.
We hypothesize that the under-specification of a natural language description can be resolved by asking clarification questions.
We collect and introduce a new dataset named CodeClarQA, containing pairs of natural language descriptions and code together with synthetically created clarification questions and answers.
arXiv Detail & Related papers (2022-12-19T22:08:36Z)
- On Sparsifying Encoder Outputs in Sequence-to-Sequence Models [90.58793284654692]
We take the Transformer as the testbed and introduce a layer of gates between the encoder and the decoder.
The gates are regularized using the expected value of the sparsity-inducing L0 penalty.
We investigate the effects of this sparsification on two machine translation and two summarization tasks (a sketch of the typical objective follows below).
arXiv Detail & Related papers (2020-04-24T16:57:52Z)
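As a rough sketch of what such a gated objective usually looks like (an assumption based on the standard relaxed-L0 formulation; the paper's exact objective may differ), the task loss is augmented with the expected number of open gates:

```latex
\mathcal{L}
  = \mathcal{L}_{\text{task}}
  + \lambda \, \mathbb{E}\big[\lVert g \rVert_0\big],
\qquad
\mathbb{E}\big[\lVert g \rVert_0\big]
  = \sum_{i=1}^{n} \Pr\big[g_i \neq 0\big],
```

where g_i gates the i-th encoder output and lambda trades end-task quality against the number of encoder states passed to the decoder.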
This list is automatically generated from the titles and abstracts of the papers on this site.