Misquoted No More: Securely Extracting F* Programs with IO
- URL: http://arxiv.org/abs/2602.19973v1
- Date: Mon, 23 Feb 2026 15:37:18 GMT
- Title: Misquoted No More: Securely Extracting F* Programs with IO
- Authors: Cezar-Constantin Andrici, Abigail Pribisova, Danel Ahman, Catalin Hritcu, Exequiel Rivas, Théo Winterhalter,
- Abstract summary: Shallow embeddings that use monads to represent effects are popular in proof-oriented languages.<n>Once shallowly embedded programs are verified, they are often extracted to mainstream languages like OCaml or C.<n>We build on this idea, but limit the use of translation validation to a first extraction step.
- Score: 0.3848364262836075
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Shallow embeddings that use monads to represent effects are popular in proof-oriented languages because they are convenient for formal verification. Once shallowly embedded programs are verified, they are often extracted to mainstream languages like OCaml or C and linked into larger codebases. The extraction process is not fully verified because it often involves quotation -- turning the shallowly embedded program into a deeply embedded one -- and verifying quotation remains a major open challenge. Instead, some prior work obtains formal correctness guarantees using translation validation to certify individual extraction results. We build on this idea, but limit the use of translation validation to a first extraction step that we call relational quotation and that uses a metaprogram to construct a typing derivation for the given shallowly embedded program. This metaprogram is simple, since the typing derivation follows the structure of the original program. Once we validate, syntactically, that the typing derivation is valid for the original program, we pass it to a verified syntax-generation function that produces code guaranteed to be semantically related to the original program. We apply this general idea to build SEIO*, a framework for extracting shallowly embedded F* programs with IO to a deeply embedded lambda-calculus while providing formal secure compilation guarantees. Using two cross-language logical relations, we devise a machine-checked proof in F* that SEIO* guarantees Robust Relational Hyperproperty Preservation (RrHP), a very strong secure compilation criterion that implies full abstraction as well as preservation of trace properties and hyperproperties against arbitrary adversarial contexts. This goes beyond the state of the art in verified and certifying extraction, which so far has focused on correctness rather than security.
Related papers
- BRIDGE: Building Representations In Domain Guided Program Verification [67.36686119518441]
BRIDGE decomposes verification into three interconnected domains: Code, Specifications, and Proofs.<n>We show that this approach substantially improves both accuracy and efficiency beyond standard error feedback methods.
arXiv Detail & Related papers (2025-11-26T06:39:19Z) - zkStruDul: Programming zkSNARKs with Structural Duality [0.2449909275410287]
zkStruDul is a language that unifies input transformations and predicate definitions into a single combined abstraction.<n>We provide a source-level semantics and prove its behavior is identical to the projected semantics, allowing straightforward standard reasoning.
arXiv Detail & Related papers (2025-11-13T18:06:21Z) - SLICET5: Static Program Slicing using Language Models with Copy Mechanism and Constrained Decoding [13.61350801915956]
Static program slicing is a fundamental technique in software engineering.<n>ourtool is a novel slicing framework that reformulates static program slicing as a sequence-to-sequence task.<n>ourtool consistently outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2025-09-22T03:14:47Z) - Type-Constrained Code Generation with Language Models [51.03439021895432]
We introduce a type-constrained decoding approach that leverages type systems to guide code generation.<n>For this purpose, we develop novel prefix automata and a search over inhabitable types, forming a sound approach to enforce well-typedness on LLM-generated code.<n>Our approach reduces compilation errors by more than half and significantly increases functional correctness in code synthesis, translation, and repair tasks.
arXiv Detail & Related papers (2025-04-12T15:03:00Z) - Evaluating LLM-driven User-Intent Formalization for Verification-Aware Languages [6.0608817611709735]
We propose a metric for evaluating the quality of specifications for verification-aware languages.
We show that our metric agrees closely on a human-labeled dataset of Dafny specifications for the popular MBPP code-generation benchmark.
We also outline formal verification challenges that need to be addressed to apply the technique more widely.
arXiv Detail & Related papers (2024-06-14T06:52:08Z) - FoC: Figure out the Cryptographic Functions in Stripped Binaries with LLMs [51.898805184427545]
We propose a novel framework called FoC to Figure out the Cryptographic functions in stripped binaries.<n>We first build a binary large language model (FoC-BinLLM) to summarize the semantics of cryptographic functions in natural language.<n>We then build a binary code similarity model (FoC-Sim) upon the FoC-BinLLM to create change-sensitive representations and use it to retrieve similar implementations of unknown cryptographic functions in a database.
arXiv Detail & Related papers (2024-03-27T09:45:33Z) - SECOMP: Formally Secure Compilation of Compartmentalized C Programs [2.5553752304478574]
Undefined behavior in C often causes devastating security vulnerabilities.<n>We introduce SECOMP, a compiler for compartmentalized C code with machine-checked proofs.<n>This is the first time such a strong criterion is proven for a mainstream programming language.
arXiv Detail & Related papers (2024-01-29T16:32:36Z) - BOOST: Harnessing Black-Box Control to Boost Commonsense in LMs'
Generation [60.77990074569754]
We present a computation-efficient framework that steers a frozen Pre-Trained Language Model towards more commonsensical generation.
Specifically, we first construct a reference-free evaluator that assigns a sentence with a commonsensical score.
We then use the scorer as the oracle for commonsense knowledge, and extend the controllable generation method called NADO to train an auxiliary head.
arXiv Detail & Related papers (2023-10-25T23:32:12Z) - LINC: A Neurosymbolic Approach for Logical Reasoning by Combining
Language Models with First-Order Logic Provers [60.009969929857704]
Logical reasoning is an important task for artificial intelligence with potential impacts on science, mathematics, and society.
In this work, we reformulating such tasks as modular neurosymbolic programming, which we call LINC.
We observe significant performance gains on FOLIO and a balanced subset of ProofWriter for three different models in nearly all experimental conditions we evaluate.
arXiv Detail & Related papers (2023-10-23T17:58:40Z) - Interactive Code Generation via Test-Driven User-Intent Formalization [60.90035204567797]
Large language models (LLMs) produce code from informal natural language (NL) intent.
It is hard to define a notion of correctness since natural language can be ambiguous and lacks a formal semantics.
We describe a language-agnostic abstract algorithm and a concrete implementation TiCoder.
arXiv Detail & Related papers (2022-08-11T17:41:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.