Nanopass Back-Translation of Call-Return Trees for Mechanized Secure Compilation Proofs
- URL: http://arxiv.org/abs/2503.19609v2
- Date: Fri, 11 Apr 2025 13:02:37 GMT
- Title: Nanopass Back-Translation of Call-Return Trees for Mechanized Secure Compilation Proofs
- Authors: Jérémy Thibault, Joseph Lenormand, Catalin Hritcu,
- Abstract summary: Researchers aim to build secure compilation chains enforcing that if there is no attack a source context can mount against a source program.<n>For any attack a target context mounts against the compiled program one has to exhibit a source context mounting the same attack against the source program.<n>We describe a novel back-translation technique, which results in simpler proofs that can be more easily mechanized in a proof assistant.
- Score: 0.07499722271664144
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Researchers aim to build secure compilation chains enforcing that if there is no attack a source context can mount against a source program then there is also no attack an adversarial target context can mount against the compiled program. Proving that these compilation chains are secure is, however, challenging, and involves a non-trivial back-translation step: for any attack a target context mounts against the compiled program one has to exhibit a source context mounting the same attack against the source program. We describe a novel back-translation technique, which results in simpler proofs that can be more easily mechanized in a proof assistant. Given a finite set of finite trace prefixes, capturing the interaction recorded during an attack between a target context and the compiled program, we build a call-return tree that we back-translate into a source context producing the same trace prefixes. We use state in the generated source context to record the current location in the call-return tree. The back-translation is done in several small steps, each adding to the tree new information describing how the location should change depending on how the context regains control. To prove this back-translation correct we give semantics to every intermediate call-return tree language, using ghost state to store information and explicitly enforce execution invariants. We prove several small forward simulations, basically seeing the back-translation as a verified nanopass compiler. Thanks to this modular structure, we are able to mechanize this complex back-translation and its correctness proof in the Rocq prover without too much effort.
Related papers
- Misquoted No More: Securely Extracting F* Programs with IO [0.3848364262836075]
Shallow embeddings that use monads to represent effects are popular in proof-oriented languages.<n>Once shallowly embedded programs are verified, they are often extracted to mainstream languages like OCaml or C.<n>We build on this idea, but limit the use of translation validation to a first extraction step.
arXiv Detail & Related papers (2026-02-23T15:37:18Z) - Semantic Reconstruction of Adversarial Plagiarism: A Context-Aware Framework for Detecting and Restoring "Tortured Phrases" in Scientific Literature [4.905540561146363]
We propose Semantic Reconstruction of Adversarial Plagiarism (SRAP)<n>SRAP is a framework designed not only to detect these anomalies but to mathematically recover the original terminology.<n>We use a two-stage architecture: (1) statistical anomaly detection with a domain-specific masked language model (SciBERT) using token-level pseudo-perplexity, and (2) source-based semantic reconstruction using dense vector retrieval (FAISS) and sentence-level alignment (SBERT)
arXiv Detail & Related papers (2025-12-11T08:53:25Z) - Context-Guided Decompilation: A Step Towards Re-executability [50.71992919223209]
Binary decompilation plays an important role in software security analysis, reverse engineering and malware understanding.<n>Recent advances in large language models (LLMs) have enabled neural decompilation, but the generated code is typically only semantically plausible.<n>We propose ICL4Decomp, a hybrid decompilation framework that leverages in-context learning (ICL) to guide LLMs toward generating re-executable source code.
arXiv Detail & Related papers (2025-11-03T17:21:39Z) - Correctness-Guaranteed Code Generation via Constrained Decoding [11.531496728670746]
We present a constrained runtime decoding algorithm for generating semantically correct programs.<n>We show that our method can generate semantically correct programs conforming to any prescribed scripting API.<n>We further show that, with careful design, our semantic guarantees extend to correctness, as validated in the application of generating game mechanics for a roguelike video game.
arXiv Detail & Related papers (2025-08-20T20:48:18Z) - Exposing Go's Hidden Bugs: A Novel Concolic Framework [2.676686591720132]
We present Zorya, a novel methodology to evaluate Go programs comprehensively.<n>By systematically exploring execution paths to uncover vulnerabilities beyond conventional testing, symbolic execution offers distinct advantages.<n>Our solution employs Ghidra's P-Code as an intermediate representation (IR)
arXiv Detail & Related papers (2025-05-26T16:26:20Z) - ReF Decompile: Relabeling and Function Call Enhanced Decompile [50.86228893636785]
The goal of decompilation is to convert compiled low-level code (e.g., assembly code) back into high-level programming languages.<n>This task supports various reverse engineering applications, such as vulnerability identification, malware analysis, and legacy software migration.
arXiv Detail & Related papers (2025-02-17T12:38:57Z) - Syzygy: Dual Code-Test C to (safe) Rust Translation using LLMs and Dynamic Analysis [8.361424157571468]
Syzygy is an automated approach to translate C to safe Rust.
This is the largest automated and test-validated C to safe Rust code translation achieved so far.
arXiv Detail & Related papers (2024-12-18T18:55:46Z) - SNIP: Speculative Execution and Non-Interference Preservation for Compiler Transformations [0.15800607910450126]
We address the problem of preserving non-interference across compiler transformations under speculative semantics.
We develop a proof method that ensures the preservation uniformly across all source programs.
arXiv Detail & Related papers (2024-07-21T07:30:30Z) - Unsupervised Sign Language Translation and Generation [72.01216288379072]
We introduce an unsupervised sign language translation and generation network (USLNet)
USLNet learns from abundant single-modality (text and video) data without parallel sign language data.
We propose a sliding window method to address the issues of aligning variable-length text with video sequences.
arXiv Detail & Related papers (2024-02-12T15:39:05Z) - CRIL: A Concurrent Reversible Intermediate Language [0.0]
We present a reversible intermediate language with composing for translating a high-level concurrent programming language to another lower-level concurrent programming language, keeping reversibility.
We propose CRIL as an extension of RIL used by Mogensen for a functional reversible language, incorporating a multi-thread process invocation and the synchronization primitives based on the P-V operations.
arXiv Detail & Related papers (2023-09-13T20:52:54Z) - Outline, Then Details: Syntactically Guided Coarse-To-Fine Code
Generation [61.50286000143233]
ChainCoder is a program synthesis language model that generates Python code progressively.
A tailored transformer architecture is leveraged to jointly encode the natural language descriptions and syntactically aligned I/O data samples.
arXiv Detail & Related papers (2023-04-28T01:47:09Z) - Summarize and Generate to Back-translate: Unsupervised Translation of
Programming Languages [86.08359401867577]
Back-translation is widely known for its effectiveness for neural machine translation when little to no parallel data is available.
We propose performing back-translation via code summarization and generation.
We show that our proposed approach performs competitively with state-of-the-art methods.
arXiv Detail & Related papers (2022-05-23T08:20:41Z) - Reflective Decoding: Beyond Unidirectional Generation with Off-the-Shelf
Language Models [63.808843089941405]
Large pretrained LanguageModels (LMs) generate text with remarkable quality, but only sequentially from left to right.
We present Reflective Decoding, a novel unsupervised algorithm that allows for direct application of unidirectional LMs to non-sequential tasks.
Our 2-step approach requires no supervision or even parallel corpora, only two off-the-shelf pretrained LMs in opposite directions.
arXiv Detail & Related papers (2020-10-16T18:02:07Z) - Contrastive Code Representation Learning [95.86686147053958]
We show that the popular reconstruction-based BERT model is sensitive to source code edits, even when the edits preserve semantics.
We propose ContraCode: a contrastive pre-training task that learns code functionality, not form.
arXiv Detail & Related papers (2020-07-09T17:59:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.