Enhancing Translation Validation of Compiler Transformations with Large
Language Models
- URL: http://arxiv.org/abs/2401.16797v2
- Date: Thu, 1 Feb 2024 00:19:32 GMT
- Title: Enhancing Translation Validation of Compiler Transformations with Large
Language Models
- Authors: Yanzhao Wang, Fei Xie
- Abstract summary: This paper presents a framework that integrates Large Language Models (LLMs) into translation validation.
We first utilize existing formal verification tools for translation validation.
When formal verification tools are unable to confirm a transformation's soundness, our framework employs fine-tuned LLMs for prediction.
- Score: 5.103162976634333
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a framework that integrates Large Language Models (LLMs)
into translation validation, targeting LLVM compiler transformations where
formal verification tools fall short. Our framework first utilizes existing
formal verification tools for translation validation. In this work, we use
Alive2, a well-known tool in LLVM compiler verification, as an example. When
formal verification tools are unable to confirm a transformation's soundness,
our framework employs fine-tuned LLMs for prediction. It then applies fuzzing
to transformations predicted as potentially unsound by the LLMs due to return
values or memory inconsistencies, aiming to find counterexamples. In cases
where transformations are unsound for other reasons or sound, or if no
counterexamples emerge, the framework directly reports these outcomes without
further fuzzing. This methodology has shown effectiveness in complex
application such as deep-learning accelerator designs, where traditional formal
verification tools struggle.
Related papers
- CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward [50.97588334916863]
We develop CompassVerifier, an accurate and robust lightweight verifier model for evaluation and outcome reward.<n>It demonstrates multi-domain competency spanning math, knowledge, and diverse reasoning tasks, with the capability to process various answer types.<n>We introduce VerifierBench benchmark comprising model outputs collected from multiple data sources, augmented through manual analysis of metaerror patterns to enhance CompassVerifier.
arXiv Detail & Related papers (2025-08-05T17:55:24Z) - The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in LLMs [54.59207567677249]
Large language models (LLMs) still struggle across tasks outside of high-resource languages.<n>In this work, we investigate cross-lingual transfer to lower-resource languages where task-specific post-training data is scarce.
arXiv Detail & Related papers (2025-05-23T20:28:31Z) - Enhancing Large Language Models with Faster Code Preprocessing for Vulnerability Detection [0.0]
We build on the existing SCoPE framework and introduce SCoPE2, an enhanced version with improved performance.<n>Our results show a 97.3% reduction in processing time with SCoPE2, along with an improved F1-score for the Large Language Model (LLM) for vulnerability detection.
arXiv Detail & Related papers (2025-05-08T19:00:11Z) - APE-Bench I: Towards File-level Automated Proof Engineering of Formal Math Libraries [5.227446378450704]
APE-Bench I is the first realistic benchmark built from real-world commit histories of Mathlib4.
Eleanstic is a scalable parallel verification infrastructure optimized for proof checking across multiple versions of Mathlib.
arXiv Detail & Related papers (2025-04-27T05:04:02Z) - Neural Theorem Proving: Generating and Structuring Proofs for Formal Verification [0.4779196219827508]
We introduce a framework that generates whole proofs in a formal language to be used within systems that utilize the power of built-in tactics and off-the-shelf automated theorem provers.
To train the LLM, we employ a 2-stage finetuning process, where we first use SFT-based training to enable the model to generate syntactically correct Isabelle code.
We validate our framework using the miniF2F-test benchmark and the Isabelle proof assistant and design a use case to verify the correctness of the S3 bucket access policy code.
arXiv Detail & Related papers (2025-04-23T18:04:38Z) - Training of Scaffolded Language Models with Language Supervision: A Survey [62.59629932720519]
This survey organizes the literature on the design and optimization of emerging structures around post-trained LMs.<n>We refer to this overarching structure as scaffolded LMs and focus on LMs that are integrated into multi-step processes with tools.
arXiv Detail & Related papers (2024-10-21T18:06:25Z) - Strategies for Improving NL-to-FOL Translation with LLMs: Data Generation, Incremental Fine-Tuning, and Verification [9.36179617282876]
We create a high-quality FOL-annotated subset of ProofWriter dataset using GPT-4o.
Our results show state-of-the-art performance for ProofWriter and ProntoQA datasets using ProofFOL on LLaMA-2 and Mistral models.
arXiv Detail & Related papers (2024-09-24T21:24:07Z) - Using Large Language Models for the Interpretation of Building Regulations [7.013802453969655]
Large language models (LLMs) can generate logically coherent text and source code responding to user prompts.
This paper evaluates the performance of LLMs in translating building regulations into LegalRuleML in a few-shot learning setup.
arXiv Detail & Related papers (2024-07-26T08:30:47Z) - An Empirical Evaluation of Pre-trained Large Language Models for Repairing Declarative Formal Specifications [5.395614997568524]
This paper presents a systematic investigation into the capacity of Large Language Models (LLMs) for repairing declarative specifications in Alloy.
We propose a novel repair pipeline that integrates a dual-agent LLM framework, comprising a Repair Agent and a Prompt Agent.
Our study reveals that LLMs, particularly GPT-4 variants, outperform existing techniques in terms of repair efficacy, albeit with a marginal increase in runtime and token usage.
arXiv Detail & Related papers (2024-04-17T03:46:38Z) - Robust and Scalable Model Editing for Large Language Models [75.95623066605259]
We propose EREN (Edit models by REading Notes) to improve the scalability and robustness of LLM editing.
Unlike existing techniques, it can integrate knowledge from multiple edits, and correctly respond to syntactically similar but semantically unrelated inputs.
arXiv Detail & Related papers (2024-03-26T06:57:23Z) - Leveraging Large Language Models for Automated Proof Synthesis in Rust [6.202137610101939]
Large Language Models (LLMs) have shown success in code analysis and synthesis.
We present a combination of LLMs and static analysis to synthesize invariants, assertions, and other proof structures for a Rust-based formal verification framework called Verus.
Our prototype decomposes the verification task into multiple smaller ones, iteratively queries GPT-4, and combines its output with lightweight static analysis.
arXiv Detail & Related papers (2023-11-07T05:47:47Z) - Knowledge-Augmented Language Model Verification [68.6099592486075]
Recent Language Models (LMs) have shown impressive capabilities in generating texts with the knowledge internalized in parameters.
We propose to verify the output and the knowledge of the knowledge-augmented LMs with a separate verifier.
Our results show that the proposed verifier effectively identifies retrieval and generation errors, allowing LMs to provide more factually correct outputs.
arXiv Detail & Related papers (2023-10-19T15:40:00Z) - Automatic Program Instrumentation for Automatic Verification (Extended
Technical Report) [0.0]
In deductive verification and software model checking, dealing with certain specification language constructs can be problematic.
We propose instrumentation as a unifying verification paradigm that subsumes various existing ad-hoc approaches.
We implement our approach in the MonoCera tool, which is tailored to the verification of programs with aggregation.
arXiv Detail & Related papers (2023-05-26T14:55:35Z) - Self-Checker: Plug-and-Play Modules for Fact-Checking with Large Language Models [75.75038268227554]
Self-Checker is a framework comprising a set of plug-and-play modules that facilitate fact-checking.
This framework provides a fast and efficient way to construct fact-checking systems in low-resource environments.
arXiv Detail & Related papers (2023-05-24T01:46:07Z) - Label Words are Anchors: An Information Flow Perspective for
Understanding In-Context Learning [77.7070536959126]
In-context learning (ICL) emerges as a promising capability of large language models (LLMs)
In this paper, we investigate the working mechanism of ICL through an information flow lens.
We introduce an anchor re-weighting method to improve ICL performance, a demonstration compression technique to expedite inference, and an analysis framework for diagnosing ICL errors in GPT2-XL.
arXiv Detail & Related papers (2023-05-23T15:26:20Z) - Explaining Emergent In-Context Learning as Kernel Regression [61.57151500616111]
Large language models (LLMs) have initiated a paradigm shift in transfer learning.
In this paper, we investigate the reason why a transformer-based language model can accomplish in-context learning after pre-training.
We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression.
arXiv Detail & Related papers (2023-05-22T06:45:02Z) - Augmented Language Models: a Survey [55.965967655575454]
This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools.
We refer to them as Augmented Language Models (ALMs)
The missing token objective allows ALMs to learn to reason, use tools, and even act, while still performing standard natural language tasks.
arXiv Detail & Related papers (2023-02-15T18:25:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.