Validating Traces of Distributed Programs Against TLA+ Specifications
- URL: http://arxiv.org/abs/2404.16075v2
- Date: Tue, 17 Sep 2024 16:52:15 GMT
- Title: Validating Traces of Distributed Programs Against TLA+ Specifications
- Authors: Horatiu Cirstea, Markus A. Kuppe, Benjamin Loillier, Stephan Merz
- Abstract summary: We present a framework for relating traces of distributed programs to high-level specifications written in TLA+.
The problem is reduced to a constrained model checking problem, realized using the TLC model checker.
We have applied our approach to several distributed programs, detecting discrepancies between the specifications and the implementations in all cases.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: TLA+ is a formal language for specifying systems, including distributed algorithms, that is supported by powerful verification tools. In this work we present a framework for relating traces of distributed programs to high-level specifications written in TLA+. The problem is reduced to a constrained model checking problem, realized using the TLC model checker. Our framework consists of an API for instrumenting Java programs in order to record traces of executions, of a collection of TLA+ operators that are used for relating those traces to specifications, and of scripts for running the model checker. Crucially, traces only contain updates to specification variables rather than full values, and developers may choose to trace only certain variables. We have applied our approach to several distributed programs, detecting discrepancies between the specifications and the implementations in all cases. We discuss reasons for these discrepancies, best practices for instrumenting programs, and how to interpret the verdict produced by TLC.
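To make the instrumentation idea concrete, the sketch below shows what recording a trace from a Java program might look like: each call appends one line to a trace file describing an update operation applied to a specification variable, rather than dumping the full program state. This is a minimal, hypothetical sketch; the class name TraceRecorder, the log method, the JSON field names (clock, var, op, arg), and the file name node1.ndjson are illustrative assumptions and do not reproduce the actual API of the paper's framework.

```java
// Hypothetical instrumentation sketch (not the paper's actual API):
// each call to log() appends one JSON line describing an update to a
// specification variable, rather than recording the full program state.

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TraceRecorder {

    private final Path traceFile;

    public TraceRecorder(Path traceFile) {
        this.traceFile = traceFile;
    }

    /** Records that spec variable {@code var} was updated by operation {@code op} with argument {@code arg}. */
    public synchronized void log(String var, String op, Object arg) {
        // One JSON object per line; the clock field gives a per-process
        // ordering of the recorded updates.
        String line = String.format(
            "{\"clock\": %d, \"var\": \"%s\", \"op\": \"%s\", \"arg\": \"%s\"}%n",
            System.nanoTime(), var, op, arg);
        try {
            Files.writeString(traceFile, line, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Usage example: trace only updates to the (hypothetical) spec
    // variables "committed" and "ballot"; other program state is ignored.
    public static void main(String[] args) {
        TraceRecorder rec = new TraceRecorder(Path.of("node1.ndjson"));
        rec.log("committed", "AddElement", 42);
        rec.log("ballot", "Set", 3);
    }
}
```

In the framework described in the abstract, such a trace would then be related to the TLA+ specification via dedicated TLA+ operators and checked with TLC as a constrained model checking problem; the exact trace format and operators are defined by the authors' tooling and are not shown here.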
Related papers
- SpecTool: A Benchmark for Characterizing Errors in Tool-Use LLMs [77.79172008184415]
SpecTool is a new benchmark to identify error patterns in LLM output on tool-use tasks.
We show that even the most prominent LLMs exhibit these error patterns in their outputs.
Researchers can use the analysis and insights from SpecTool to guide their error mitigation strategies.
arXiv Detail & Related papers (2024-11-20T18:56:22Z) - Evaluating the Ability of Large Language Models to Generate Verifiable Specifications in VeriFast [5.019791860882564]
Large language models (LLMs) have shown promise in a number of software engineering activities.
This paper explores the effectiveness of OpenAI's GPT models in generating specifications based on separation logic.
arXiv Detail & Related papers (2024-11-04T17:44:11Z) - DiffSpec: Differential Testing with LLMs using Natural Language Specifications and Code Artifacts [6.222103009114668]
We introduce DiffSpec, a framework for generating differential tests with large language models.
We demonstrate the efficacy of DiffSpec on two different systems, namely, eBPF runtimes and Wasm validators.
arXiv Detail & Related papers (2024-10-05T18:11:14Z) - Multi-Grained Specifications for Distributed System Model Checking and Verification [9.14614889088682]
We use TLA+ to model fine-grained behaviors of ZooKeeper and the TLC model checker to verify its correctness properties.
We show that writing multi-grained specifications is a viable practice and can cope with model-code gaps without incurring an untenable state space.
arXiv Detail & Related papers (2024-09-22T02:59:56Z) - Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models [63.36637269634553]
We present a novel method of further improving performance by requiring models to compare multiple reasoning chains.
We find that instruction tuning on DCoT datasets boosts the performance of even smaller, and therefore more accessible, language models.
arXiv Detail & Related papers (2024-07-03T15:01:18Z) - Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction [75.25114727856861]
Large language models (LLMs) tend to suffer from deterioration at the latter stage of the supervised fine-tuning (SFT) process.
We introduce a simple disperse-then-merge framework to address the issue.
Our framework outperforms various sophisticated methods such as data curation and training regularization on a series of standard knowledge and reasoning benchmarks.
arXiv Detail & Related papers (2024-05-22T08:18:19Z) - Requirements Traceability: Recovering and Visualizing Traceability Links Between Requirements and Source Code of Object-oriented Software Systems [0.0]
Requirement-to-Code Traceability Links (RtC-TLs) shape the relations between requirement and source code artifacts.
This paper introduces YamenTrace, an automatic approach and implementation to recover and visualize RtC-TLs.
arXiv Detail & Related papers (2023-07-09T11:01:16Z) - Fact-Checking Complex Claims with Program-Guided Reasoning [99.7212240712869]
Program-Guided Fact-Checking (ProgramFC) is a novel fact-checking model that decomposes complex claims into simpler sub-tasks.
We first leverage the in-context learning ability of large language models to generate reasoning programs.
We execute the program by delegating each sub-task to the corresponding sub-task handler.
arXiv Detail & Related papers (2023-05-22T06:11:15Z) - Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
arXiv Detail & Related papers (2022-06-04T22:01:05Z) - Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records.
Existing approaches rely on log-specifics or manual rule extraction.
We propose NuLog, which utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
arXiv Detail & Related papers (2020-03-17T19:25:25Z) - Learning to Encode and Classify Test Executions [14.67675979776677]
The goal in this paper is to solve the test oracle problem in a way that is general, scalable and accurate.
We label a small fraction of the execution traces with their verdict of pass or fail.
We use labelled traces to train a neural network (NN) model to learn to distinguish runtime patterns for passing versus failing executions.
arXiv Detail & Related papers (2020-01-08T10:53:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.