Validating Traces of Distributed Programs Against TLA+ Specifications
- URL: http://arxiv.org/abs/2404.16075v2
- Date: Tue, 17 Sep 2024 16:52:15 GMT
- Title: Validating Traces of Distributed Programs Against TLA+ Specifications
- Authors: Horatiu Cirstea, Markus A. Kuppe, Benjamin Loillier, Stephan Merz
- Abstract summary: We present a framework for relating traces of distributed programs to high-level specifications written in TLA+.
The problem is reduced to a constrained model checking problem, realized using the TLC model checker.
We have applied our approach to several distributed programs, detecting discrepancies between the specifications and the implementations in all cases.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: TLA+ is a formal language for specifying systems, including distributed algorithms, that is supported by powerful verification tools. In this work we present a framework for relating traces of distributed programs to high-level specifications written in TLA+. The problem is reduced to a constrained model checking problem, realized using the TLC model checker. Our framework consists of an API for instrumenting Java programs in order to record traces of executions, of a collection of TLA+ operators that are used for relating those traces to specifications, and of scripts for running the model checker. Crucially, traces only contain updates to specification variables rather than full values, and developers may choose to trace only certain variables. We have applied our approach to several distributed programs, detecting discrepancies between the specifications and the implementations in all cases. We discuss reasons for these discrepancies, best practices for instrumenting programs, and how to interpret the verdict produced by TLC.
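As an illustration of the constrained model checking idea described above, here is a minimal, self-contained TLA+ sketch. It is not the paper's actual trace-validation operators or instrumentation API: the module name, the hard-coded Trace constant, and the TraceAccepted definition are all hypothetical, and a real trace would be read from a log produced by the Java instrumentation rather than written inline.

---- MODULE TraceCheckSketch ----
EXTENDS Naturals, Sequences

(* Hypothetical recorded trace: each entry is one logged update to the   *)
(* specification variable x. In the paper's framework such entries are   *)
(* produced by the Java instrumentation API; here they are hard-coded.   *)
Trace == <<[var |-> "x", val |-> 1], [var |-> "x", val |-> 2]>>

VARIABLES x,  \* the specification variable being validated
          i   \* index of the next trace entry to consume

Init == x = 0 /\ i = 1

(* Each step must reproduce the next logged update, so TLC only explores *)
(* behaviors that are consistent with the recorded trace.                *)
Next == /\ i <= Len(Trace)
        /\ Trace[i].var = "x"
        /\ x' = Trace[i].val
        /\ i' = i + 1

Spec == Init /\ [][Next]_<<x, i>>

TraceAccepted == i = Len(Trace) + 1  \* every logged update was matched
====

Checking with TLC whether some behavior of Spec reaches TraceAccepted gives the verdict: if no such behavior exists, the logged updates cannot be explained by the specification, indicating a discrepancy. The paper's framework generalizes this idea to partial traces (updates to only some variables) and composes the trace constraints with the actions of the original high-level specification.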
Related papers
- CLOVER: A Test Case Generation Benchmark with Coverage, Long-Context, and Verification [71.34070740261072]
This paper presents a benchmark, CLOVER, to evaluate models' capabilities in generating and completing test cases.
The benchmark is containerized for code execution across tasks, and we will release the code, data, and construction methodologies.
arXiv Detail & Related papers (2025-02-12T21:42:56Z)
- TGOSPA Metric Parameters Selection and Evaluation for Visual Multi-object Tracking [4.998475411100799]
We use the trajectory generalized optimal sub-pattern assignment (TGOSPA) metric to evaluate multi-object tracking performance.
It accounts for localization errors, the number of missed and false objects, and the number of track switches.
By exploring the TGOSPA parameter selection, we enable users to compare, comprehend, and optimize the performance of algorithms tailored for specific tasks.
arXiv Detail & Related papers (2024-12-11T11:57:05Z)
- Evaluating the Ability of Large Language Models to Generate Verifiable Specifications in VeriFast [5.019791860882564]
This paper explores the effectiveness of OpenAI's GPT-4o model in generating specifications for C programs that are verifiable with VeriFast.
Our results indicate that the specifications generated by GPT-4o preserve functional behavior but are often not verifiable.
arXiv Detail & Related papers (2024-11-04T17:44:11Z)
- Multi-Grained Specifications for Distributed System Model Checking and Verification [9.14614889088682]
We use TLA+ to model fine-grained behaviors of ZooKeeper and the TLC model checker to verify its correctness properties.
We show that writing multi-grained specifications is a viable practice and can cope with model-code gaps without incurring an untenable state space.
arXiv Detail & Related papers (2024-09-22T02:59:56Z)
- Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models [63.36637269634553]
We present a novel method of further improving performance by requiring models to compare multiple reasoning chains.
We find that instruction tuning on DCoT datasets boosts the performance of even smaller, and therefore more accessible, language models.
arXiv Detail & Related papers (2024-07-03T15:01:18Z)
- TESTEVAL: Benchmarking Large Language Models for Test Case Generation [15.343859279282848]
We propose TESTEVAL, a novel benchmark for test case generation with large language models (LLMs).
We collect 210 Python programs from an online programming platform, LeetCode, and design three different tasks: overall coverage, targeted line/branch coverage, and targeted path coverage.
We find that generating test cases to cover specific program lines/branches/paths is still challenging for current LLMs.
arXiv Detail & Related papers (2024-06-06T22:07:50Z)
- Instruction Tuning for Large Language Models: A Survey [52.86322823501338]
We present a systematic review of the literature, including the general methodology of supervised fine-tuning (SFT).
We also review the potential pitfalls of SFT and criticism against it, along with efforts pointing out current deficiencies of existing strategies.
arXiv Detail & Related papers (2023-08-21T15:35:16Z)
- Requirements Traceability: Recovering and Visualizing Traceability Links Between Requirements and Source Code of Object-oriented Software Systems [0.0]
Requirement-to-Code Traceability Links (RtC-TLs) shape the relations between requirement and source code artifacts.
This paper introduces YamenTrace, an automatic approach and implementation to recover and visualize RtC-TLs.
arXiv Detail & Related papers (2023-07-09T11:01:16Z)
- Fact-Checking Complex Claims with Program-Guided Reasoning [99.7212240712869]
Program-Guided Fact-Checking (ProgramFC) is a novel fact-checking model that decomposes complex claims into simpler sub-tasks.
We first leverage the in-context learning ability of large language models to generate reasoning programs.
We execute the program by delegating each sub-task to the corresponding sub-task handler.
arXiv Detail & Related papers (2023-05-22T06:11:15Z)
- Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
arXiv Detail & Related papers (2022-06-04T22:01:05Z)
- Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records.
Existing approaches rely on log-specific heuristics or manual rule extraction.
We propose NuLog, which utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
arXiv Detail & Related papers (2020-03-17T19:25:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.