Related papers: Advances of Proof Scores in CafeOBJ

Related papers

Specification-Guided Repair of Arithmetic Errors in Dafny Programs using LLMs [84.30534714651093]
We present an innovative APR tool for Dafny, a verification-aware programming language.<n>We localize faults through a series of steps, which include using Hoare Logic to determine the state of each statement within the program.<n>We evaluate our approach using DafnyBench, a benchmark of real-world Dafny programs.
arXiv Detail & Related papers (2025-07-04T15:36:12Z)
VERINA: Benchmarking Verifiable Code Generation [47.9771074559674]
Large language models (LLMs) are increasingly integrated in software development.<n>Verifiable code generation offers a promising path to address this limitation.<n>Current benchmarks often lack support for end-to-end verifiable code generation.
arXiv Detail & Related papers (2025-05-29T06:12:52Z)
WritingBench: A Comprehensive Benchmark for Generative Writing [87.48445972563631]
We present WritingBench, a benchmark designed to evaluate large language models (LLMs) across 6 core writing domains and 100, encompassing creative, persuasive, informative, and technical writing. We propose a query-dependent evaluation framework that empowers LLMs to dynamically generate instance-specific assessment criteria. This framework is complemented by a fine-tuned critic model for criteria-aware scoring, enabling evaluations in style, format and length.
arXiv Detail & Related papers (2025-03-07T08:56:20Z)
FormalSpecCpp: A Dataset of C++ Formal Specifications created using LLMs [0.3277163122167433]
FormalSpecCpp is a dataset designed to fill the gap in standardized benchmarks for verifying formal specifications in C++ programs. It provides a structured benchmark for evaluating specification inference tools and testing theaccuracy of generated specifications.
arXiv Detail & Related papers (2025-02-21T05:20:04Z)
Evaluating the Ability of Large Language Models to Generate Verifiable Specifications in VeriFast [5.019791860882564]
This paper explores OpenAI's GPT-4o model's effectiveness in generating specifications on C programs that are verifiable with VeriFast. Our results indicate that the specifications generated by GPT-4o preserve functional behavior, but struggle to be verifiable.
arXiv Detail & Related papers (2024-11-04T17:44:11Z)
KaPQA: Knowledge-Augmented Product Question-Answering [59.096607961704656]
We introduce two product question-answering (QA) datasets focused on Adobe Acrobat and Photoshop products. We also propose a novel knowledge-driven RAG-QA framework to enhance the performance of the models in the product QA task.
arXiv Detail & Related papers (2024-07-22T22:14:56Z)
A Graphics Function Standard Specification Validator [0.0]
A validation methodology is proposed and implemented for natural language software specifications of standard graphics functions. Checks are made for consistency, completeness, and lack of ambiguity in data element and function descriptions.
arXiv Detail & Related papers (2024-01-31T04:54:17Z)
Selene: Pioneering Automated Proof in Software Verification [62.09555413263788]
We introduce Selene, which is the first project-level automated proof benchmark constructed based on the real-world industrial-level operating system microkernel, seL4. Our experimental results with advanced large language models (LLMs), such as GPT-3.5-turbo and GPT-4, highlight the capabilities of LLMs in the domain of automated proof generation.
arXiv Detail & Related papers (2024-01-15T13:08:38Z)
Validation of Rigorous Requirements Specifications and Document Automation with the ITLingo RSL Language [0.0]
ITLingo initiative has introduced a requirements specification language named RSL to enhance the rigor and consistency of technical documentation. This paper reviews existing research and tools in the fields of requirements validation and document automation. We propose to extend RSL with validation of specifications based on customized checks, and on linguistic rules dynamically defined in the RSL itself.
arXiv Detail & Related papers (2023-12-17T21:39:26Z)
Klever: Verification Framework for Critical Industrial C Programs [0.0]
We present the Klever software verification framework, designed to reduce the effort of applying automatic software verification tools to large and critical industrial C programs. Existing tools do not provide widely adopted means for environment modeling, specification of requirements, verification of many versions and configurations of target programs, and expert assessment of verification results.
arXiv Detail & Related papers (2023-09-28T13:23:59Z)
Rely-guarantee Reasoning about Concurrent Memory Management: Correctness, Safety and Security [0.0]
An incorrect specification and implementation of the memory management may lead to system crashes or exploitable attacks. This article presents the first formal specification and mechanized proof of a concurrent memory management for a real-world OS.
arXiv Detail & Related papers (2023-09-17T03:41:10Z)
Validation-Driven Development [54.50263643323]
This paper introduces a validation-driven development (VDD) process that prioritizes validating requirements in formal development. The effectiveness of the VDD process is demonstrated through a case study in the aviation industry.
arXiv Detail & Related papers (2023-08-11T09:15:26Z)
Lessons from Formally Verified Deployed Software Systems (Extended version) [65.69802414600832]
This article examines a range of projects, in various application areas, that have produced formally verified systems and deployed them for actual use. It considers the technologies used, the form of verification applied, the results obtained, and the lessons that the software industry should draw regarding its ability to benefit from formal verification techniques and tools.
arXiv Detail & Related papers (2023-01-05T18:18:46Z)
ProQA: Structural Prompt-based Pre-training for Unified Question Answering [84.59636806421204]
ProQA is a unified QA paradigm that solves various tasks through a single model. It concurrently models the knowledge generalization for all QA tasks while keeping the knowledge customization for every specific QA task. ProQA consistently boosts performance on both full data fine-tuning, few-shot learning, and zero-shot testing scenarios.
arXiv Detail & Related papers (2022-05-09T04:59:26Z)
Auditing AI models for Verified Deployment under Semantic Specifications [65.12401653917838]
AuditAI bridges the gap between interpretable formal verification and scalability. We show how AuditAI allows us to obtain controlled variations for verification and certified training while addressing the limitations of verifying using only pixel-space perturbations.
arXiv Detail & Related papers (2021-09-25T22:53:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.