Related papers: Toward Practical Deductive Verification: Insights from a Qualitative Survey in Industry and Academia

URL: http://arxiv.org/abs/2510.20514v1
Date: Thu, 23 Oct 2025 12:59:14 GMT
Title: Toward Practical Deductive Verification: Insights from a Qualitative Survey in Industry and Academia
Authors: Lea Salome Brugger, Xavier Denis, Peter Müller
Abstract summary: Deductive verification is an effective method to ensure that a given system exposes the intended behavior. In spite of its proven usefulness and feasibility in selected projects, deductive verification is still not a mainstream technique. This study investigates the factors enabling successful applications of deductive verification and the underlying issues preventing broader adoption.
Score: 2.6359922261359716
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deductive verification is an effective method to ensure that a given system exposes the intended behavior. In spite of its proven usefulness and feasibility in selected projects, deductive verification is still not a mainstream technique. To pave the way to widespread use, we present a study investigating the factors enabling successful applications of deductive verification and the underlying issues preventing broader adoption. We conducted semi-structured interviews with 30 practitioners of verification from both industry and academia and systematically analyzed the collected data employing a thematic analysis approach. Besides empirically confirming familiar challenges, e.g., the high level of expertise needed for conducting formal proofs, our data reveal several underexplored obstacles, such as proof maintenance, insufficient control over automation, and usability concerns. We further use the results from our data analysis to extract enablers and barriers for deductive verification and formulate concrete recommendations for practitioners, tool builders, and researchers, including principles for usability, automation, and integration with existing workflows.

Related papers

Leveraging LLM Parametric Knowledge for Fact Checking without Retrieval [60.25608870901428]
Trustworthiness is a core research challenge for agentic AI systems built on Large Language Models (LLMs). We propose the task of fact-checking without retrieval, focusing on the verification of arbitrary natural language claims, independent of their source robustness.
arXiv Detail & Related papers (2026-03-05T18:42:51Z)
An Investigation on How AI-Generated Responses Affect Software Engineering Surveys [3.183470571353323]
This study explores how large language models (LLMs) are being misused in software engineering surveys. We analyzed data from two survey deployments conducted in 2025 through the Prolific platform. We identify data authenticity as an emerging dimension of validity in software engineering surveys.
arXiv Detail & Related papers (2025-12-19T11:17:05Z)
Measuring what Matters: Construct Validity in Large Language Model Benchmarks [103.53142193393931]
Evaluating large language models (LLMs) is crucial for both assessing their capabilities and identifying safety or robustness issues prior to deployment. We conduct a systematic review of 445 benchmarks from leading conferences in natural language processing and machine learning. We find patterns related to the measured phenomena, tasks, and scoring metrics which undermine the validity of the resulting claims.
arXiv Detail & Related papers (2025-11-03T17:39:40Z)
Demystifying deep search: a holistic evaluation with hint-free multi-hop questions and factorised metrics [89.1999907891494]
We present WebDetective, a benchmark of hint-free multi-hop questions paired with a controlled Wikipedia sandbox. Our evaluation of 25 state-of-the-art models reveals systematic weaknesses across all architectures. We develop an agentic workflow, EvidenceLoop, that explicitly targets the challenges our benchmark identifies.
arXiv Detail & Related papers (2025-10-01T07:59:03Z)
AI-Driven Tools in Modern Software Quality Assurance: An Assessment of Benefits, Challenges, and Future Directions [0.0]
The research aims to assess the benefits, challenges, and prospects of integrating modern AI-oriented tools into quality assurance processes. The research demonstrates AI's transformative potential for QA but highlights the importance of a strategic approach to implementing these technologies.
arXiv Detail & Related papers (2025-06-19T20:22:47Z)
Towards Reliable Forgetting: A Survey on Machine Unlearning Verification [26.88376128769619]
This paper presents the first structured survey of machine unlearning verification methods. We propose a taxonomy that organizes current techniques into two principal categories: behavioral verification and parametric verification. We examine their underlying assumptions, strengths, and limitations, and identify potential vulnerabilities in practical deployment.
arXiv Detail & Related papers (2025-06-18T03:33:59Z)
Re-evaluation of Logical Specification in Behavioural Verification [0.0]
This study empirically validates automated logical specification methods for behavioural models. We identify performance irregularities that suggest the need for adaptive performance irregularities in automated reasoning. Addressing these inefficiencies through self-optimising solvers could enhance the stability of automated reasoning.
arXiv Detail & Related papers (2025-05-23T14:46:39Z)
Measurement to Meaning: A Validity-Centered Framework for AI Evaluation [12.55408229639344]
We provide a structured approach for reasoning about the types of evaluative claims that can be made given the available evidence. Our framework is well-suited for the contemporary paradigm in machine learning.
arXiv Detail & Related papers (2025-05-13T20:36:22Z)
Advancing Embodied Agent Security: From Safety Benchmarks to Input Moderation [52.83870601473094]
Embodied agents exhibit immense potential across a multitude of domains. Existing research predominantly concentrates on the security of general large language models. This paper introduces a novel input moderation framework, meticulously designed to safeguard embodied agents.
arXiv Detail & Related papers (2025-04-22T08:34:35Z)
Causality can systematically address the monsters under the bench(marks) [64.36592889550431]
Benchmarks are plagued by various biases, artifacts, or leakage. Models may behave unreliably due to poorly explored failure modes. Causality offers an ideal framework to systematically address these challenges.
arXiv Detail & Related papers (2025-02-07T17:01:37Z)
Understanding metric-related pitfalls in image analysis validation [59.15220116166561]
This work provides the first comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Focusing on biomedical image analysis but with the potential of transfer to other fields, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy.
arXiv Detail & Related papers (2023-02-03T14:57:40Z)

This list is automatically generated from the titles and abstracts of the papers on this site.