Re-evaluation of Logical Specification in Behavioural Verification
- URL: http://arxiv.org/abs/2505.17979v1
- Date: Fri, 23 May 2025 14:46:39 GMT
- Title: Re-evaluation of Logical Specification in Behavioural Verification
- Authors: Radoslaw Klimek, Jakub Semczyszyn
- Abstract summary: This study empirically validates automated logical specification methods for behavioural models. We identify performance irregularities that suggest the need for adaptive heuristics in automated reasoning. Addressing these inefficiencies through self-optimising solvers could enhance the stability of automated reasoning.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study empirically validates automated logical specification methods for behavioural models, focusing on their robustness, scalability, and reproducibility. By systematically reproducing and extending prior results, we confirm key trends while identifying performance irregularities that suggest the need for adaptive heuristics in automated reasoning. Our findings highlight that theorem provers exhibit varying efficiency across problem structures, with implications for real-time verification in CI/CD pipelines and AI-driven IDEs supporting on-the-fly validation. Addressing these inefficiencies through self-optimising solvers could enhance the stability of automated reasoning, particularly in safety-critical software verification.
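The listing includes no code, but the structure-dependent solver behaviour the abstract describes can be illustrated with a minimal sketch. The example below (an assumption, not the authors' benchmark) uses the Z3 SMT solver's Python bindings to time two differently shaped encodings of the same toy behavioural constraints; the variable names and formulas are hypothetical.

```python
# Illustrative sketch only: timing Z3 on two formula shapes that encode the
# same toy behavioural constraints, to show structure-dependent solver effort.
# Assumes the z3-solver package; formulas are hypothetical, not from the paper.
import time
from z3 import Solver, Bools, Implies, And, Or, Not

def check(name, formula):
    s = Solver()
    s.add(formula)
    t0 = time.perf_counter()
    result = s.check()                      # sat / unsat / unknown
    ms = (time.perf_counter() - t0) * 1000
    print(f"{name}: {result} ({ms:.2f} ms)")

req, grant, err = Bools("req grant err")

# Shallow, implication-style encoding of the constraints.
shallow = And(Implies(req, grant), Not(And(grant, err)))

# Clause-style (CNF-like) encoding of logically equivalent constraints;
# same variables, different structure, probing sensitivity to formula shape.
clausal = And(Or(Not(req), grant), Or(Not(grant), Not(err)))

check("implication-style encoding", shallow)
check("clause-style encoding", clausal)
```

In a CI/CD setting the same pattern would apply at larger scale: the verification step runs the solver over generated specifications, and timing variation across problem structures is what motivates the adaptive heuristics the abstract calls for.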
Related papers
- Exploring and Exploiting the Inherent Efficiency within Large Reasoning Models for Self-Guided Efficiency Enhancement [101.77467538102924]
Large reasoning models (LRMs) exhibit overthinking, which hinders efficiency and inflates inference cost. We propose two lightweight methods to enhance LRM efficiency. First, we introduce Efficiency Steering, a training-free activation steering technique that modulates reasoning behavior via a single direction. Second, we develop Self-Rewarded Efficiency RL, a reinforcement learning framework that dynamically balances task accuracy and brevity.
arXiv Detail & Related papers (2025-06-18T17:18:12Z) - Logic Mining from Process Logs: Towards Automated Specification and Verification [0.0]
This article presents an approach for generating logical specifications from process models discovered via workflow mining. The study examines the impact of data quality, particularly noise, on the structure and testability of generated specifications.
arXiv Detail & Related papers (2025-06-10T09:44:19Z) - Can Large Reasoning Models Self-Train? [58.953117118687096]
Scaling the performance of large language models increasingly depends on methods that reduce reliance on human supervision. We propose an online self-training reinforcement learning algorithm that leverages the model's self-consistency to infer correctness signals and train without any ground-truth supervision.
arXiv Detail & Related papers (2025-05-27T17:16:00Z) - Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards [67.86091419220816]
Large Language Models (LLMs) show great promise in complex reasoning. A prevalent issue is "superficial self-reflection", where models fail to robustly verify their own outputs. We introduce RISE (Reinforcing Reasoning with Self-Verification), a novel online RL framework designed to tackle this.
arXiv Detail & Related papers (2025-05-19T17:59:31Z) - AlignRAG: Leveraging Critique Learning for Evidence-Sensitive Retrieval-Augmented Reasoning [61.28113271728859]
RAG has become a widely adopted paradigm for enabling knowledge-grounded large language models (LLMs). Standard RAG pipelines often fail to ensure that model reasoning remains consistent with the evidence retrieved, leading to factual inconsistencies or unsupported conclusions. In this work, we reinterpret RAG as Retrieval-Augmented Reasoning and identify a central but underexplored problem: Reasoning Misalignment.
arXiv Detail & Related papers (2025-04-21T04:56:47Z) - Enhancing LLM Reliability via Explicit Knowledge Boundary Modeling [48.15636223774418]
Large language models (LLMs) frequently hallucinate due to misaligned self-awareness. Existing approaches mitigate hallucinations via uncertainty estimation or query rejection. We propose the Explicit Knowledge Boundary Modeling framework to integrate fast and slow reasoning systems.
arXiv Detail & Related papers (2025-03-04T03:16:02Z) - ReVISE: Learning to Refine at Test-Time via Intrinsic Self-Verification [53.80183105328448]
Refine via Intrinsic Self-Verification (ReVISE) is an efficient framework that enables LLMs to self-correct their outputs through self-verification. Our experiments on various reasoning tasks demonstrate that ReVISE achieves efficient self-correction and significantly improves reasoning performance.
arXiv Detail & Related papers (2025-02-20T13:50:02Z) - Word-Level ASR Quality Estimation for Efficient Corpus Sampling and Post-Editing through Analyzing Attentions of a Reference-Free Metric [5.592917884093537]
Quality estimation (QE) metrics are introduced and evaluated as a novel tool to enhance explainable artificial intelligence (XAI) in ASR systems.
The capabilities of the NoRefER metric are explored in identifying word-level errors to aid post-editors in refining ASR hypotheses.
arXiv Detail & Related papers (2024-01-20T16:48:55Z) - Surrogate Neural Networks Local Stability for Aircraft Predictive Maintenance [1.6703148532130556]
Surrogate Neural Networks are routinely used in industry as substitutes for computationally demanding engineering simulations.
Due to their performance and time-efficiency, these surrogate models are now being developed for use in safety-critical applications.
arXiv Detail & Related papers (2024-01-11T21:04:28Z) - Investigating Robustness in Cyber-Physical Systems: Specification-Centric Analysis in the face of System Deviations [8.8690305802668]
A critical attribute of cyber-physical systems (CPS) is robustness, denoting their capacity to operate safely.
This paper proposes a novel specification-based notion of robustness, which characterizes the effectiveness of a controller in meeting a specified system requirement.
We present an innovative two-layer simulation-based analysis framework designed to identify subtle robustness violations.
arXiv Detail & Related papers (2023-11-13T16:44:43Z) - Position: AI Evaluation Should Learn from How We Test Humans [65.36614996495983]
We argue that psychometrics, a theory originating in the 20th century for human assessment, could be a powerful solution to the challenges in today's AI evaluations.
arXiv Detail & Related papers (2023-06-18T09:54:33Z)