Related papers: FVDebug: An LLM-Driven Debugging Assistant for Automated Root Cause Analysis of Formal Verification Failures

FVDebug: An LLM-Driven Debugging Assistant for Automated Root Cause Analysis of Formal Verification Failures

URL: http://arxiv.org/abs/2510.15906v1
Date: Tue, 16 Sep 2025 20:22:10 GMT
Title: FVDebug: An LLM-Driven Debugging Assistant for Automated Root Cause Analysis of Formal Verification Failures
Authors: Yunsheng Bai, Ghaith Bany Hamad, Chia-Tung Ho, Syed Suhaib, Haoxing Ren,
Abstract summary: We present FV Debug, an intelligent system that transforms failure traces into actionable insights.<n>Our approach features a novel pipeline: (1) Causal Graph Synthesis that structures failure traces into directed acyclic graphs, (2) Graph Scanner using batched Large Language Model (LLM) analysis with for-and-against prompting to identify suspicious nodes, and (3) Insight Rover leveraging agentic narrative exploration to generate high-level causal explanations.
Score: 8.530369312832084
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Debugging formal verification (FV) failures represents one of the most time-consuming bottlenecks in modern hardware design workflows. When properties fail, engineers must manually trace through complex counter-examples spanning multiple cycles, analyze waveforms, and cross-reference design specifications to identify root causes - a process that can consume hours or days per bug. Existing solutions are largely limited to manual waveform viewers or simple automated tools that cannot reason about the complex interplay between design intent and implementation logic. We present FVDebug, an intelligent system that automates root-cause analysis by combining multiple data sources - waveforms, RTL code, design specifications - to transform failure traces into actionable insights. Our approach features a novel pipeline: (1) Causal Graph Synthesis that structures failure traces into directed acyclic graphs, (2) Graph Scanner using batched Large Language Model (LLM) analysis with for-and-against prompting to identify suspicious nodes, and (3) Insight Rover leveraging agentic narrative exploration to generate high-level causal explanations. FVDebug further provides concrete RTL fixes through its Fix Generator. Evaluated on open benchmarks, FVDebug attains high hypothesis quality and strong Pass@k fix rates. We further report results on two proprietary, production-scale FV counterexamples. These results demonstrate FVDebug's applicability from academic benchmarks to industrial designs.

Related papers

HELP: HyperNode Expansion and Logical Path-Guided Evidence Localization for Accurate and Efficient GraphRAG [53.30561659838455]
Large Language Models (LLMs) often struggle with inherent knowledge boundaries and hallucinations.<n>Retrieval-Augmented Generation (RAG) frequently overlooks structural interdependencies essential for multi-hop reasoning.<n>Help achieves competitive performance across multiple simple and multi-hop QA benchmarks and up to a 28.8$times$ speedup over leading Graph-based RAG baselines.
arXiv Detail & Related papers (2026-02-24T14:05:29Z)
Beyond Pixels: Vector-to-Graph Transformation for Reliable Schematic Auditing [34.54168175788343]
We propose a Vector-to-Graph (V2G) pipeline that converts CAD diagrams into property graphs where nodes represent components and edges encode connectivity.<n>On a diagnostic benchmark of electrical compliance checks, V2G yields large accuracy gains across all error categories, while leading MLLMs remain near chance level.
arXiv Detail & Related papers (2026-02-12T07:50:49Z)
Veri-Sure: A Contract-Aware Multi-Agent Framework with Temporal Tracing and Formal Verification for Correct RTL Code Generation [4.723302382132762]
silicon-grade correctness remains bottlenecked by: (i) limited test coverage and reliability of simulation-centric evaluation, (ii) regressions and repair hallucinations, and (iii) semantic drift as intent is reinterpreted across agent handoffs.<n>We propose Veri-Sure, a multi-agent framework that establishes a design contract to align agents' intent and uses a patching mechanism guided by static dependency slicing to perform precise, localized repairs.
arXiv Detail & Related papers (2026-01-27T16:10:23Z)
Guardian: Detecting Robotic Planning and Execution Errors with Vision-Language Models [53.20969621498248]
We propose an automatic robot failure synthesis approach that procedurally perturbs successful trajectories to generate diverse planning and execution failures.<n>We construct three new failure detection benchmarks: RLBench-Fail, BridgeDataV2-Fail, and UR5-Fail.<n>We then train Guardian, a VLM with multi-view images for detailed failure reasoning and detection.
arXiv Detail & Related papers (2025-12-01T17:57:27Z)
Med-CRAFT: Automated Construction of Interpretable and Multi-Hop Video Workloads via Knowledge Graph Traversal [13.216513001286812]
textbfPipelineName is a novel neuro-symbolic data engineering framework.<n> Med-CRAFT extracts structured visual primitives from raw video streams and instantiates them into a dynamic Spatiotemporal Knowledge Graph.<n>We instantiate this pipeline to produce M3-Med-Auto, a large-scale medical video reasoning benchmark.
arXiv Detail & Related papers (2025-11-30T19:24:10Z)
Fault2Flow: An AlphaEvolve-Optimized Human-in-the-Loop Multi-Agent System for Fault-to-Workflow Automation [16.030246172690237]
Power grid technicians must manually extract reasoning logic from dense regulations.<n>No existing framework integrates these two disparate knowledge sources into a single, verified, and executable workflow.<n>Fault2Flow systematically extracts regulatory logic into PASTA-formatted fault trees.<n>It integrates expert knowledge via a human-in-the-loop interface for verification.
arXiv Detail & Related papers (2025-11-17T03:07:40Z)
InspectCoder: Dynamic Analysis-Enabled Self Repair through interactive LLM-Debugger Collaboration [71.18377595277018]
Large Language Models (LLMs) frequently generate buggy code with complex logic errors that are challenging to diagnose.<n>We present InspectCoder, the first agentic program repair system that empowers LLMs to actively conduct dynamic analysis via interactive debugger control.
arXiv Detail & Related papers (2025-10-21T06:26:29Z)
Metacognitive Self-Correction for Multi-Agent System via Prototype-Guided Next-Execution Reconstruction [58.51530390018909]
Large Language Model based multi-agent systems excel at collaborative problem solving but remain brittle to cascading errors.<n>We present MASC, a metacognitive framework that endows MAS with real-time, unsupervised, step-level error detection and self-correction.
arXiv Detail & Related papers (2025-10-16T05:35:37Z)
RationAnomaly: Log Anomaly Detection with Rationality via Chain-of-Thought and Reinforcement Learning [27.235259453535537]
RationAnomaly is a novel framework that enhances log anomaly detection by synergizing Chain-of-Thought fine-tuning with reinforcement learning.<n>We have released the corresponding resources, including code and datasets.
arXiv Detail & Related papers (2025-09-18T07:35:58Z)
LAD-Reasoner: Tiny Multimodal Models are Good Reasoners for Logical Anomaly Detection [27.45348890285863]
We introduce Reasoning Logical Anomaly Detection (RLAD), which extends traditional anomaly detection by incorporating logical reasoning.<n>We propose a new framework, LAD-Reasoner, a customized tiny multimodal language model built on Qwen2.5-VL 3B.<n> Experiments on the MVTec LOCO AD dataset show that LAD-Reasoner, though significantly smaller, matches the performance of Qwen2.5-VL-72B in accuracy and F1 score.
arXiv Detail & Related papers (2025-04-17T08:41:23Z)
FaultExplainer: Leveraging Large Language Models for Interpretable Fault Detection and Diagnosis [7.161558367924948]
This paper presents FaultExplainer, an interactive tool designed to improve fault detection, diagnosis, and explanation in the Tennessee Eastman Process (TEP)<n>FaultExplainer integrates real-time sensor data visualization, Principal Component Analysis (PCA)-based fault detection, and identification of top contributing variables within an interactive user interface powered by large language models (LLMs)<n>We evaluate the LLMs' reasoning capabilities in two scenarios: one where historical root causes are provided, and one where they are not to mimic the challenge of previously unseen faults.
arXiv Detail & Related papers (2024-12-19T03:35:06Z)
Counterfactual Intervention Feature Transfer for Visible-Infrared Person Re-identification [69.45543438974963]
We find graph-based methods in the visible-infrared person re-identification task (VI-ReID) suffer from bad generalization because of two issues. The well-trained input features weaken the learning of graph topology, making it not generalized enough during the inference process. We propose a Counterfactual Intervention Feature Transfer (CIFT) method to tackle these problems.
arXiv Detail & Related papers (2022-08-01T16:15:31Z)
Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records. Existing approaches rely on log-specifics or manual rule extraction. We propose NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
arXiv Detail & Related papers (2020-03-17T19:25:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.