Veri-Sure: A Contract-Aware Multi-Agent Framework with Temporal Tracing and Formal Verification for Correct RTL Code Generation
- URL: http://arxiv.org/abs/2601.19747v1
- Date: Tue, 27 Jan 2026 16:10:23 GMT
- Title: Veri-Sure: A Contract-Aware Multi-Agent Framework with Temporal Tracing and Formal Verification for Correct RTL Code Generation
- Authors: Jiale Liu, Taiyu Zhou, Tianqi Jiang,
- Abstract summary: silicon-grade correctness remains bottlenecked by: (i) limited test coverage and reliability of simulation-centric evaluation, (ii) regressions and repair hallucinations, and (iii) semantic drift as intent is reinterpreted across agent handoffs.<n>We propose Veri-Sure, a multi-agent framework that establishes a design contract to align agents' intent and uses a patching mechanism guided by static dependency slicing to perform precise, localized repairs.
- Score: 4.723302382132762
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the rapidly evolving field of Electronic Design Automation (EDA), the deployment of Large Language Models (LLMs) for Register-Transfer Level (RTL) design has emerged as a promising direction. However, silicon-grade correctness remains bottlenecked by: (i) limited test coverage and reliability of simulation-centric evaluation, (ii) regressions and repair hallucinations introduced by iterative debugging, and (iii) semantic drift as intent is reinterpreted across agent handoffs. In this work, we propose Veri-Sure, a multi-agent framework that establishes a design contract to align agents' intent and uses a patching mechanism guided by static dependency slicing to perform precise, localized repairs. By integrating a multi-branch verification pipeline that combines trace-driven temporal analysis with formal verification consisting of assertion-based checking and boolean equivalence proofs, Veri-Sure enables functional correctness beyond pure simulations. We also introduce VerilogEval-v2-EXT, extending the original benchmark with 53 more industrial-grade design tasks and stratified difficulty levels, and show that Veri-Sure achieves state-of-the-art verified-correct RTL code generation performance, surpassing standalone LLMs and prior agentic systems.
Related papers
- Execution-State-Aware LLM Reasoning for Automated Proof-of-Vulnerability Generation [36.950993500170014]
We present DrillAgent, an agentic framework that reformulates PoV generation as an iterative hypothesis-verification-refinement process.<n>We evaluate DrillAgent on SEC-bench, a large-scale benchmark of real-world C/C++ vulnerabilities.
arXiv Detail & Related papers (2026-02-14T03:17:27Z) - ComAgent: Multi-LLM based Agentic AI Empowered Intelligent Wireless Networks [62.031889234230725]
6G networks rely on complex cross-layer optimization.<n> manually translating high-level intents into mathematical formulations remains a bottleneck.<n>We present ComAgent, a multi-LLM agentic AI framework.
arXiv Detail & Related papers (2026-01-27T13:43:59Z) - From Completion to Editing: Unlocking Context-Aware Code Infilling via Search-and-Replace Instruction Tuning [81.97788535387286]
We propose a framework that internalizes the agentic verification-and-editing mechanism into a unified, single-pass inference process.<n>With minimal data, SRI-Coder enables Chat models to surpass the completion performance of their Base counterparts.<n>Unlike FIM-style tuning, SRI preserves general coding competencies and maintains inference latency comparable to standard FIM.
arXiv Detail & Related papers (2026-01-19T20:33:53Z) - Refinement Provenance Inference: Detecting LLM-Refined Training Prompts from Model Behavior [58.751981587234916]
This paper formalizes the Refinement Provenance Inference (RPI) audit task as Refinement Provenance Inference (RPI)<n>We propose RePro, a logit-based framework that fuses teacher-forced likelihood features with logit-ranking signals.<n>During training, RePro learns a transferable representation via shadow fine-tuning, and uses a lightweight linear head to infer provenance on unseen victims without training-data access.
arXiv Detail & Related papers (2026-01-05T10:16:41Z) - Accelerate Speculative Decoding with Sparse Computation in Verification [49.74839681322316]
Speculative decoding accelerates autoregressive language model inference by verifying multiple draft tokens in parallel.<n>Existing sparsification methods are designed primarily for standard token-by-token autoregressive decoding.<n>We propose a sparse verification framework that jointly sparsifies attention, FFN, and MoE components during the verification stage to reduce the dominant computation cost.
arXiv Detail & Related papers (2025-12-26T07:53:41Z) - Formal that "Floats" High: Formal Verification of Floating Point Arithmetic [0.0]
This paper presents a scalable methodology for verifying floating-point arithmetic using direct RTL-to-RTL model checking against a golden reference model.<n>The methodology is extended with agentic AI-based formal property generation, integrating large language model (LLM)-driven automation with Human-in-the-Loop (HITL) refinement.<n>Results show that direct RTL-to-RTL model checking achieves higher coverage efficiency and requires fewer assertions than standalone verification.
arXiv Detail & Related papers (2025-12-07T14:03:44Z) - PEFA-AI: Advancing Open-source LLMs for RTL generation using Progressive Error Feedback Agentic-AI [5.455262834289454]
We present an agentic flow consisting of multiple agents that collaboratively complete the task of Register Transfer Level (RTL) generation without human intervention.<n>A key feature of the proposed flow is the progressive error feedback system of agents (PEFA), a self-correcting mechanism.<n>To validate this adaptive approach to code generation, benchmarking is performed using two opensource natural language-to-RTL datasets.
arXiv Detail & Related papers (2025-11-06T00:19:47Z) - Automatic Building Code Review: A Case Study [6.530899637501737]
Building officials face labor-intensive, error-prone, and costly manual reviews of design documents as projects increase in size and complexity.<n>This study introduces a novel agent-driven framework that integrates BIM-based data extraction with automated verification.
arXiv Detail & Related papers (2025-10-03T00:30:14Z) - Tractable Asymmetric Verification for Large Language Models via Deterministic Replicability [0.6117371161379209]
The landscape of Large Language Models (LLMs) shifts rapidly towards dynamic, multi-agent systems.<n>This paper proposes a verification framework that achieves tractable asymmetric effort.<n>We show that targeted verification can be over 12 times faster than full regeneration.
arXiv Detail & Related papers (2025-09-14T03:30:06Z) - CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward [50.97588334916863]
We develop CompassVerifier, an accurate and robust lightweight verifier model for evaluation and outcome reward.<n>It demonstrates multi-domain competency spanning math, knowledge, and diverse reasoning tasks, with the capability to process various answer types.<n>We introduce VerifierBench benchmark comprising model outputs collected from multiple data sources, augmented through manual analysis of metaerror patterns to enhance CompassVerifier.
arXiv Detail & Related papers (2025-08-05T17:55:24Z) - AssertCoder: LLM-Based Assertion Generation via Multimodal Specification Extraction [32.14733357890831]
We propose AssertCoder, a novel unified framework that automatically generates high-quality SVAs.<n>AssertCoder employs a modality-sensitive preprocessing to parse heterogeneous specification formats.<n>The framework incorporates a mutation-based evaluation approach to assess assertion quality.
arXiv Detail & Related papers (2025-07-14T14:43:14Z) - Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning.
LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors.
We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
arXiv Detail & Related papers (2024-10-05T05:21:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.