VeriDebug: A Unified LLM for Verilog Debugging via Contrastive Embedding and Guided Correction
- URL: http://arxiv.org/abs/2504.19099v1
- Date: Sun, 27 Apr 2025 04:09:48 GMT
- Title: VeriDebug: A Unified LLM for Verilog Debugging via Contrastive Embedding and Guided Correction
- Authors: Ning Wang, Bingkun Yao, Jie Zhou, Yuchen Hu, Xi Wang, Nan Guan, Zhe Jiang
- Abstract summary: We present VeriDebug, an approach that integrates contrastive representation and guided correction capabilities. Our model achieves 64.7 accuracy in bug fixing (Acc1), a significant improvement over the existing open-source SOTA's 11.3. This performance not only outperforms open-source alternatives but also exceeds larger closed-source models like GPT-3.5-turbo (36.6), offering a more accurate alternative to conventional debugging methods.
- Score: 36.69082579950107
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have demonstrated remarkable potential in debugging for various programming languages. However, the application of LLMs to Verilog debugging remains insufficiently explored. Here, we present VeriDebug, an approach that integrates contrastive representation and guided correction capabilities for automated Verilog debugging. Unlike existing methods, VeriDebug employs an embedding-based technique to accurately retrieve internal information, followed by bug-fixing. VeriDebug unifies Verilog bug detection and correction through a shared parameter space. By simultaneously learning bug patterns and fixes, it streamlines debugging via contrastive embedding and guided correction. Empirical results show the efficacy of VeriDebug in enhancing Verilog debugging. Our VeriDebugLoc, Type model achieves 64.7 accuracy in bug fixing (Acc1), a significant improvement over the existing open-source SOTA's 11.3. This performance not only outperforms open-source alternatives but also exceeds larger closed-source models like GPT-3.5-turbo (36.6), offering a more accurate alternative to conventional debugging methods.
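As a rough illustration of the retrieve-then-correct pipeline described in the abstract (not the paper's actual implementation), the Python sketch below ranks the lines of a buggy Verilog module by cosine similarity to a bug query and hands the top-ranked line to a correction step; embed() and generate_fix() are hypothetical placeholders for the contrastively trained encoder and the guided-correction model.

```python
# Illustrative only: embedding-based bug localization followed by a guided
# correction step. embed() and generate_fix() are hypothetical placeholders,
# not VeriDebug's API.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder encoder: a deterministic pseudo-random unit vector.
    A contrastively trained encoder would go here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=128)
    return v / np.linalg.norm(v)

def locate_bug(query: str, verilog_lines: list[str]) -> int:
    """Rank lines by cosine similarity to the bug query.
    Only meaningful with a real encoder; here it just shows the control flow."""
    q = embed(query)
    scores = [float(q @ embed(line)) for line in verilog_lines]
    return int(np.argmax(scores))

def generate_fix(query: str, buggy_line: str) -> str:
    """Placeholder for the guided-correction step (an LLM call in practice)."""
    return f"// corrected per query '{query}':\n{buggy_line}"

buggy_module = [
    "module counter(input clk, input rst, output reg [3:0] q);",
    "  always @(posedge clk) begin",
    "    if (rst) q <= 4'b0;",
    "    else     q <= q - 1;  // spec says the counter should count up",
    "  end",
    "endmodule",
]
idx = locate_bug("counter should increment on every clock edge", buggy_module)
print(idx, generate_fix("increment, not decrement", buggy_module[idx]))
```

With the random placeholder embeddings the ranking is arbitrary; the point is only the retrieve-then-correct control flow, in which both steps would share one set of model parameters.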
Related papers
- Learning to Solve and Verify: A Self-Play Framework for Code and Test Generation [69.62857948698436]
Recent advances in large language models (LLMs) have improved their performance on coding benchmarks.
However, improvement is plateauing due to the exhaustion of readily available high-quality data.
We propose Sol-Ver, a self-play solver-verifier framework that jointly improves a single model's code and test generation capacity.
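A minimal sketch of the solver-verifier self-play idea, assuming toy solve() and write_tests() stand-ins for the single underlying model; only code/test pairs that agree with each other are kept as new training data.

```python
# Toy self-play round: the same model plays solver and verifier; only
# (code, tests) pairs that agree with each other become new training data.
# solve() and write_tests() are stand-ins for calls to that model.
def solve(task: str) -> str:
    return "def add(a, b):\n    return a + b"            # candidate solution

def write_tests(task: str) -> list[tuple[tuple, int]]:
    return [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]      # candidate tests

def passes(code: str, tests) -> bool:
    ns: dict = {}
    exec(code, ns)                                        # run the candidate
    return all(ns["add"](*args) == expected for args, expected in tests)

training_pool = []
task = "write add(a, b) returning the sum"
code, tests = solve(task), write_tests(task)
if passes(code, tests):                                   # mutual verification
    training_pool.append((task, code, tests))
print(len(training_pool), "verified example(s) collected")
```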
arXiv Detail & Related papers (2025-02-20T18:32:19Z) - From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging [5.910272203315325]
We introduce Multi-Granularity Debugger (MG Debugger), a hierarchical code debugger by isolating, identifying, and resolving bugs at various levels of granularity.
MG Debugger decomposes problematic code into a hierarchical tree structure of subfunctions, with each level representing a particular granularity of error.
It achieves an 18.9% improvement in accuracy over seed generations in HumanEval and a 97.6% repair success rate in HumanEvalFix.
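The hierarchical, bottom-up repair idea can be sketched as follows; the Node layout, is_correct(), and repair() are illustrative stand-ins, not MG Debugger's actual interfaces.

```python
# Toy sketch of hierarchical (bottom-up) debugging: repair the leaf
# subfunctions first, then re-check their parents.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    code: str
    children: list["Node"] = field(default_factory=list)

def is_correct(node: Node) -> bool:
    return "BUG" not in node.code                   # stand-in for running unit tests

def repair(node: Node) -> None:
    node.code = node.code.replace("BUG", "fixed")   # stand-in for an LLM fix

def debug_bottom_up(node: Node) -> None:
    for child in node.children:                     # finer granularity first
        debug_bottom_up(child)
    if not is_correct(node):
        repair(node)

root = Node("main", "call helper()  # BUG in integration",
            [Node("helper", "return x * 2  # BUG: should be x + 2")])
debug_bottom_up(root)
print(root.code, "|", root.children[0].code)
```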
arXiv Detail & Related papers (2024-10-02T03:57:21Z) - Location is Key: Leveraging Large Language Model for Functional Bug Localization in Verilog [28.954480748183943]
This paper presents Location-is-Key, an open-source LLM solution to locate functional errors in Verilog snippets.
It achieves a pass@1 localization accuracy of 93.3% on our test dataset based on RTLLM, surpassing GPT-4's 77.9% and comparable to Claude-3.5's 90.8%.
arXiv Detail & Related papers (2024-09-23T16:38:53Z) - COAST: Enhancing the Code Debugging Ability of LLMs through Communicative Agent Based Data Synthesis [29.667170755786508]
We introduce EVAL, a benchmark for evaluating the code debugging abilities of Large Language Models.
We propose the COmmunicative Agent-based data SynThesis framework, which employs a multi-agent system to generate high-quality training data.
Results demonstrate that COAST-generated data outperform human-curated and GPT-4-generated data.
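A loose sketch of the agent-based synthesis loop, with trivial injector_agent() and critic_agent() stand-ins for the communicating LLM agents; each kept item pairs buggy code with its fix and an explanation.

```python
# Loose sketch of communicative-agent data synthesis: one agent injects a bug
# into known-good code, another explains and repairs it, and the pair becomes
# a training example. Both agents here are trivial stand-ins for LLM calls.
def injector_agent(good_code: str) -> str:
    return good_code.replace("+", "-", 1)        # corrupt one operator

def critic_agent(buggy: str, good: str) -> dict:
    return {"buggy": buggy, "fixed": good,
            "explanation": "the addition was turned into a subtraction"}

seed_programs = ["def total(a, b):\n    return a + b"]
dataset = []
for good in seed_programs:
    buggy = injector_agent(good)
    if buggy != good:                            # keep only real corruptions
        dataset.append(critic_agent(buggy, good))
print(dataset[0]["explanation"])
```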
arXiv Detail & Related papers (2024-08-09T11:35:44Z) - VDebugger: Harnessing Execution Feedback for Debugging Visual Programs [103.61860743476933]
We introduce VDebugger, a critic-refiner framework trained to localize and debug visual programs by tracking execution step by step.
VDebugger identifies and corrects program errors leveraging detailed execution feedback, improving interpretability and accuracy.
Evaluations on six datasets demonstrate VDebugger's effectiveness, showing performance improvements of up to 3.2% in downstream task accuracy.
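A simplified sketch of a critic-refiner loop driven by execution feedback, with critic() and refiner() as toy stand-ins for the two trained components.

```python
# Simplified sketch of a critic-refiner loop: run the program, let a critic
# flag the failing step, let a refiner patch it, then run again.
def run(code: str):
    ns: dict = {}
    try:
        exec(code, ns)
        return ns["result"], None
    except Exception as exc:                     # execution feedback
        return None, repr(exc)

def critic(code: str, error: str) -> str:
    return "division step"                       # toy critic: always blames the division

def refiner(code: str, blamed: str) -> str:
    return code.replace("1 / 0", "1 / max(x, 1)")

program = "x = 0\nresult = 1 / 0"
value, error = run(program)
if error:
    program = refiner(program, critic(program, error))
    value, error = run(program)
print(value, error)
```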
arXiv Detail & Related papers (2024-06-19T11:09:16Z) - Leveraging Print Debugging to Improve Code Generation in Large Language Models [63.63160583432348]
Large language models (LLMs) have made significant progress in code generation tasks.
But their performance in tackling programming problems with complex data structures and algorithms remains suboptimal.
We propose an in-context learning approach that guides LLMs to debug by using a "print debug" method.
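The print-debugging loop can be sketched roughly as follows; add_prints() and revise() are placeholders for the prompts that would ask the LLM to instrument the code and to revise it given the captured output.

```python
# Minimal sketch of the "print debugging" loop: instrument the candidate code
# with prints, execute it, and hand the captured output back for a revision.
import io
from contextlib import redirect_stdout

def add_prints(code: str) -> str:
    return code.replace("return mid", "print('mid =', mid)\n    return mid")

def revise(code: str, trace: str) -> str:
    return code.replace("(lo + hi) / 2", "(lo + hi) // 2")   # integer midpoint

buggy = "def mid_index(lo, hi):\n    mid = (lo + hi) / 2\n    return mid"
instrumented = add_prints(buggy)

buf = io.StringIO()
with redirect_stdout(buf):                        # capture the debug output
    ns: dict = {}
    exec(instrumented, ns)
    ns["mid_index"](0, 5)
print("trace:", buf.getvalue().strip())
print(revise(buggy, buf.getvalue()))
```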
arXiv Detail & Related papers (2024-01-10T18:37:59Z) - DebugBench: Evaluating Debugging Capability of Large Language Models [80.73121177868357]
DebugBench is a benchmark for evaluating the debugging capability of Large Language Models (LLMs).
It covers four major bug categories and 18 minor types in C++, Java, and Python.
We evaluate two commercial and four open-source models in a zero-shot scenario.
arXiv Detail & Related papers (2024-01-09T15:46:38Z) - Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
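A bare-bones sketch of such a self-debugging loop, with model() as a toy stand-in for the few-shot-prompted LLM; the unit-test failure message is appended to the next prompt.

```python
# Bare-bones self-debugging loop: execute the candidate program against a
# unit test and feed the failure message back into the next prompt.
def model(prompt: str) -> str:
    if "AssertionError" in prompt:                       # second attempt, with feedback
        return "def square(x):\n    return x * x"
    return "def square(x):\n    return x + x"            # first (buggy) attempt

def unit_test(code: str) -> str | None:
    ns: dict = {}
    exec(code, ns)
    try:
        assert ns["square"](3) == 9
        return None
    except AssertionError:
        return f"AssertionError: square(3) returned {ns['square'](3)}"

prompt = "Write square(x)."
for _ in range(3):                                       # a few refinement rounds
    code = model(prompt)
    feedback = unit_test(code)
    if feedback is None:
        break
    prompt += "\n" + feedback                            # self-debug with feedback
print(code)
```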
arXiv Detail & Related papers (2023-04-11T10:43:43Z)