LAUDE: LLM-Assisted Unit Test Generation and Debugging of Hardware DEsigns
- URL: http://arxiv.org/abs/2601.08856v1
- Date: Tue, 06 Jan 2026 04:00:07 GMT
- Title: LAUDE: LLM-Assisted Unit Test Generation and Debugging of Hardware DEsigns
- Authors: Deeksha Nandal, Riccardo Revalor, Soham Dan, Debjit Pal
- Abstract summary: Unit tests are critical in the hardware design lifecycle to ensure that component design modules are functionally correct and conform to the specification before they are integrated at the system level. We introduce LAUDE, a unified unit-test generation and debugging framework for hardware designs that cross-pollinates the semantic understanding of the design source code with the Chain-of-Thought (CoT) reasoning capabilities of foundational Large-Language Models (LLMs). We apply LAUDE with closed- and open-source LLMs to a large corpus of buggy hardware design codes derived from the VerilogEval dataset, where generated unit tests detected bugs in up to 100% of combinational and 93% of sequential designs.
- Score: 9.542805275381566
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Unit tests are critical in the hardware design lifecycle to ensure that component design modules are functionally correct and conform to the specification before they are integrated at the system level. Thus, developing unit tests that target various design features requires both a deep understanding of the design functionality and creativity. When one or more unit tests expose a design failure, the debugging engineer needs to diagnose, localize, and debug the failure to ensure design correctness, which is often a painstaking and intense process. In this work, we introduce LAUDE, a unified unit-test generation and debugging framework for hardware designs that cross-pollinates the semantic understanding of the design source code with the Chain-of-Thought (CoT) reasoning capabilities of foundational Large-Language Models (LLMs). LAUDE integrates prompt engineering and design execution information to enhance its unit test generation accuracy and code debuggability. We apply LAUDE with closed- and open-source LLMs to a large corpus of buggy hardware design codes derived from the VerilogEval dataset, where the generated unit tests detected bugs in up to 100% of combinational and 93% of sequential designs, and LAUDE debugged up to 93% of combinational and 84% of sequential designs.
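The abstract describes a closed loop: an LLM drafts a self-checking unit test from the specification and the design source, the test is executed, and the simulation output is fed back to the model to localize and repair the failure. The paper's implementation is not shown here, so the following is only a minimal sketch of that loop; `llm_complete` is a placeholder for any closed- or open-source chat-completion API, and Icarus Verilog (`iverilog`/`vvp`) stands in as one possible simulator.

```python
import os
import subprocess
import tempfile

def llm_complete(prompt: str) -> str:
    """Placeholder for any chat-completion API (closed- or open-source LLM)."""
    raise NotImplementedError

def simulate(dut_path: str, testbench: str) -> tuple[bool, str]:
    """Compile and run a Verilog testbench with Icarus Verilog; return (passed, log)."""
    with tempfile.TemporaryDirectory() as tmp:
        tb_path = os.path.join(tmp, "tb.v")
        sim_path = os.path.join(tmp, "sim.out")
        with open(tb_path, "w") as f:
            f.write(testbench)
        build = subprocess.run(["iverilog", "-o", sim_path, tb_path, dut_path],
                               capture_output=True, text=True)
        if build.returncode != 0:
            return False, build.stderr
        run = subprocess.run(["vvp", sim_path], capture_output=True, text=True)
        # Assumed convention: the generated testbench prints "FAIL" on any mismatch.
        return "FAIL" not in run.stdout, run.stdout

def generate_and_debug(spec: str, dut_path: str, max_rounds: int = 3) -> str:
    """LAUDE-style loop (sketch): generate a unit test, then iteratively
    feed simulation logs back to the LLM to localize and repair the bug."""
    with open(dut_path) as f:
        source = f.read()
    testbench = llm_complete(
        f"Specification:\n{spec}\n\nDesign:\n{source}\n"
        "Write a self-checking Verilog testbench that prints FAIL on mismatch.")
    for _ in range(max_rounds):
        passed, log = simulate(dut_path, testbench)
        if passed:
            return source  # no failing test remains: accept the current design
        source = llm_complete(
            f"Specification:\n{spec}\n\nBuggy design:\n{source}\n"
            f"Failing simulation log:\n{log}\n"
            "Diagnose the bug and return the corrected module only.")
        with open(dut_path, "w") as f:
            f.write(source)
    return source
```

The design choice mirrored from the abstract is that execution information (the simulation log) is folded into the debugging prompt, rather than asking the model to repair the design from source alone.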
Related papers
- Hardware Co-Design Scaling Laws via Roofline Modelling for On-Device LLMs [49.99513618431772]
We propose a hardware co-design law that captures model accuracy and inference performance. We empirically evaluate 1,942 candidate architectures on NVIDIA Jetson Orin. Our architecture achieves 19.42% lower perplexity on WikiText-2.
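The co-design law itself is not given in this summary, but the roofline model it builds on is standard: attainable throughput is the lesser of the machine's peak compute and its memory bandwidth times the kernel's arithmetic intensity. A minimal sketch, using illustrative numbers rather than measured Jetson Orin figures:

```python
def roofline(peak_flops: float, mem_bw: float, intensity: float) -> float:
    """Classic roofline bound: attainable FLOP/s is the minimum of peak
    compute and memory bandwidth (bytes/s) times arithmetic intensity
    (FLOPs per byte moved)."""
    return min(peak_flops, mem_bw * intensity)

# Illustrative values only (not measured Jetson Orin numbers).
peak, bw = 5.0e12, 2.0e11          # 5 TFLOP/s peak, 200 GB/s bandwidth
for ai in (1.0, 8.0, 64.0):        # FLOPs per byte
    print(f"intensity {ai:5.1f} -> {roofline(peak, bw, ai):.2e} FLOP/s")
```

Low-intensity (memory-bound) kernels sit on the bandwidth slope; once intensity exceeds peak/bandwidth (25 FLOPs/byte with these numbers), the compute ceiling binds.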
arXiv Detail & Related papers (2026-02-10T23:51:00Z)
- DUET: Agentic Design Understanding via Experimentation and Testing [6.787641711048685]
DUET is a general methodology for developing Design Understanding via Experimentation and Testing. It iteratively generates hypotheses, tests them with EDA tools, and integrates the results to build a bottom-up understanding of the design. We show that DUET improves AI agent performance on formal verification when compared to a baseline flow without experimentation.
arXiv Detail & Related papers (2025-12-06T02:16:28Z)
- LOCOFY Large Design Models -- Design to code conversion solution [0.0]
We introduce the Large Design Models paradigm, trained specifically on designs and webpages to enable seamless design-to-code conversion. We have developed a training and inference pipeline incorporating data engineering and appropriate model architecture modifications. Our models demonstrated exceptional end-to-end design-to-code conversion accuracy using a novel preview match score metric.
arXiv Detail & Related papers (2025-07-22T03:54:57Z)
- Towards LLM-based Root Cause Analysis of Hardware Design Failures [8.588085004917476]
Large language models (LLMs) can explain the root cause of design issues and bugs revealed during synthesis and simulation. OpenAI's o3-mini reasoning model reached a correct determination 100% of the time under pass@5 scoring.
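The pass@5 figure is presumably computed with the standard unbiased pass@k estimator (Chen et al., 2021): sample n generations, count the c correct ones, and estimate the probability that at least one of k draws is correct. A short sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples drawn
    without replacement from n generations (c of them correct) is correct."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples: every draw of k hits a correct one
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=10, c=3, k=5))  # 10 samples, 3 correct -> ~0.917
```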
arXiv Detail & Related papers (2025-07-09T03:25:52Z)
- Training Language Models to Generate Quality Code with Program Analysis Feedback [66.0854002147103]
Code generation with large language models (LLMs) is increasingly adopted in production but fails to ensure code quality. We propose REAL, a reinforcement learning framework that incentivizes LLMs to generate production-quality code.
arXiv Detail & Related papers (2025-05-28T17:57:47Z)
- SolBench: A Dataset and Benchmark for Evaluating Functional Correctness in Solidity Code Completion and Repair [51.0686873716938]
We introduce SolBench, a benchmark for evaluating the functional correctness of Solidity smart contracts generated by code completion models. We propose a Retrieval-Augmented Code Repair framework to verify the functional correctness of smart contracts. Results show that code repair and retrieval techniques effectively enhance the correctness of smart contract completion while reducing computational costs.
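The Retrieval-Augmented Code Repair framework is only named here, so the sketch below shows the general pattern under assumed helpers: a hypothetical `embed` embedding function and `llm_complete` chat-completion wrapper, with retrieved verified snippets grounding the repair prompt. This is not SolBench's actual API.

```python
from math import sqrt

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def repair_with_retrieval(broken_code: str, error: str,
                          corpus: list[tuple[str, list[float]]],
                          top_k: int = 3) -> str:
    """Retrieval-augmented repair (sketch): ground the repair prompt
    in the most similar verified snippets from a (code, embedding) corpus."""
    query = embed(broken_code)  # hypothetical embedding function
    ranked = sorted(corpus, key=lambda item: cosine(query, item[1]), reverse=True)
    references = "\n---\n".join(code for code, _ in ranked[:top_k])
    return llm_complete(  # hypothetical chat-completion wrapper
        f"Reference snippets:\n{references}\n\n"
        f"Broken Solidity code:\n{broken_code}\nCompiler/test error:\n{error}\n"
        "Return only the repaired code.")
```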
arXiv Detail & Related papers (2025-03-03T01:55:20Z)
- Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs [53.00384299879513]
In large language models (LLMs), code and reasoning reinforce each other. Code provides verifiable execution paths, enforces logical decomposition, and enables runtime validation. We identify key challenges and propose future research directions to strengthen this synergy.
arXiv Detail & Related papers (2025-02-26T18:55:42Z)
- Specifications: The missing link to making the development of LLM systems an engineering discipline [65.10077876035417]
We discuss the progress the field has made so far through advances like structured outputs, process supervision, and test-time compute. We outline several future directions for research to enable the development of modular and reliable LLM-based systems.
arXiv Detail & Related papers (2024-11-25T07:48:31Z)
- VerilogReader: LLM-Aided Hardware Test Generation [5.012023213660125]
Large Language Models (LLMs), with their advanced understanding and inference capabilities, have introduced a novel approach.
In this work, we investigate the integration of LLMs into the Coverage Directed Test Generation (CDG) process.
We compare our framework with random testing, using our self-designed Verilog benchmark suite.
arXiv Detail & Related papers (2024-06-03T07:20:51Z)
- Evaluating LLMs for Hardware Design and Test [25.412044293834715]
Large Language Models (LLMs) have demonstrated capabilities for producing code in Hardware Description Languages (HDLs). We examine the capabilities and limitations of the state-of-the-art conversational LLMs when producing Verilog for functional and verification purposes.
arXiv Detail & Related papers (2024-04-23T18:55:49Z)
- Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study [72.24266814625685]
We explore the performance of large language models (LLMs) across the entire software development lifecycle with DevEval. DevEval features four programming languages, multiple domains, high-quality data collection, and carefully designed and verified metrics for each task. Empirical studies show that current LLMs, including GPT-4, fail to solve the challenges presented within DevEval.
arXiv Detail & Related papers (2024-03-13T15:13:44Z)
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking the long-sequence code generation task into a Curriculum of Code Completion Subtasks.
FGO optimizes the model only on executed code, masking the unexecuted code segments to provide Fine-Grained Optimization (sketched below).
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
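The masking idea in FGO can be written down compactly. The sketch below is an interpretation of the summary (a token-masked policy-gradient loss), not StepCoder's published implementation, and the tensor layout is assumed.

```python
import torch

def fgo_loss(logprobs: torch.Tensor, advantages: torch.Tensor,
             executed_mask: torch.Tensor) -> torch.Tensor:
    """Fine-Grained Optimization (sketch): mask out tokens belonging to
    unexecuted code segments so that only code actually exercised by the
    compiler/tests receives gradient.

    All tensors have shape (batch, seq_len); executed_mask is 1.0 for
    tokens in executed segments and 0.0 otherwise."""
    token_loss = -(logprobs * advantages) * executed_mask
    return token_loss.sum() / executed_mask.sum().clamp(min=1.0)
```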
arXiv Detail & Related papers (2024-02-02T13:14:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.