Compiler Bugs Detection in Logic Synthesis Tools via Linear Upper Confidence Bound
- URL: http://arxiv.org/abs/2509.01149v1
- Date: Mon, 01 Sep 2025 05:54:48 GMT
- Title: Compiler Bugs Detection in Logic Synthesis Tools via Linear Upper Confidence Bound
- Authors: Hui Zeng, Zhihao Xu, Hui Li, Siwen Wang, Qian Ma
- Abstract summary: Lin-Hunter is a novel testing framework designed to enhance the diversity of HDL test cases and the efficiency of FPGA logic synthesis tool validation. Our method has discovered 18 unique bugs, including 10 previously unreported defects, which have been confirmed by official developers.
- Score: 11.123007674634936
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Field-Programmable Gate Arrays (FPGAs) play an indispensable role in Electronic Design Automation (EDA), translating Register-Transfer Level (RTL) designs into gate-level netlists. The correctness and reliability of FPGA logic synthesis tools are critically important, as unnoticed bugs in these tools may silently propagate into the final hardware implementations. However, recent testing approaches often rely heavily on random selection strategies, which limits the structural diversity of the generated HDL test cases and results in inadequate exploration of the tool's feature space. To address this limitation, we propose Lin-Hunter, a novel testing framework designed to systematically enhance the diversity of HDL test cases and the efficiency of FPGA logic synthesis tool validation. Specifically, Lin-Hunter introduces a principled set of metamorphic transformation rules to generate functionally equivalent yet structurally diverse HDL test case variants, effectively addressing the limited diversity of existing test inputs. To further enhance bug-discovery efficiency, Lin-Hunter integrates an adaptive strategy selection mechanism based on the Linear Upper Confidence Bound (LinUCB) method. This mechanism leverages feedback from the synthesis logs of previously executed test cases to dynamically prioritize transformation strategies that have empirically demonstrated a higher likelihood of triggering synthesis bugs. Comprehensive experiments conducted over a three-month period demonstrate the practical effectiveness of Lin-Hunter. Our method has discovered 18 unique bugs, including 10 previously unreported defects, which have been confirmed by official developers. Moreover, our method outperforms state-of-the-art testing methods in both test-case diversity and bug-discovery efficiency.
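The abstract names LinUCB but does not spell out its update rule. Below is a minimal sketch of the standard disjoint LinUCB algorithm (Li et al., 2010) applied to this paper's setting, where each arm is a metamorphic transformation strategy and the context vector is assumed to come from features of prior synthesis logs. The number of arms, feature dimension, and reward definition here are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

class LinUCB:
    """Disjoint LinUCB: one ridge-regression model per arm (Li et al., 2010)."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha                               # exploration strength
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm Gram matrix
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward vector

    def select(self, x):
        """Pick the arm with the highest upper confidence bound for context x."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                            # ridge-regression estimate
            ucb = theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
            scores.append(ucb)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Fold the observed reward back into the chosen arm's model."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Hypothetical usage: 5 transformation strategies as arms, an 8-dimensional
# context vector standing in for log-derived features, and reward 1.0 when a
# generated variant triggers a synthesis inconsistency.
bandit = LinUCB(n_arms=5, dim=8, alpha=1.5)
x = np.random.rand(8)
arm = bandit.select(x)
bandit.update(arm, x, reward=1.0)
```

The exploration term grows when an arm's feature directions are under-sampled, so rarely tried transformation strategies keep getting scheduled until the model is confident they are unpromising; this matches the abstract's claim of balancing exploitation of bug-prone strategies with exploration.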
Related papers
- CVeDRL: An Efficient Code Verifier via Difficulty-aware Reinforcement Learning [57.24524263804788]
Code verifiers play a critical role in post-verification for LLM-based code generation. Existing supervised fine-tuning methods suffer from data scarcity, high failure rates, and poor inference efficiency. We show that naive RL with only functionality rewards fails to generate effective unit tests for difficult branches and samples.
arXiv Detail & Related papers (2026-01-30T10:33:29Z) - Dynamic Stability of LLM-Generated Code [6.120340803716395]
Current evaluations of LLMs for code generation overlook the fact that functionally correct solutions can differ significantly in algorithmic complexity. We introduce a principled framework for evaluating the dynamic stability of generated code. Our findings call for stability-aware objectives in code generation and new benchmarks with test cases for robust, real-world evaluation.
arXiv Detail & Related papers (2025-11-07T09:58:06Z) - SSVD: Structured SVD for Parameter-Efficient Fine-Tuning and Benchmarking under Domain Shift in ASR [65.90944188787786]
Low-rank adaptation (LoRA) is widely used in speech applications, but its state-of-the-art variants, e.g., VeRA, DoRA, PiSSA, and SVFT, are developed mainly for language and vision tasks, with limited validation in speech. This work presents the first comprehensive integration and benchmarking of these PEFT methods within ESPnet. We evaluate all methods on domain-shifted speech recognition tasks, including child speech and dialectal variation, across model scales from 0.1B to 2B.
arXiv Detail & Related papers (2025-09-02T20:51:17Z) - Code Difference Guided Fuzzing for FPGA Logic Synthesis Compilers via Bayesian Optimization [8.52837330241478]
We propose a guided mutation strategy based on Bayesian optimization, called LSC-Fuzz, to detect bugs in FPGA logic synthesis compilers. Over three months, LSC-Fuzz has found 16 bugs, 12 of which have been confirmed by official technical support.
arXiv Detail & Related papers (2025-08-25T06:41:36Z) - A Novel Mutation Based Method for Detecting FPGA Logic Synthesis Tool Bugs [7.8865444084780965]
We propose VERMEI, a new method for testing FPGA logic synthesis tools. VERMEI consists of three modules: preprocessing, equivalent mutation, and bug identification. Within five months, VERMEI reported 15 bugs to vendors, 9 of which were confirmed as new.
arXiv Detail & Related papers (2025-08-21T13:11:59Z) - Training Language Models to Generate Quality Code with Program Analysis Feedback [66.0854002147103]
Code generation with large language models (LLMs) is increasingly adopted in production but fails to ensure code quality. We propose REAL, a reinforcement learning framework that incentivizes LLMs to generate production-quality code.
arXiv Detail & Related papers (2025-05-28T17:57:47Z) - Teaching Your Models to Understand Code via Focal Preference Alignment [70.71693365502212]
In existing approaches, a set of n candidate solutions is evaluated based on test case success rates. Because this approach aligns entire failing code blocks rather than pinpointing specific errors, it lacks the granularity necessary to capture meaningful error-correction relationships. We propose Target-DPO, a new preference alignment framework that mimics human iterative debugging to refine Code LLMs.
arXiv Detail & Related papers (2025-03-04T16:56:34Z) - Scoring Verifiers: Evaluating Synthetic Verification for Code and Reasoning [59.25951947621526]
We propose an approach that can transform existing coding benchmarks into scoring and ranking datasets to evaluate the effectiveness of synthetic verifiers. We release four new benchmarks (HE-R, HE-R+, MBPP-R, and MBPP-R+) and analyze synthetic verification methods with standard, reasoning-based, and reward-based LLMs. Our experiments show that reasoning can significantly improve test case generation and that scaling the number of test cases enhances verification accuracy.
arXiv Detail & Related papers (2025-02-19T15:32:11Z) - Automated Refactoring of Non-Idiomatic Python Code: A Differentiated Replication with LLMs [54.309127753635366]
We present the results of a replication study in which we investigate GPT-4's effectiveness in recommending and suggesting idiomatic actions. Our findings underscore the potential of LLMs to achieve tasks where, in the past, implementing recommenders based on complex code analyses was required.
arXiv Detail & Related papers (2025-01-28T15:41:54Z) - Toward Automated Validation of Language Model Synthesized Test Cases using Semantic Entropy [0.5057850174013127]
Modern Large Language Model (LLM)-based programming agents often rely on test execution feedback to refine their generated code. This paper introduces VALTEST, a novel framework that leverages semantic entropy to automatically validate test cases generated by LLMs. Experiments show that VALTEST boosts test validity by up to 29% and improves code generation performance, as evidenced by significant increases in pass@1 scores.
arXiv Detail & Related papers (2024-11-13T00:07:32Z) - Exploring and Lifting the Robustness of LLM-powered Automated Program Repair with Metamorphic Testing [31.327835928133535]
Large language model-powered Automated Program Repair (LAPR) techniques have achieved state-of-the-art bug-fixing performance. It is crucial to conduct robustness testing on LAPR techniques before their practical deployment. We propose MT-LAPR, a Metamorphic Testing framework exclusively for LAPR techniques.
arXiv Detail & Related papers (2024-10-10T01:14:58Z) - End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z) - K-ASTRO: Structure-Aware Adaptation of LLMs for Code Vulnerability Detection [12.458619777971956]
K-ASTRO is a lightweight Transformer model that combines semantic embeddings from Large Language Models with structural features of Abstract Syntax Trees (ASTs) to improve efficiency and accuracy in code vulnerability detection. Our approach introduces an AST-based augmentation technique inspired by mutation testing, a structure-aware attention mechanism that incorporates augmented AST features, and a joint adaptation pipeline to unify code semantics and syntax.
arXiv Detail & Related papers (2022-08-17T04:50:51Z)