Related papers: MeltRTL: Multi-Expert LLMs with Inference-time Intervention for RTL Code Generation

MeltRTL: Multi-Expert LLMs with Inference-time Intervention for RTL Code Generation

URL: http://arxiv.org/abs/2601.13015v1
Date: Mon, 19 Jan 2026 12:49:39 GMT
Title: MeltRTL: Multi-Expert LLMs with Inference-time Intervention for RTL Code Generation
Authors: Nowfel Mashnoor, Mohammad Akyash, Hadi Kamali, Kimia Azar,
Abstract summary: MeltRTL is a novel framework that integrates multi-expert attention with inference-time intervention.<n>MeltRTL significantly improves the accuracy of large language models (LLMs) without retraining the base model.<n>We evaluate MeltRTL on the VerilogEval benchmark, achieving 96% synthesizability and 60% functional correctness.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The automated generation of hardware register-transfer level (RTL) code with large language models (LLMs) shows promise, yet current solutions struggle to produce syntactically and functionally correct code for complex digital designs. This paper introduces MeltRTL, a novel framework that integrates multi-expert attention with inference-time intervention (ITI) to significantly improve LLM-based RTL code generation accuracy without retraining the base model. MeltRTL introduces three key innovations: (1) A multi-expert attention architecture that dynamically routes design specifications to specialized expert networks, enabling targeted reasoning across various hardware categories; (2) An inference-time intervention mechanism that employs non-linear probes to detect and correct hardware-specific inaccuracies during generation; and (3) An efficient intervention framework that selectively operates on expert-specific attention heads with minimal computational overhead. We evaluate MeltRTL on the VerilogEval benchmark, achieving 96% synthesizability and 60% functional correctness, compared to the base LLM's 85.3% and 45.3%, respectively. These improvements are obtained entirely at inference time, with only 27% computational overhead and no model fine-tuning, making MeltRTL immediately deployable on existing pre-trained LLMs. Ablation studies further show the complementary benefits of multi-expert architecture and ITI, highlighting their synergistic effects when combined.

Related papers

ACE-RTL: When Agentic Context Evolution Meets RTL-Specialized LLMs [12.204779627626273]
ACE-RTL integrates an RTL-specialized LLM, trained on a large-scale dataset of 1.7 million RTL samples.<n>On the Comprehensive Verilog Design Problems (CVDP) benchmark, ACE-RTL achieves up to a 44.87% pass rate improvement over 14 competitive baselines.
arXiv Detail & Related papers (2026-02-10T19:09:13Z)
Run, Ruminate, and Regulate: A Dual-process Thinking System for Vision-and-Language Navigation [52.11339614452127]
Vision-and-Language Navigation (VLN) requires an agent to dynamically explore complex 3D environments following human instructions.<n>Recent research underscores the potential of harnessing large language models (LLMs) for VLN, given their commonsense knowledge and general reasoning capabilities.<n>We propose a novel dual-process thinking framework dubbed R3, integrating LLMs' generalization capabilities with VLN-specific expertise in a zero-shot manner.
arXiv Detail & Related papers (2025-11-18T04:32:00Z)
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use [78.29315418819074]
We introduce VerlTool, a unified and modular framework that addresses limitations through systematic design principles.<n>Our framework formalizes ARLT as multi-turn trajectories with multi-modal observation tokens (text/image/video), extending beyond single-turn RLVR paradigms.<n>The modular plugin architecture enables rapid tool integration requiring only lightweight Python definitions.
arXiv Detail & Related papers (2025-09-01T01:45:18Z)
Agentic Reinforced Policy Optimization [66.96989268893932]
Large-scale reinforcement learning with verifiable rewards (RLVR) has demonstrated its effectiveness in harnessing the potential of large language models (LLMs) for single-turn reasoning tasks.<n>Current RL algorithms inadequately balance the models' intrinsic long-horizon reasoning capabilities and their proficiency in multi-turn tool interactions.<n>We propose Agentic Reinforced Policy Optimization (ARPO), a novel agentic RL algorithm tailored for training multi-turn LLM-based agents.
arXiv Detail & Related papers (2025-07-26T07:53:11Z)
VeriOpt: PPA-Aware High-Quality Verilog Generation via Multi-Role LLMs [41.94295877935867]
VeriOpt is a novel framework that leverages role-based prompting and PPA-aware optimization to produce high-quality, synthesizable Verilog.<n>Our work advances the state-of-the-art AI-driven hardware design by addressing the critical gap between correctness and quality.
arXiv Detail & Related papers (2025-07-20T00:28:55Z)
R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference [77.47238561728459]
R-Sparse is a training-free activation sparsity approach capable of achieving high sparsity levels in advanced LLMs.<n> Experiments on Llama-2/3 and Mistral models across ten diverse tasks demonstrate that R-Sparse achieves comparable performance at 50% model-level sparsity.
arXiv Detail & Related papers (2025-04-28T03:30:32Z)
TuRTLe: A Unified Evaluation of LLMs for RTL Generation [0.6010802600885173]
We propose TuRTLe, a unified evaluation framework designed to assess LLMs across key RTL generation tasks.<n>We benchmark a diverse set of open LLMs and analyze their strengths and weaknesses in EDA-specific tasks.<n>Our results show that reasoning-based models, such as DeepSeek R1, consistently outperform others across multiple evaluation criteria.
arXiv Detail & Related papers (2025-03-31T07:43:12Z)
LLM2: Let Large Language Models Harness System 2 Reasoning [65.89293674479907]
Large language models (LLMs) have exhibited impressive capabilities across a myriad of tasks, yet they occasionally yield undesirable outputs.<n>We introduce LLM2, a novel framework that combines an LLM with a process-based verifier.<n>LLMs2 is responsible for generating plausible candidates, while the verifier provides timely process-based feedback to distinguish desirable and undesirable outputs.
arXiv Detail & Related papers (2024-12-29T06:32:36Z)
Dspy-based Neural-Symbolic Pipeline to Enhance Spatial Reasoning in LLMs [29.735465300269993]
Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, yet they often struggle with spatial reasoning.<n>This paper presents a novel neural-symbolic framework that enhances LLMs' spatial reasoning abilities through iterative feedback between LLMs and Answer Set Programming (ASP)<n>We evaluate our approach on two benchmark datasets: StepGame and SparQA.
arXiv Detail & Related papers (2024-11-27T18:04:05Z)
AIvril: AI-Driven RTL Generation With Verification In-The-Loop [0.7831852829409273]
Large Language Models (LLMs) are computational models capable of performing complex natural language processing tasks. This paper introduces AIvril, a framework designed to enhance the accuracy and reliability of RTL-aware LLMs.
arXiv Detail & Related papers (2024-09-03T15:07:11Z)
FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models [50.331708897857574]
We introduce FactorLLM, a novel approach that decomposes well-trained dense FFNs into sparse sub-networks without requiring any further modifications. FactorLLM achieves comparable performance to the source model securing up to 85% model performance while obtaining over a 30% increase in inference speed.
arXiv Detail & Related papers (2024-08-15T16:45:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.