Fugu-MT Paper Translation (Abstract): QiMeng-CodeV-SVA: Training Specialized LLMs for Hardware Assertion Generation via RTL-Grounded Bidirectional Data Synthesis

Paper overview: QiMeng-CodeV-SVA: Training Specialized LLMs for Hardware Assertion Generation via RTL-Grounded Bidirectional Data Synthesis

arxiv url: http://arxiv.org/abs/2603.14239v1
Date: Sun, 15 Mar 2026 06:25:09 GMT
Status: Translation complete
Last updated (in system): 2026-03-17 16:19:35.691416
Title: QiMeng-CodeV-SVA: Training Specialized LLMs for Hardware Assertion Generation via RTL-Grounded Bidirectional Data Synthesis
Authors: Yutong Wu, Chenrui Cao, Pengwei Jin, Di Huang, Rui Zhang, Xishan Zhang, Zidong Du, Qi Guo, Xing Hu,
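Abstract summary: We propose a data synthesis framework that addresses two challenges: the scarcity of high-quality real-world SVA corpora and the lack of reliable methods for determining NL-SVA semantic equivalence. With the synthesized data, we train CodeV-SVA, a series of SVA generation models. Notably, CodeV-SVA-14B achieves 75.8% on NL2SVA-Human and 84.0% on NL2SVA-Machine in Func.@1.
Reference score (site-computed attention score): 41.09864485551356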
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: SystemVerilog Assertions (SVAs) are crucial for hardware verification. Recent studies leverage general-purpose LLMs to translate natural language properties to SVAs (NL2SVA), but they perform poorly due to limited data. We propose a data synthesis framework to tackle two challenges: the scarcity of high-quality real-world SVA corpora and the lack of reliable methods to determine NL-SVA semantic equivalence. For the former, large-scale open-source RTLs are used to guide LLMs to generate real-world SVAs; for the latter, bidirectional translation serves as a data selection method. With the synthesized data, we train CodeV-SVA, a series of SVA generation models. Notably, CodeV-SVA-14B achieves 75.8% on NL2SVA-Human and 84.0% on NL2SVA-Machine in Func.@1, matching or exceeding advanced LLMs like GPT-5 and DeepSeek-R1.
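The abstract names bidirectional translation as the data selection method but does not spell out the procedure. The Python sketch below assumes one plausible reading: an RTL-grounded (SVA, NL) pair is kept only if translating the NL description back into SVA yields an assertion judged equivalent to the original. `Candidate`, `bidirectional_filter`, `toy_nl_to_sva`, and the whitespace-only `normalize_sva` check are all hypothetical stand-ins; a real pipeline would presumably call an LLM for the backward step and a formal tool for the equivalence judgment.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Candidate:
    rtl: str  # RTL snippet the assertion was grounded on
    sva: str  # LLM-generated SystemVerilog assertion (forward step)
    nl: str   # LLM-generated natural-language description of that SVA


def normalize_sva(sva: str) -> str:
    """Collapse whitespace only. Hypothetical stand-in for a formal
    equivalence check, which this toy comparison does NOT perform."""
    return " ".join(sva.split())


def bidirectional_filter(
    candidates: List[Candidate],
    nl_to_sva: Callable[[str, str], str],
    equivalent: Callable[[str, str], bool],
) -> List[Candidate]:
    """Assumed selection rule: keep a pair only if back-translating its NL
    reproduces an assertion judged equivalent to the original SVA."""
    kept = []
    for cand in candidates:
        regenerated = nl_to_sva(cand.nl, cand.rtl)  # backward translation
        if equivalent(cand.sva, regenerated):
            kept.append(cand)
    return kept


# Toy stand-in for an LLM: a fixed lookup table (hypothetical data).
TOY_BACK_TRANSLATIONS: Dict[str, str] = {
    "req is followed by ack within 1 to 3 cycles":
        "assert property (@(posedge clk) req |-> ##[1:3] ack);",
    "ack is only high when req is high":
        "assert property (@(posedge clk) ack |-> req);",
}


def toy_nl_to_sva(nl: str, rtl: str) -> str:
    # Ignores the RTL context; a real model would condition on it.
    return TOY_BACK_TRANSLATIONS.get(nl, "")


rtl = "module handshake(input clk, input req, output ack); endmodule"
candidates = [
    # Back-translation matches up to whitespace, so the pair is kept.
    Candidate(rtl, "assert property (@(posedge clk)  req |-> ##[1:3] ack);",
              "req is followed by ack within 1 to 3 cycles"),
    # The NL is too loose to recover the $past(...) term, so it is dropped.
    Candidate(rtl, "assert property (@(posedge clk) ack |-> $past(req));",
              "ack is only high when req is high"),
]

kept = bidirectional_filter(
    candidates, toy_nl_to_sva,
    lambda a, b: normalize_sva(a) == normalize_sva(b),
)
print(len(kept))  # 1
```

The second pair shows why a round trip can act as a semantic filter: a description that loses a temporal detail cannot regenerate the original assertion, so the pair is discarded.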
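The abstract reports scores "in Func.@1" without defining the metric. Assuming it follows the common pass@k convention with functional correctness as the pass criterion (a generated SVA counts as correct if it is judged functionally equivalent to the reference), the standard unbiased pass@k estimator can be sketched as below. `pass_at_k` and `func_at_1` are hypothetical names, and the toy data is illustrative, not from the paper.

```python
from math import comb
from typing import List


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples drawn
    without replacement from n generations (c of them correct) is correct."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def func_at_1(results: List[List[bool]]) -> float:
    """Hypothetical Func.@1: mean pass@1 over problems, where each bool marks
    whether one generated SVA was judged functionally equivalent."""
    scores = [pass_at_k(len(r), sum(r), 1) for r in results]
    return sum(scores) / len(scores)


# Toy data: 3 problems, 4 samples each (not results from the paper).
toy = [
    [True, True, False, True],     # 3/4 correct -> 0.75
    [False, False, False, False],  # 0/4 correct -> 0.0
    [True, True, True, True],      # 4/4 correct -> 1.0
]
print(round(func_at_1(toy), 4))  # 0.5833
```

With k = 1 the estimator reduces to c/n per problem, so Func.@1 under this assumption is simply the average fraction of functionally correct generations.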
