Fugu-MT Paper Translation (Abstract): LLM Translation of Compiler Intermediate Representation

Paper summary: LLM Translation of Compiler Intermediate Representation

arxiv url: http://arxiv.org/abs/2605.08247v1
Date: Thu, 07 May 2026 13:22:23 GMT
Status: Translation complete
Last updated in system: 2026-05-12 23:28:49.49518
Title: LLM Translation of Compiler Intermediate Representation
Title (reference translation): LLM Translation of Compiler Intermediate Representation
Authors: Andrea Valenzuela Ramirez, Cristian Gutierrez-Gomez, Marta Barroso, Dario Garcia-Gasulla, Sara Royuela
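Abstract summary: This paper presents IRIS-14B, a transformer model for translating GIMPLE to LLVM IR. To the best of the authors' knowledge, IRIS-14B is the first model trained explicitly for IR-to-IR translation. It outperforms widely used models, including the largest state-of-the-art open models available today, ranging from 13 to 1,000 billion parameters, by up to 44 percentage points in accuracy.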
Reference score (independently computed attention score): 2.614444430580024
License: http://creativecommons.org/licenses/by/4.0/
Abstract: GCC and LLVM underpin much of modern software infrastructure, relying on distinct Intermediate Representations (IRs) to drive optimizations and code generation. However, the semantic and structural differences between these IRs create significant barriers for cross-toolchain interaction, limiting the reuse of compiler frontends, backends, and optimization pipelines across programming languages and compilation ecosystems. Traditional rule-based translators have attempted to bridge this gap, but their complexity and maintenance cost have hindered practical adoption. In this context, Large Language Models (LLMs) appear to be an emerging technology that offers a data-driven alternative, capable of learning complex mappings between heterogeneous compiler IRs directly from sufficiently representative examples. To explore this approach, this paper presents IRIS-14B, a 14-billion-parameter transformer model fine-tuned to translate GIMPLE (as emitted by GCC) to LLVM IR (as emitted by LLVM). The model is trained on paired IRs extracted from C sources and evaluated on the GIMPLE-to-LLVM IR transformation applied to IRs derived from real-world C code and competitive programming problems. To the best of our knowledge, IRIS-14B is the first model trained explicitly for IR-to-IR translation. It outperforms the accuracy of widely used models, including the largest state-of-the-art open models available today, ranging from 13 to 1,000 billion parameters, by up to 44 percentage points. The proposed transformation supports the integration of LLMs as complementary components within hybrid neuro-symbolic compiler architectures, where models such as IRIS-14B act as interoperability layers enabling cross-toolchain workflows without modifying existing compiler passes, while traditional compiler infrastructure continues to perform deterministic compilation and optimization.
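Abstract (reference translation): GCC and LLVM underpin much of modern software infrastructure, relying on distinct intermediate representations (IRs) to drive optimization and code generation. However, the semantic and structural differences between these IRs create significant barriers to cross-toolchain interaction, limiting the reuse of compiler frontends, backends, and optimization pipelines across programming languages and compilation ecosystems. Traditional rule-based translators have tried to bridge this gap, but their complexity and maintenance cost have hindered practical adoption. In this context, Large Language Models (LLMs) are an emerging technology that offers a data-driven alternative, able to learn complex mappings between heterogeneous compiler IRs directly from sufficiently representative examples. This paper presents IRIS-14B, a 14-billion-parameter transformer model for translating GIMPLE to LLVM IR. The model is trained on paired IRs extracted from C sources and evaluated on GIMPLE-to-LLVM IR translation applied to IRs derived from real-world C code and competitive programming problems. To the best of the authors' knowledge, IRIS-14B is the first model trained explicitly for IR-to-IR translation. It outperforms widely used models, including the largest state-of-the-art open models available today, ranging from 13 to 1,000 billion parameters, by up to 44 percentage points in accuracy. Models such as IRIS-14B act as interoperability layers that enable cross-toolchain workflows without modifying existing compiler passes, while traditional compiler infrastructure continues to perform deterministic compilation and optimization.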
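
The abstract states that the model is trained on paired IRs extracted from C sources, but does not describe the extraction pipeline. The following is only a minimal sketch of how one such GIMPLE / LLVM IR pair could be produced with stock GCC and Clang. The function names, the output layout, and the choice of `-O0` are assumptions made for illustration, not details taken from the paper.

```python
from pathlib import Path
import subprocess


def gimple_command(c_file: Path, out_dir: Path) -> list[str]:
    """Build a GCC invocation that dumps the GIMPLE form of one C file."""
    dump_path = out_dir / f"{c_file.stem}.gimple"
    return [
        "gcc", "-O0", "-c", str(c_file),
        "-o", str(out_dir / f"{c_file.stem}.o"),
        # The '=FILE' suffix sends the dump to a fixed path
        # instead of an auto-named dump file.
        f"-fdump-tree-gimple={dump_path}",
    ]


def llvm_ir_command(c_file: Path, out_dir: Path) -> list[str]:
    """Build a Clang invocation that emits textual LLVM IR for one C file."""
    return [
        "clang", "-O0", "-S", "-emit-llvm", str(c_file),
        "-o", str(out_dir / f"{c_file.stem}.ll"),
    ]


def extract_pair(c_file: Path, out_dir: Path) -> tuple[str, str]:
    """Run both compilers and return (gimple_text, llvm_ir_text).

    Requires gcc and clang on PATH; raises CalledProcessError on failure.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    for cmd in (gimple_command(c_file, out_dir), llvm_ir_command(c_file, out_dir)):
        subprocess.run(cmd, check=True)
    gimple = (out_dir / f"{c_file.stem}.gimple").read_text()
    llvm_ir = (out_dir / f"{c_file.stem}.ll").read_text()
    return gimple, llvm_ir
```

Calling `extract_pair(Path("foo.c"), Path("pairs"))` with a hypothetical `foo.c` would return the two texts as one candidate training pair. A real dataset would likely also need deduplication and filtering of files that fail to compile under either toolchain.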
