Fugu-MT Paper Translation (Abstract): GramTrans: A Better Code Representation Approach in Code Generation

Paper summary: GramTrans: A Better Code Representation Approach in Code Generation

arxiv url: http://arxiv.org/abs/2510.02887v1
Date: Fri, 03 Oct 2025 10:49:33 GMT
Status: Translation complete
Last updated in system: 2025-10-06 16:35:52.351286
Title: GramTrans: A Better Code Representation Approach in Code Generation
Authors: Zhao Zhang, Qingyuan Liang, Zeyu Sun, Yizhou Chen, Guoqing Wang, Yican Sun, Lu Zhang, Ge Li, Yingfei Xiong
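Abstract summary: This paper proposes a conjecture: the easier a representation is to parse, the better the model performs. It presents GramTrans, a general approach that automatically transforms a context-free language into a representation within the LL(1) class.
Reference score (independently computed attention score): 31.09799107794881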
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Code generation has shown great promise in assisting software development. A fundamental yet underexplored question is how the choice of code representation affects model performance. While existing studies employ various representations, such as treating code as plain text, grammar rule sequences, or syntax tree sequences, they lack a principled understanding of the relationship between parsing difficulty and model effectiveness. This paper proposes a conjecture: the easier a representation is to parse, the better performance the model achieves. We formalize this idea using grammar classes, where representations in simpler classes (e.g., LL(1)) are easier to parse. Through a controlled experiment on a Python-based DSL, we show that parsing difficulty strongly correlates with model performance. Motivated by this finding, we present GramTrans, a general approach that automatically transforms a context-free language into a representation within the LL(1) class. GramTrans introduces a novel hierarchical conflict elimination algorithm, enabling a flexible trade-off between syntactic simplicity and token efficiency. We evaluate GramTrans on both Python and Java using three code generation models: StarCoder 1B, DeepSeek-Coder 1.3B, and Qwen2.5 1.5B. Across multiple benchmarks, GramTrans consistently delivers significant improvements over baseline representations. Furthermore, our analysis of existing representations reconfirms the strong alignment between parsing difficulty and model performance, providing additional support for the conjecture.
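To make the notion of "parsing difficulty" concrete, here is a minimal Python sketch of what it means for a grammar to fall inside the LL(1) class. It is not the GramTrans algorithm, which the abstract does not specify. It assumes an epsilon-free grammar, for which LL(1) reduces to requiring that the alternatives of every nonterminal begin with disjoint token sets. The toy grammars, the added `end` token, and the function names `first_sets` and `ll1_conflicts` are illustrative assumptions.

```python
from itertools import combinations


def first_sets(grammar):
    """Compute FIRST sets for an epsilon-free grammar (assumption).

    `grammar` maps each nonterminal to a list of alternatives; each
    alternative is a non-empty list of symbols. Any symbol that is not
    a key of `grammar` is treated as a terminal.
    """
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                head = alt[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def ll1_conflicts(grammar):
    """List (nonterminal, alt_i, alt_j, shared_tokens) FIRST/FIRST conflicts.

    For an epsilon-free grammar, an empty result means the grammar is LL(1).
    """
    first = first_sets(grammar)
    conflicts = []
    for nt, alternatives in grammar.items():
        starts = [first[a[0]] if a[0] in grammar else {a[0]} for a in alternatives]
        for (i, s), (j, t) in combinations(enumerate(starts), 2):
            overlap = s & t
            if overlap:
                conflicts.append((nt, i, j, sorted(overlap)))
    return conflicts


# Hypothetical toy grammar: both `if` forms start with the same token,
# so one token of lookahead cannot pick the right alternative.
prefix_clash = {
    "stmt": [
        ["if", "expr", "then", "stmt"],
        ["if", "expr", "then", "stmt", "else", "stmt"],
        ["pass"],
    ],
    "expr": [["id"]],
}

# Left-factored variant. The extra `end` token (an assumption made here)
# keeps the grammar epsilon-free at the cost of a slightly longer output.
left_factored = {
    "stmt": [["if", "expr", "then", "stmt", "tail"], ["pass"]],
    "tail": [["else", "stmt"], ["end"]],
    "expr": [["id"]],
}

print(ll1_conflicts(prefix_clash))
print(ll1_conflicts(left_factored))
```

The second grammar removes the conflict by spending one extra token per `if` without `else`. That is a small-scale instance of the trade-off between syntactic simplicity and token efficiency mentioned in the abstract.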


Related papers