Fugu-MT paper translation (summary): Intent2Tx: Benchmarking LLMs for Translating Natural Language Intents into Ethereum Transactions

Paper summary: Intent2Tx: Benchmarking LLMs for Translating Natural Language Intents into Ethereum Transactions

arxiv url: http://arxiv.org/abs/2604.27763v1
Date: Thu, 30 Apr 2026 11:52:50 GMT
Status: translation complete
Last updated in system: 2026-05-01 16:31:54.076601
Title: Intent2Tx: Benchmarking LLMs for Translating Natural Language Intents into Ethereum Transactions
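Title (reference translation): Intent2Tx: A Benchmark of LLMs for Converting Natural Language Intents into Ethereum Transactions
Authors: Zhuoran Pan, Yue Li, Zhi Guan, Jianbin Hu, Zhong Chen
Abstract summary: Intent2Tx grounds natural language intents in real-world protocol interactions across 11 categories. Intent2Tx serves as a critical foundation for developing autonomous, reliable agents in intent-centric Web3 ecosystems.
Reference score (attention level, computed independently by the site): 6.606052122056915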
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The emergence of Large Language Models (LLMs) offers a transformative interface for Web3, yet existing benchmarks fail to capture the complexity of translating high-level user intents into functionally correct, state-dependent on-chain transactions. We present Intent2Tx, a high-fidelity benchmark featuring 29,921 single-step and 1,575 multi-step instances meticulously derived from 300 days of real-world Ethereum mainnet traces. Unlike prior works that rely on synthetic instructions, Intent2Tx grounds natural language intents in real-world protocol interactions across 11 categories, including diverse long-tail Decentralized Finance (DeFi) primitives. To enable rigorous evaluation, we propose an execution-aware framework that transcends surface-level text matching by employing differential state analysis on forked mainnet environments. Our extensive evaluation of 16 state-of-the-art LLMs reveals that while scaling and retrieval-augmentation enhance logical consistency and parameter precision, current models struggle with out-of-distribution generalization and multi-step planning. Crucially, our execution-based analysis demonstrates that syntactically valid outputs often fail to achieve intended state transitions, highlighting a significant gap in current "reasoning-to-execution" capabilities. Intent2Tx serves as a critical foundation for developing autonomous, reliable agents in intent-centric Web3 ecosystems. Code and data: https://anonymous.4open.science/r/Intent2Tx_Bench-97FF
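Abstract (reference translation): The arrival of large language models (LLMs) provides a transformative interface for Web3, but existing benchmarks do not capture the complexity of converting high-level user intents into functionally correct, state-dependent on-chain transactions. We present Intent2Tx, a high-fidelity benchmark with 29,921 single-step and 1,575 multi-step instances, carefully extracted from 300 days of real-world Ethereum mainnet traces. Unlike earlier work that depends on synthetic instructions, Intent2Tx grounds natural language intents in real-world protocol interactions across 11 categories, including a variety of long-tail Decentralized Finance (DeFi) primitives. To allow rigorous evaluation, we propose an execution-aware framework that goes beyond surface-level text matching by applying differential state analysis on forked mainnet environments. An extensive evaluation of 16 state-of-the-art LLMs shows that scaling and retrieval augmentation improve logical consistency and parameter precision, while current models still struggle with out-of-distribution generalization and multi-step planning. Importantly, the execution-based analysis shows that syntactically valid outputs often fail to achieve the intended state transitions, which exposes a significant gap in current "reasoning-to-execution" capabilities. Intent2Tx serves as a critical foundation for developing autonomous, reliable agents in intent-centric Web3 ecosystems. Code and data: https://anonymous.4open.science/r/Intent2Tx_Bench-97FF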
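
Illustrative note: the abstract describes the evaluation only at a high level, so the following is a toy sketch of what "differential state analysis" could look like, not the authors' framework. It assumes chain state can be reduced to a plain dictionary of balances; on a real forked mainnet the pre- and post-states would come from executing transactions on the fork. The names `State`, `state_diff`, and `same_effect` are hypothetical and do not come from the paper.

```python
from typing import Dict, Tuple

# Hypothetical simplified state: (account, asset) -> balance.
# A real forked-mainnet state is far richer (storage slots, nonces, logs).
State = Dict[Tuple[str, str], int]


def state_diff(pre: State, post: State) -> Dict[Tuple[str, str], int]:
    """Return the net change for every (account, asset) entry that changed."""
    diff = {}
    for key in set(pre) | set(post):
        delta = post.get(key, 0) - pre.get(key, 0)
        if delta != 0:
            diff[key] = delta
    return diff


def same_effect(pre: State, post_reference: State, post_candidate: State) -> bool:
    """True if the candidate transaction changed state exactly as the reference did."""
    return state_diff(pre, post_reference) == state_diff(pre, post_candidate)


# Toy example: the intent is "send 40 USDC from alice to bob".
pre = {("alice", "USDC"): 100, ("bob", "USDC"): 0}
post_reference = {("alice", "USDC"): 60, ("bob", "USDC"): 40}
# A well-formed transfer that moves the wrong amount.
post_wrong_amount = {("alice", "USDC"): 96, ("bob", "USDC"): 4}

matches_reference = same_effect(pre, post_reference, post_reference)
matches_wrong = same_effect(pre, post_reference, post_wrong_amount)
```

In this sketch the second candidate is a valid transfer but produces a different state change, which is the kind of mismatch that text matching on the generated transaction alone could miss.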

Related papers