Fugu-MT Paper Translation (Abstract): SELF-REDRAFT: Eliciting Intrinsic Exploration-Exploitation Balance in Test-Time Scaling for Code Generation

Paper overview: SELF-REDRAFT: Eliciting Intrinsic Exploration-Exploitation Balance in Test-Time Scaling for Code Generation

arxiv url: http://arxiv.org/abs/2511.02854v1
Date: Fri, 31 Oct 2025 06:30:47 GMT
Status: translation complete
Last updated in system: 2025-11-06 18:19:32.17069
Title: SELF-REDRAFT: Eliciting Intrinsic Exploration-Exploitation Balance in Test-Time Scaling for Code Generation
Authors: Yixiang Chen, Tianshi Zheng, Shijue Huang, Zhitao He, Yi R. Fung,
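Abstract summary: Test-time scaling without interpreter feedback is essential for real-world code generation scenarios. SELF-REDRAFT is a framework built upon Self-Refine that encourages the model to propose new drafts for fundamentally flawed solutions.
Reference score (independently computed attention measure): 12.241337921137259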
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Test-time scaling without interpreter feedback is essential for real-world code generation scenarios where test cases are not readily available. While existing paradigms often rely on either greedy exploitation (i.e., iterative refinement) or stochastic exploration (i.e., relying on sample-based voting or reranking mechanisms), the balance between these two dimensions remains underexplored. To investigate the LLM's intrinsic ability to balance exploitation and exploration, we introduce SELF-REDRAFT, a framework built upon Self-Refine that encourages the model to propose new drafts for solutions that are fundamentally flawed. Our results show that SELF-REDRAFT consistently achieves better performance than Self-Refine when converged under the same maximum number of iterations. Still, we observe that significant room for improvement remains, largely due to two core aspects of current self-redraft capabilities: constrained capacity for generating instructive feedback and fragile discriminative judgment. We also find that balancing strategies vary notably across different LLMs, reflecting distinct, model-specific behaviors. Overall, our study establishes a baseline for intrinsic exploration-exploitation balancing in test-time scaling and identifies feedback and discrimination as key areas with potential for future advances.
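The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the prompts, the verdict labels, or the stopping rule, so the three-way `pass` / `refine` / `redraft` verdict, the prompt wording, and the `llm` callable are all assumptions made for the example.

```python
from typing import Callable

# Hypothetical verdict labels; the abstract does not name them.
PASS, REFINE, REDRAFT = "pass", "refine", "redraft"


def self_redraft(task: str, llm: Callable[[str], str], max_iters: int = 4) -> str:
    """Iterate without interpreter feedback: the model critiques its own code."""
    solution = llm(f"Write code for this task:\n{task}")
    for _ in range(max_iters):
        feedback = llm(
            "Critique the solution. End with one word: pass, refine, or redraft.\n"
            f"Task:\n{task}\nSolution:\n{solution}"
        )
        words = feedback.strip().lower().split()
        # Assumed convention: the verdict is the last word of the feedback.
        verdict = words[-1].strip(".!,") if words else PASS
        if verdict == REDRAFT:
            # Exploration: abandon the flawed approach and write a new draft.
            solution = llm(
                "Write a new solution using a different approach.\n"
                f"Task:\n{task}\nFeedback:\n{feedback}"
            )
        elif verdict == REFINE:
            # Exploitation: keep the approach and patch it (a Self-Refine step).
            solution = llm(
                "Revise the solution using the feedback.\n"
                f"Task:\n{task}\nSolution:\n{solution}\nFeedback:\n{feedback}"
            )
        else:
            # "pass" or an unrecognized verdict: stop iterating.
            break
    return solution


# Usage with a scripted stand-in for a model (no real LLM is called).
script = iter([
    "draft_v1",
    "The algorithm is wrong. redraft",
    "draft_v2",
    "Off-by-one in the loop. refine",
    "draft_v3",
    "Looks correct. pass",
])
print(self_redraft("sum a list", lambda prompt: next(script)))
```

Dropping the `redraft` branch reduces this sketch to a plain refine-only loop, which is the comparison the abstract draws between SELF-REDRAFT and Self-Refine under the same maximum number of iterations.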


Related papers