Fugu-MT paper translation (abstract): ScaleBox: Enabling High-Fidelity and Scalable Code Verification for Large Language Models

Paper summary: ScaleBox: Enabling High-Fidelity and Scalable Code Verification for Large Language Models

arxiv url: http://arxiv.org/abs/2604.27467v1
Date: Thu, 30 Apr 2026 06:09:17 GMT
Status: Translation complete
Last updated in system: 2026-05-01 16:31:53.948601
Title: ScaleBox: Enabling High-Fidelity and Scalable Code Verification for Large Language Models
Authors: Jiasheng Zheng, Xin Zheng, Boxi Cao, Pengbo Wang, Zhengzhao Ma, Qiming Zhu, Jiazhen Jiang, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun
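Abstract summary: Code sandboxes have emerged as a critical foundation for improving the coding capabilities of large language models. Existing systems fail to provide accurate verification and efficiency under high-concurrency workloads. We present ScaleBox, a high-fidelity and scalable system designed to address these limitations in large-scale code training.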
Reference score (site-computed attention score): 65.56970356058655
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Code sandboxes have emerged as a critical infrastructure for advancing the coding capabilities of large language models, providing verifiable feedback for both RL training and evaluation. However, existing systems fail to provide accurate verification and efficiency under high-concurrency workloads. We present ScaleBox, a high-fidelity and scalable system designed to address these limitations in large-scale code training. ScaleBox introduces automated special-judge generation and management, fine-grained parallel execution across test cases with seamless multi-node coordination, and a configuration-driven evaluation suite for reproducible benchmarking. A series of experiments demonstrates that ScaleBox significantly enhances code verification accuracy and efficiency. Our further RLVR experiments show that ScaleBox substantially improves both performance on LiveCodeBench and training stability, significantly outperforming heuristic-matching baselines. By providing a reliable and high-throughput infrastructure, ScaleBox facilitates more effective research and development in large-scale code training.
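The abstract contrasts special-judge verification with heuristic output matching, and mentions parallel execution across test cases. The following is a minimal Python sketch of that contrast, not ScaleBox's actual implementation or API: `exact_match`, `float_special_judge`, and `verify` are hypothetical names, the tolerance-based checker is only one possible kind of special judge, and the program outputs are assumed to have been captured already rather than produced by a sandbox.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def exact_match(expected: str, actual: str) -> bool:
    """Heuristic baseline: compare whitespace-separated tokens literally."""
    return expected.split() == actual.split()


def float_special_judge(expected: str, actual: str, tol: float = 1e-6) -> bool:
    """Hypothetical special judge: accept numeric tokens within a tolerance."""
    exp_tokens, act_tokens = expected.split(), actual.split()
    if len(exp_tokens) != len(act_tokens):
        return False
    try:
        return all(
            math.isclose(float(e), float(a), rel_tol=tol, abs_tol=tol)
            for e, a in zip(exp_tokens, act_tokens)
        )
    except ValueError:
        # A non-numeric token cannot be judged numerically, so reject it.
        return False


def verify(cases, judge, max_workers=4):
    """Judge every (expected, actual) pair concurrently, preserving order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda case: judge(*case), cases))


# Illustrative (expected, actual) output pairs; the values are made up.
cases = [("0.5", "0.50000001"), ("1 2", "1 2"), ("3.0", "2.0")]
exact_verdicts = verify(cases, exact_match)
special_verdicts = verify(cases, float_special_judge)
```

In this sketch the first case is rejected by literal matching but accepted by the tolerance-based judge, which is the kind of false negative a special judge is meant to avoid. A thread pool is used only to keep the example short; a real sandbox would isolate and run untrusted code in separate processes or containers.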
