Fugu-MT Paper Translation (Summary): DAJ: Data-Reweighted LLM Judge for Test-Time Scaling in Code Generation

Paper summary: DAJ: Data-Reweighted LLM Judge for Test-Time Scaling in Code Generation

arxiv url: http://arxiv.org/abs/2601.22230v1
Date: Thu, 29 Jan 2026 19:04:24 GMT
Status: Translation complete
Last updated in system: 2026-02-02 18:28:15.012219
Title: DAJ: Data-Reweighted LLM Judge for Test-Time Scaling in Code Generation
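Title (reference translation): DAJ: Data-Reweighted LLM Judge for Test-Time Scaling in Code Generation
Authors: Peijia Qin, Ruiyi Zhang, Qi Cao, Pengtao Xie
Abstract summary: DAJ is a reasoning-based LLM judge trained with verifiable rewards under a bi-level data-reweighting learning framework. The proposed method automatically emphasizes hard problems, in-distribution samples, and trajectory-aligned data without relying on hand-crafted heuristics.
Reference score (attention score computed in-house): 30.131052926559956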
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Test-time scaling for code generation commonly relies on Best-of-N selection, in which multiple candidate solutions are sampled from a base model, and the best one is selected by an LLM judge. However, training reliable LLM judges is challenging due to severe distribution shifts, including imbalances between easy and hard problems, mismatches between training tasks and evaluation benchmarks, and trajectory mismatch arising from training data generated by cheaper models whose behavior differs from that of inference-time models. We propose DAJ, a reasoning-based LLM judge trained with verifiable rewards under a bi-level data-reweighted learning framework. The proposed framework learns data-importance weights (either domain-level or instance-level) to optimize generalization performance on a held-out meta set aligned with target benchmarks. To the best of our knowledge, this is the first application of data reweighting to LLM-as-a-Judge training for test-time scaling. Our approach automatically emphasizes hard problems, in-distribution samples, and trajectory-aligned data, without relying on hand-crafted heuristics. Empirically, DAJ achieves state-of-the-art performance on LiveCodeBench and BigCodeBench, outperforming strong test-time scaling baselines as well as leading proprietary models.
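Abstract (reference translation): Test-time scaling for code generation generally relies on Best-of-N selection, in which multiple candidate solutions are sampled from a base model and an LLM judge selects the best one. However, training reliable LLM judges is difficult because of imbalances between easy and hard problems, mismatches between training tasks and evaluation benchmarks, and trajectory mismatch arising from training data generated by cheaper models that behave differently from inference-time models. This paper proposes DAJ, a reasoning-based LLM judge trained with verifiable rewards under a bi-level data-reweighting learning framework. The proposed framework learns data-importance weights (domain-level or instance-level) to optimize generalization performance on a held-out meta set aligned with the target benchmarks. To the best of our knowledge, this is the first application of data reweighting to LLM-as-a-Judge training for test-time scaling. Our method automatically emphasizes hard problems, in-distribution samples, and trajectory-aligned data without relying on hand-crafted heuristics. Empirically, DAJ achieves state-of-the-art performance on LiveCodeBench and BigCodeBench, outperforming strong test-time scaling baselines and proprietary models.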
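The abstract describes a bi-level objective. The inner level trains the judge on reweighted training data. The outer level tunes the data weights so that the trained judge generalizes to a held-out meta set. The toy example below illustrates that structure, followed by the Best-of-N selection step. It is a minimal sketch under loud assumptions, not DAJ's implementation:

- The judge is reduced to one scalar parameter `theta`.
- Each training domain has a hypothetical quadratic loss (theta - c_d)^2 instead of a verifiable-reward objective.
- The meta set has loss (theta - c_meta)^2.
- The outer gradient uses a one-step lookahead approximation, a common trick in meta-learning-based reweighting. The abstract does not say which approximation DAJ uses.
- Only domain-level weights are shown.

The names `reweight_domains` and `best_of_n`, and the domain labels, are hypothetical.

```python
# Toy sketch (hypothetical, not DAJ's code): bi-level domain reweighting with a
# one-step lookahead hypergradient, plus Best-of-N selection with a judge score.
import math
from typing import Callable, Dict, Sequence, Tuple


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Numerically stable softmax over per-domain logits."""
    top = max(logits.values())
    exps = {d: math.exp(v - top) for d, v in logits.items()}
    total = sum(exps.values())
    return {d: e / total for d, e in exps.items()}


def reweight_domains(
    centers: Dict[str, float],  # assumed toy loss per domain: (theta - c_d)^2
    meta_center: float,         # assumed toy meta-set loss: (theta - c_meta)^2
    steps: int = 300,
    inner_lr: float = 0.1,      # eta: step size for the judge parameter theta
    outer_lr: float = 1.0,      # beta: step size for the domain logits alpha
) -> Tuple[float, Dict[str, float]]:
    theta = 0.0
    alpha = {d: 0.0 for d in centers}  # uniform domain weights at the start
    for _ in range(steps):
        w = softmax(alpha)
        # Gradient of each domain's training loss at the current theta.
        grads = {d: 2.0 * (theta - c) for d, c in centers.items()}
        # Lookahead inner step: theta' = theta - eta * sum_d w_d * grad_d.
        theta_look = theta - inner_lr * sum(w[d] * grads[d] for d in centers)
        meta_grad = 2.0 * (theta_look - meta_center)
        # Hypergradient: dL_meta(theta')/dw_d = meta_grad * (-eta * grad_d).
        g = {d: -inner_lr * meta_grad * grads[d] for d in centers}
        g_bar = sum(w[d] * g[d] for d in centers)
        # Chain rule through the softmax: dL/dalpha_d = w_d * (g_d - g_bar).
        for d in centers:
            alpha[d] -= outer_lr * w[d] * (g[d] - g_bar)
        # Actual inner step on the reweighted training loss with the new weights.
        w = softmax(alpha)
        theta -= inner_lr * sum(w[d] * 2.0 * (theta - c) for d, c in centers.items())
    return theta, softmax(alpha)


def best_of_n(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Best-of-N selection: keep the candidate the judge scores highest."""
    return max(candidates, key=score)


if __name__ == "__main__":
    theta, weights = reweight_domains(
        {"easy": 0.0, "hard": 2.0, "mismatched": -3.0}, meta_center=2.0
    )
    print(f"theta={theta:.3f}", {d: round(v, 3) for d, v in weights.items()})
    # Hypothetical judge score: prefers shorter candidate programs.
    print(best_of_n(["x = 1 + 1", "x = 2"], score=lambda code: -len(code)))
```

In this toy setup, the hypothetical domains are `easy` (c = 0), `hard` (c = 2), and `mismatched` (c = -3), with the meta set centered at 2. The weight on `hard` grows during training because it is the only domain whose optimum coincides with the meta optimum. In miniature, this mirrors the abstract's claim that reweighting emphasizes data aligned with the target benchmarks. `best_of_n` shows the test-time step: a judge scores each sampled candidate and the highest-scoring one is kept.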


List of related papers