Fugu-MT Paper Translation (Summary): GASP: Guided Asymmetric Self-Play For Coding LLMs

Paper Summary: GASP: Guided Asymmetric Self-Play For Coding LLMs

arxiv url: http://arxiv.org/abs/2603.15957v1
Date: Mon, 16 Mar 2026 22:13:19 GMT
Status: Translation complete
Last updated in system: 2026-03-18 17:42:07.014402
Title: GASP: Guided Asymmetric Self-Play For Coding LLMs
Title (reference translation): GASP: Guided Asymmetric Self-Play for Coding LLMs
Authors: Swadesh Jana, Cansu Sancaktar, Tomáš Daniš, Georg Martius, Antonio Orvieto, Pavel Kolev,
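Abstract summary: Asymmetric self-play has emerged as a promising paradigm for post-training large language models. This paper proposes Guided Asymmetric Self-Play (GASP), in which grounding is provided by real-data goalpost questions. We improve pass@20 on LiveCodeBench (LCB) by 2.5% over unguided asymmetric self-play.
Reference score (attention score computed in-house): 37.79170066221302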
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Asymmetric self-play has emerged as a promising paradigm for post-training large language models, where a teacher continually generates questions for a student to solve at the edge of the student's learnability. Although these methods promise open-ended data generation bootstrapped from no human data, they suffer from one major problem: not all problems that are hard to solve are interesting or informative for improving the overall capabilities of the model. Current asymmetric self-play methods are goal-agnostic, with no real grounding. We propose Guided Asymmetric Self-Play (GASP), where grounding is provided by real-data goalpost questions that are identified as posing a hard exploration challenge to the model. During self-play, the teacher first generates an easier variant of a hard question, and then a harder variant of that easier question, with the goal of gradually closing the gap to the goalpost throughout training. By doing so, we improve pass@20 on LiveCodeBench (LCB) by 2.5% over unguided asymmetric self-play, and through the curriculum constructed by the teacher, we manage to solve hard goalpost questions that remain out of reach for all baselines.
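Abstract (reference translation): Asymmetric self-play, in which a teacher continually generates questions for a student to solve at the edge of the student's learnability, has emerged as a promising paradigm for post-training large language models. These methods promise open-ended data generation bootstrapped from no human data, but they suffer from one major problem. Current asymmetric self-play methods are goal-agnostic and have no real grounding. This paper proposes GASP (Guided Asymmetric Self-Play). During self-play, the teacher first generates an easier variant of a hard question and then a harder variant of that easier question. As a result, we improve pass@20 on LiveCodeBench (LCB) by 2.5% over unguided asymmetric self-play and, through the curriculum constructed by the teacher, succeed in solving hard goalpost questions that none of the baselines can reach.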
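The headline result is reported as pass@20 on LiveCodeBench. As a reference for what that metric measures, here is a minimal sketch of the widely used unbiased pass@k estimator from the HumanEval/Codex evaluation work (Chen et al., 2021): pass@k = 1 − C(n−c, k) / C(n, k), where n solutions are sampled per problem and c of them pass all tests. This page does not state whether GASP's evaluation uses exactly this estimator or how many samples n are drawn per problem; both are assumptions here, and the function names `pass_at_k` and `mean_pass_at_k` are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem (assumed estimator).

    n: number of sampled solutions for the problem
    c: number of those samples that pass all tests
    k: sampling budget being evaluated (e.g. 20 for pass@20)
    """
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of the samples contains at least one correct one.
        return 1.0
    # Probability that a random size-k subset contains no correct sample,
    # computed with exact integer binomials before the final division.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over a benchmark, given (n, c) pairs per problem."""
    if not results:
        raise ValueError("results must be non-empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For k = 1 the estimator reduces to c / n, the plain fraction of passing samples. For example, a problem where 3 of 10 samples pass gets pass@1 of about 0.3, and with n = 20 samples a single correct sample is already enough for pass@20 of 1.0 on that problem.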

List of related papers