Fugu-MT Paper Translation (Abstract): AR$^2$: Adversarial Reinforcement Learning for Abstract Reasoning in Large Language Models

Paper abstract: AR$^2$: Adversarial Reinforcement Learning for Abstract Reasoning in Large Language Models

arxiv url: http://arxiv.org/abs/2509.03537v1
Date: Wed, 27 Aug 2025 17:26:44 GMT
Status: Translation complete
Last updated in system: 2025-09-05 20:21:09.908674
Title: AR$^2$: Adversarial Reinforcement Learning for Abstract Reasoning in Large Language Models
Title (reference translation): AR$^2$: Adversarial Reinforcement Learning for Abstract Reasoning in Large Language Models
Authors: Cheng-Kai Yeh, Hsing-Wang Lee, Chung-Hung Kuo, Hen-Hsen Huang
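Abstract summary: This paper proposes AR$^2$ (Adversarial Reinforcement Learning for Abstract Reasoning), a new framework designed to enhance the abstraction abilities of large language models (LLMs). AR$^2$ employs a teacher model to transform kernel problems into narrative-rich, challenging descriptions without changing their fundamental logic. A student coding model is trained to solve these complex narrative problems by extracting their underlying computational kernels.
Reference score (independently computed attention metric): 12.484537674896908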
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Abstraction--the ability to recognize and distill essential computational patterns from complex problem statements--is a foundational skill in computer science, critical both for human problem-solvers and coding-oriented large language models (LLMs). Despite recent advances in training LLMs for code generation using reinforcement learning (RL), most existing approaches focus primarily on superficial pattern recognition, overlooking explicit training for abstraction. In this study, we propose AR$^2$ (Adversarial Reinforcement Learning for Abstract Reasoning), a novel framework explicitly designed to enhance the abstraction abilities of LLMs. AR$^2$ employs a teacher model to transform kernel problems into narrative-rich, challenging descriptions without changing their fundamental logic. Simultaneously, a student coding model is trained to solve these complex narrative problems by extracting their underlying computational kernels. Experimental results demonstrate that AR$^2$ substantially improves the student model's accuracy on previously unseen, challenging programming tasks, underscoring abstraction as a key skill for enhancing LLM generalization.
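Abstract (reference translation): Abstraction, the ability to recognize and distill essential computational patterns from complex problem statements, is a foundational skill in computer science, important both for human problem-solvers and for coding-oriented large language models (LLMs). Although training LLMs for code generation with reinforcement learning (RL) has advanced recently, most existing approaches focus on superficial pattern recognition and overlook explicit training for abstraction. This study proposes AR$^2$ (Adversarial Reinforcement Learning for Abstract Reasoning). AR$^2$ employs a teacher model to transform kernel problems into narrative-rich, challenging descriptions without changing their fundamental logic. At the same time, a student coding model is trained to solve these complex narrative problems by extracting their underlying computational kernels. Experimental results show that AR$^2$ substantially improves the student model's accuracy on previously unseen programming tasks.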

Related papers