Fugu-MT Paper Translation (Summary): A Survey of Reinforcement Learning for Large Language Models under Data Scarcity: Challenges and Solutions

Paper summary: A Survey of Reinforcement Learning for Large Language Models under Data Scarcity: Challenges and Solutions

arxiv url: http://arxiv.org/abs/2604.17312v1
Date: Sun, 19 Apr 2026 08:01:31 GMT
Status: Translation complete
Last updated in system: 2026-04-21 21:52:52.45743
Title: A Survey of Reinforcement Learning for Large Language Models under Data Scarcity: Challenges and Solutions
Title (reference translation): A Survey of Reinforcement Learning for Large Language Models under Data Scarcity: Challenges and Solutions
Authors: Zhiyin Yu, Yuchen Mou, Juncheng Yan, Junyu Luo, Chunchun Chen, Xing Wei, Yunhui Liu, Hongru Sun, Yuxing Zhang, Jun Xu, Yatao Bian, Ming Zhang, Wei Ye, Tieke He, Jie Yang, Guanjie Zheng, Zhonghai Wu, Bo Zhang, Lei Bai, Xiao Luo
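Abstract summary: Reinforcement learning (RL) has emerged as a powerful post-training paradigm for enhancing the reasoning capabilities of large language models (LLMs). RL faces substantial data scarcity challenges, including the limited availability of high-quality external supervision and the constrained volume of model-generated experience. The survey proposes a bottom-up hierarchical framework built around three complementary perspectives: the data-centric perspective, the training-centric perspective, and the framework-centric perspective.
Reference score (independently computed attention score): 60.897488753340674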
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reinforcement learning (RL) has emerged as a powerful post-training paradigm for enhancing the reasoning capabilities of large language models (LLMs). However, reinforcement learning for LLMs faces substantial data scarcity challenges, including the limited availability of high-quality external supervision and the constrained volume of model-generated experience. These limitations make data-efficient reinforcement learning a critical research direction. In this survey, we present the first systematic review of reinforcement learning for LLMs under data scarcity. We propose a bottom-up hierarchical framework built around three complementary perspectives: the data-centric perspective, the training-centric perspective, and the framework-centric perspective. We develop a taxonomy of existing methods, summarize representative approaches in each category, and analyze their strengths and limitations. Our taxonomy aims to provide a clear conceptual foundation for understanding the design space of data-efficient RL for LLMs and to guide researchers working in this emerging area. We hope this survey offers a comprehensive roadmap for future research and inspires new directions toward more efficient and scalable reinforcement learning post-training for LLMs.
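Abstract (reference translation): Reinforcement learning (RL) has emerged as a powerful post-training paradigm for enhancing the reasoning capabilities of large language models (LLMs). However, reinforcement learning for LLMs faces substantial data scarcity challenges, including the limited availability of high-quality external supervision and the limited volume of model-generated experience. These limitations make data-efficient reinforcement learning an important research direction. In this work, we present the first systematic review of reinforcement learning for LLMs under data scarcity. We propose a bottom-up hierarchical framework built around three complementary perspectives: the data-centric perspective, the training-centric perspective, and the framework-centric perspective. We develop a taxonomy of existing methods, summarize representative approaches in each category, and analyze their strengths and limitations. Our taxonomy aims to provide a clear conceptual foundation for understanding the design space of data-efficient RL for LLMs and to guide researchers working in this emerging area. We hope this survey provides a comprehensive roadmap for future research and inspires new directions toward more efficient and scalable reinforcement learning for LLMs.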

Related papers