Fugu-MT Paper Translation (Summary): GEM: A Gym for Agentic LLMs

Paper summary: GEM: A Gym for Agentic LLMs

arxiv url: http://arxiv.org/abs/2510.01051v1
Date: Wed, 01 Oct 2025 15:55:57 GMT
Status: Translation complete
System update date: 2025-10-03 16:59:20.65689
Title: GEM: A Gym for Agentic LLMs
Title (reference translation): GEM: A Gym for Agentic LLMs
Authors: Zichen Liu, Anya Sims, Keyu Duan, Changyu Chen, Simon Yu, Xiangxin Zhou, Haotian Xu, Shaopan Xiong, Bo Liu, Chenmien Tan, Chuen Yang Beh, Weixun Wang, Hao Zhu, Weiyan Shi, Diyi Yang, Michael Shieh, Yee Whye Teh, Wee Sun Lee, Min Lin
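Abstract summary: General Experience Maker (GEM) is an open-source environment simulator designed for the era of large language models (LLMs). GEM provides a standardized framework for the environment-agent interface, including asynchronous vectorized execution for high throughput. Using GEM, we run apples-to-apples benchmarks of PPO, GRPO, and REINFORCE to shed light on algorithm design.
Reference score (independently computed attention score): 88.36970707762424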
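The summary describes a standardized environment-agent interface with asynchronous vectorized execution. The sketch below illustrates that general idea with a toy Gym-style text environment (`reset()` returns an observation, `step(action)` returns observation, reward, terminated, truncated, info, following the Gymnasium convention) and a helper that steps several environments concurrently. `GuessNumberEnv` and `vector_step` are hypothetical names invented for illustration; this is not GEM's actual API, and GEM's own vectorization may work quite differently.

```python
import asyncio
import random
from typing import Any, Optional

# (observation, reward, terminated, truncated, info), Gymnasium-style
StepResult = tuple[str, float, bool, bool, dict[str, Any]]


class GuessNumberEnv:
    """Hypothetical single-agent text environment with a Gym-style API.

    Not GEM's actual class: it only illustrates the reset/step shape
    (text observation out, text action in, per-turn reward back).
    """

    def __init__(self, low: int = 1, high: int = 10, max_turns: int = 5,
                 seed: Optional[int] = None):
        self.low, self.high, self.max_turns = low, high, max_turns
        self.rng = random.Random(seed)
        self.target = 0
        self.turn = 0

    def reset(self) -> tuple[str, dict[str, Any]]:
        self.target = self.rng.randint(self.low, self.high)
        self.turn = 0
        return f"Guess a number between {self.low} and {self.high}.", {}

    def step(self, action: str) -> StepResult:
        self.turn += 1
        try:
            guess = int(action.strip())
        except ValueError:
            # Unparseable action: penalize and end the episode.
            return "Invalid guess.", -1.0, True, False, {}
        if guess == self.target:
            return "Correct!", 1.0, True, False, {}
        hint = "higher" if guess < self.target else "lower"
        # Truncate (time limit) rather than terminate when turns run out.
        return f"Try {hint}.", 0.0, False, self.turn >= self.max_turns, {}


async def vector_step(envs: list[GuessNumberEnv], actions: list[str]) -> list[StepResult]:
    """Hypothetical helper: step all environments concurrently.

    Each blocking step() runs in a worker thread via asyncio.to_thread, so one
    slow environment (e.g. one waiting on a tool call) does not stall the rest.
    Results come back in the same order as `envs`.
    """
    return list(await asyncio.gather(
        *(asyncio.to_thread(env.step, a) for env, a in zip(envs, actions))
    ))


if __name__ == "__main__":
    envs = [GuessNumberEnv(seed=i) for i in range(4)]
    observations = [env.reset()[0] for env in envs]
    for obs, reward, terminated, truncated, _ in asyncio.run(vector_step(envs, ["5"] * 4)):
        print(obs, reward, terminated, truncated)
```

Separating `terminated` (the task ended) from `truncated` (a turn limit was hit) matters for RL training, because a truncated episode should not be treated as a true terminal state when computing returns.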
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The training paradigm for large language models (LLMs) is moving from static datasets to experience-based learning, where agents acquire skills via interacting with complex environments. To facilitate this transition we introduce GEM (General Experience Maker), an open-source environment simulator designed for the age of LLMs. Analogous to OpenAI-Gym for traditional reinforcement learning (RL), GEM provides a standardized framework for the environment-agent interface, including asynchronous vectorized execution for high throughput, and flexible wrappers for easy extensibility. GEM also features a diverse suite of environments, robust integrated tools, and single-file example scripts demonstrating using GEM with five popular RL training frameworks. Along with this, we also provide a set of baselines across 24 environments using REINFORCE with Return Batch Normalization (ReBN), which -- unlike GRPO -- is compatible with the full RL setting of dense per-turn rewards and offers better credit assignment. We further conduct apples-to-apples benchmarking of PPO, GRPO and REINFORCE in both single- and multi-turn settings using GEM to shed light on the algorithmic designs. Lastly, GEM also functions as a convenient evaluation toolkit besides a training environment. We hope this framework can help accelerate future agentic LLM research.
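The abstract contrasts REINFORCE with Return Batch Normalization (ReBN) against GRPO on dense per-turn rewards. The sketch below shows one plausible reading of that contrast, assuming ReBN means: compute a discounted return for every turn, then normalize all returns across the batch to zero mean and unit variance. The GRPO-style function follows the usual recipe of normalizing one total score per episode within a group of rollouts (assumed here to share a prompt) and broadcasting it to every turn. The paper's exact formulation may differ; all function names are illustrative.

```python
import math


def discounted_returns(rewards: list[float], gamma: float = 1.0) -> list[float]:
    """Per-turn return G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def _normalize(values: list[float], eps: float) -> list[float]:
    """Shift to zero mean and scale by the (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / (std + eps) for v in values]


def rebn_advantages(episodes: list[list[float]], gamma: float = 1.0,
                    eps: float = 1e-8) -> list[list[float]]:
    """Assumed ReBN: normalize every per-turn return across the whole batch."""
    per_episode = [discounted_returns(r, gamma) for r in episodes]
    flat = _normalize([g for ep in per_episode for g in ep], eps)
    # Split the flat list back into per-episode lists.
    out, i = [], 0
    for ep in per_episode:
        out.append(flat[i:i + len(ep)])
        i += len(ep)
    return out


def grpo_advantages(episodes: list[list[float]], eps: float = 1e-8) -> list[list[float]]:
    """GRPO-style: normalize total episode reward within the group, then give
    every turn of an episode the same scalar advantage."""
    scores = _normalize([sum(r) for r in episodes], eps)
    return [[s] * len(r) for s, r in zip(scores, episodes)]


def reinforce_loss(logprobs: list[list[float]], advantages: list[list[float]]) -> float:
    """REINFORCE surrogate: -mean over all turns of A_t * log pi(a_t | s_t)."""
    pairs = [(lp, a) for lps, advs in zip(logprobs, advantages)
             for lp, a in zip(lps, advs)]
    return -sum(lp * a for lp, a in pairs) / len(pairs)
```

With per-turn rewards such as `[[0.0, 1.0], [1.0, 0.0]]`, both episodes have the same total, so the GRPO-style advantages are all zero, while the ReBN-style advantages still separate the rewarded turn from the unrewarded one. That is the credit-assignment difference the abstract points to.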
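Abstract (reference translation): The training paradigm for large language models (LLMs) is shifting from static datasets to experience-based learning, in which agents acquire skills through interaction with complex environments. GEM (General Experience Maker) is an open-source environment simulator for the LLM era. Analogous to OpenAI-Gym for traditional reinforcement learning (RL), GEM provides a standardized framework for the environment-agent interface, including asynchronous vectorized execution for high throughput and flexible wrappers for easy extensibility. GEM also includes a diverse suite of environments, robust integrated tools, and single-file example scripts demonstrating how to use GEM with five popular RL training frameworks. We also provide baselines across 24 environments using REINFORCE with Return Batch Normalization (ReBN), which, unlike GRPO, is compatible with the full RL setting of dense per-turn rewards and offers better credit assignment. We further use GEM to run apples-to-apples benchmarks of PPO, GRPO, and REINFORCE, shedding light on algorithm design. Finally, GEM also serves as a convenient evaluation toolkit in addition to a training environment. We hope this framework helps accelerate future research on agentic LLMs.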
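The abstract also mentions flexible wrappers for easy extensibility. In OpenAI-Gym-style libraries, a wrapper holds an inner environment, delegates to it by default, and overrides only the behavior it changes, so wrappers can be stacked. The sketch below shows that general pattern for text environments; `EchoEnv`, `InstructionWrapper`, and `MaxTurnsWrapper` are hypothetical names for illustration and do not describe GEM's actual wrapper classes.

```python
from typing import Any, Protocol

# (observation, reward, terminated, truncated, info), Gymnasium-style
StepResult = tuple[str, float, bool, bool, dict[str, Any]]


class TextEnv(Protocol):
    def reset(self) -> tuple[str, dict[str, Any]]: ...
    def step(self, action: str) -> StepResult: ...


class EchoEnv:
    """Toy environment: reward 1.0 when the action repeats the word 'apple'."""

    def reset(self) -> tuple[str, dict[str, Any]]:
        return "apple", {}

    def step(self, action: str) -> StepResult:
        reward = 1.0 if action.strip() == "apple" else 0.0
        return "apple", reward, reward > 0, False, {}


class Wrapper:
    """Delegates to the wrapped env; subclasses override only what they change."""

    def __init__(self, env: TextEnv):
        self.env = env

    def reset(self) -> tuple[str, dict[str, Any]]:
        return self.env.reset()

    def step(self, action: str) -> StepResult:
        return self.env.step(action)


class InstructionWrapper(Wrapper):
    """Prepends a fixed instruction to every observation."""

    def __init__(self, env: TextEnv, instruction: str):
        super().__init__(env)
        self.instruction = instruction

    def reset(self) -> tuple[str, dict[str, Any]]:
        obs, info = self.env.reset()
        return f"{self.instruction}\n{obs}", info

    def step(self, action: str) -> StepResult:
        obs, reward, terminated, truncated, info = self.env.step(action)
        return f"{self.instruction}\n{obs}", reward, terminated, truncated, info


class MaxTurnsWrapper(Wrapper):
    """Marks the episode truncated once max_turns steps have been taken."""

    def __init__(self, env: TextEnv, max_turns: int):
        super().__init__(env)
        self.max_turns = max_turns
        self.turn = 0

    def reset(self) -> tuple[str, dict[str, Any]]:
        self.turn = 0
        return self.env.reset()

    def step(self, action: str) -> StepResult:
        self.turn += 1
        obs, reward, terminated, truncated, info = self.env.step(action)
        return obs, reward, terminated, truncated or self.turn >= self.max_turns, info


# Wrappers compose: the outermost one runs first on each call.
env = MaxTurnsWrapper(InstructionWrapper(EchoEnv(), "Repeat the word."), max_turns=2)
```

Because each wrapper exposes the same `reset`/`step` signature as the environment it wraps, training code does not need to know how many wrappers are stacked.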

