Fugu-MT Paper Translation (Abstract): SOLAR-RL: Semi-Online Long-horizon Assignment Reinforcement Learning

Paper overview: SOLAR-RL: Semi-Online Long-horizon Assignment Reinforcement Learning

arxiv url: http://arxiv.org/abs/2604.22558v1
Date: Fri, 24 Apr 2026 13:53:39 GMT
Status: Translation complete
Last updated in system: 2026-04-27 15:36:26.483669
Title: SOLAR-RL: Semi-Online Long-horizon Assignment Reinforcement Learning
Title (reference translation): SOLAR-RL: Semi-Online Long-Horizon Assignment Reinforcement Learning
Authors: Jichao Wang, Liuyang Bian, Yufeng Zhou, Han Xiao, Yue Pan, Guozhi Wang, Hao Wang, Zhaoxiong Wang, Yafei Wen, Xiaoxin Chen, Shuai Ren, Lingfang Zeng
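Abstract summary: Reinforcement Learning (RL) has emerged as a promising paradigm for training MLLM agents on dynamic GUI tasks. Online RL captures long-term dynamics but suffers from high interaction costs and potential environmental instability. We propose SOLAR-RL (Semi-Online Long-horizon Assignment Reinforcement Learning).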
Reference score (independently computed attention measure): 21.3755929369092
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: As Multimodal Large Language Models (MLLMs) mature, GUI agents are evolving from static interactions to complex navigation. While Reinforcement Learning (RL) has emerged as a promising paradigm for training MLLM agents on dynamic GUI tasks, its effective application faces a dilemma. Standard Offline RL often relies on static step-level data, neglecting global trajectory semantics such as task completion and execution quality. Conversely, Online RL captures the long-term dynamics but suffers from high interaction costs and potential environmental instability. To bridge this gap, we propose SOLAR-RL (Semi-Online Long-horizon Assignment Reinforcement Learning). Instead of relying solely on expensive online interactions, our framework integrates global trajectory insights directly into the offline learning process. Specifically, we reconstruct diverse rollout candidates from static data, detect the first failure point using per-step validity signals, and retroactively assign dense step-level rewards with target-aligned shaping to reflect trajectory-level execution quality, effectively simulating online feedback without interaction costs. Extensive experiments demonstrate that SOLAR-RL significantly improves long-horizon task completion rates and robustness compared to strong baselines, offering a sample-efficient solution for autonomous GUI navigation.
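Abstract (reference translation): As Multimodal Large Language Models (MLLMs) mature, GUI agents are evolving from static interaction toward complex navigation. Reinforcement Learning (RL) has emerged as a promising paradigm for training MLLM agents on dynamic GUI tasks, but applying it effectively poses a dilemma. Standard offline RL often relies on static step-level data and ignores global trajectory semantics such as task completion and execution quality. Online RL, conversely, captures long-term dynamics but suffers from high interaction costs and potential environmental instability. To bridge this gap, we propose SOLAR-RL (Semi-Online Long-horizon Assignment Reinforcement Learning). Rather than relying solely on expensive online interaction, our framework integrates global trajectory insights directly into the offline learning process. Specifically, it reconstructs diverse rollout candidates from static data, detects the first failure point using per-step validity signals, and retroactively assigns dense step-level rewards with target-aligned shaping that reflect trajectory-level execution quality, effectively simulating online feedback without incurring interaction costs. Extensive experiments show that SOLAR-RL significantly improves long-horizon task completion rates and robustness over strong baselines, providing a sample-efficient solution for autonomous GUI navigation.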
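
The abstract names two mechanical steps: detecting the first failure point from per-step validity signals, and retroactively assigning dense step-level rewards. The sketch below illustrates that general idea only. The function names, the progress-scaled reward, and the penalty value are all assumptions made for illustration; the abstract does not give the paper's reward formula, and the "target-aligned shaping" it mentions is not specified there and is not modeled here.

```python
from typing import List


def first_failure_index(valid: List[bool]) -> int:
    """Return the index of the first invalid step, or len(valid) if none fails."""
    for i, ok in enumerate(valid):
        if not ok:
            return i
    return len(valid)


def assign_step_rewards(
    valid: List[bool],
    step_bonus: float = 1.0,        # hypothetical reward scale
    failure_penalty: float = -1.0,  # hypothetical penalty for the failing step
) -> List[float]:
    """Assign per-step rewards after the whole trajectory has been inspected.

    Illustrative scheme (an assumption, not the paper's formula):
    - steps before the first failure get step_bonus scaled by the fraction
      of the trajectory completed before failing;
    - the failing step gets failure_penalty;
    - steps after the failure get 0.0.
    """
    n = len(valid)
    if n == 0:
        return []
    k = first_failure_index(valid)
    progress = k / n  # trajectory-level signal pushed back onto each step
    rewards = []
    for i in range(n):
        if i < k:
            rewards.append(step_bonus * progress)
        elif i == k:
            rewards.append(failure_penalty)
        else:
            rewards.append(0.0)
    return rewards
```

For a four-step rollout whose third step is invalid, `assign_step_rewards([True, True, False, True])` gives the first two steps a reward of 0.5 each, penalizes the third, and zeroes the fourth. Scaling by progress is one simple way to let a trajectory-level outcome inform step-level rewards without any environment interaction.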

Related papers