Fugu-MT Paper Translation (Abstract): VLM-Guided Experience Replay

Paper overview: VLM-Guided Experience Replay

arxiv url: http://arxiv.org/abs/2602.01915v1
Date: Mon, 02 Feb 2026 10:19:59 GMT
Status: Translation complete
Last updated in system: 2026-02-03 19:28:34.072665
Title: VLM-Guided Experience Replay
Authors: Elad Sharony, Tom Jurgenson, Orr Krupnik, Dotan Di Castro, Shie Mannor
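Abstract summary: This paper proposes using Vision-Language Models (VLMs) to guide the prioritization of experiences in the replay buffer. The key idea is to use a frozen, pre-trained VLM as an automated evaluator to identify and prioritize promising sub-trajectories from the agent's experiences. Across scenarios including game-playing and robotics, agents trained with the proposed method achieve 11-52% higher average success rates and 19-45% better sample efficiency.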
Reference score (independently computed attention metric): 41.08659748023147
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in Large Language Models (LLMs) and Vision-Language Models (VLMs) have enabled powerful semantic and multimodal reasoning capabilities, creating new opportunities to enhance sample efficiency, high-level planning, and interpretability in reinforcement learning (RL). While prior work has integrated LLMs and VLMs into various components of RL, the replay buffer, a core component for storing and reusing experiences, remains unexplored. We propose addressing this gap by leveraging VLMs to guide the prioritization of experiences in the replay buffer. Our key idea is to use a frozen, pre-trained VLM (requiring no fine-tuning) as an automated evaluator to identify and prioritize promising sub-trajectories from the agent's experiences. Across scenarios, including game-playing and robotics, spanning both discrete and continuous domains, agents trained with our proposed prioritization method achieve 11-52% higher average success rates and improve sample efficiency by 19-45% compared to previous approaches. https://esharony.me/projects/vlm-rb/
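The key idea in the abstract, scoring sub-trajectories with a frozen evaluator and replaying the promising ones more often, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how VLM judgments map to sampling probabilities, so proportional sampling is an assumption here, and `ScoredReplayBuffer`, `stub_vlm_score`, and the `"frames"` segment layout are hypothetical names chosen for illustration. The scorer is a stand-in for a call to a frozen VLM.

```python
import random
from typing import Callable, Dict, List, Optional


def stub_vlm_score(segment: Dict) -> float:
    """Placeholder for a frozen VLM evaluator (hypothetical).

    A real scorer would pass rendered frames to a VLM and parse its judgment.
    Here we simply return the fraction of frames tagged "goal".
    """
    frames = segment["frames"]
    return sum(1 for f in frames if f == "goal") / max(len(frames), 1)


class ScoredReplayBuffer:
    """Stores sub-trajectories and samples them in proportion to a score."""

    def __init__(self, scorer: Callable[[Dict], float],
                 capacity: int = 1000, eps: float = 1e-3):
        self.scorer = scorer
        self.capacity = capacity
        self.eps = eps  # keeps every segment sampleable, even with score 0
        self.segments: List[Dict] = []
        self.priorities: List[float] = []

    def add(self, segment: Dict) -> None:
        # Drop the oldest segment once the buffer is full (FIFO eviction).
        if len(self.segments) >= self.capacity:
            self.segments.pop(0)
            self.priorities.pop(0)
        score = max(0.0, float(self.scorer(segment)))
        self.segments.append(segment)
        self.priorities.append(score + self.eps)

    def sample(self, k: int, rng: Optional[random.Random] = None) -> List[Dict]:
        # Sample with replacement, weighted by priority (an assumed scheme).
        chooser = rng if rng is not None else random
        return chooser.choices(self.segments, weights=self.priorities, k=k)


buffer = ScoredReplayBuffer(stub_vlm_score)
buffer.add({"id": "promising", "frames": ["start", "goal", "goal"]})
buffer.add({"id": "unpromising", "frames": ["start", "wall", "wall"]})
batch = buffer.sample(1000, rng=random.Random(0))
promising_count = sum(1 for seg in batch if seg["id"] == "promising")
```

In this toy setup the segment the stub scores highly dominates the sampled batch, while the small `eps` term keeps low-scored segments from being excluded entirely.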
