Fugu-MT Paper Translation (Abstract): E$^2$DT: Efficient and Effective Decision Transformer with Experience-Aware Sampling for Robotic Manipulation

Paper overview: E$^2$DT: Efficient and Effective Decision Transformer with Experience-Aware Sampling for Robotic Manipulation

arxiv url: http://arxiv.org/abs/2605.00159v1
Date: Thu, 30 Apr 2026 19:28:44 GMT
Status: Translation complete
Last updated in system: 2026-05-04 17:43:28.724219
Title: E$^2$DT: Efficient and Effective Decision Transformer with Experience-Aware Sampling for Robotic Manipulation
Title (reference translation): E$^2$DT: An Efficient and Effective Decision Transformer with Experience-Aware Sampling for Robotic Manipulation
Authors: Kaiyan Zhao, Borong Zhang, Yiming Wang, Xingyu Liu, Xuetao Li, Yuyang Chen, Xiaoguang Niu
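Abstract summary: The Decision Transformer (DT) has emerged as an effective framework for addressing long-horizon tasks. E$^2$DT is a DT-guided k-Determinantal Point Process sampling framework. The framework is experience-aware, which allows E$^2$DT to be both efficient and effective.
Reference score (independently computed attention score): 12.326967455610536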
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In reinforcement learning (RL) for robotic manipulation, the Decision Transformer (DT) has emerged as an effective framework for addressing long-horizon tasks. However, DT's performance depends heavily on the coverage of collected experiences. Without an active exploration mechanism, standard DT relies on uniform replay, which leads to poor sample efficiency, limited exploration, and reduced overall effectiveness. At the same time, while excessive exploration can help avoid local optima, it often delays policy convergence and leads to degraded efficiency. To address these limitations, we propose E$^2$DT, a DT-guided k-Determinantal Point Process sampling framework that enables the model to actively shape its own experience selection. Our framework is experience-aware, allowing E$^2$DT to be both efficient, by prioritizing sampling quality, such as high-return, high-uncertainty, and underrepresented trajectories, and effective, by ensuring diversity across trajectory windows to preserve policy optimality. Specifically, DT's internal latent embeddings measure diversity across trajectory windows, while quality is quantified through a composite metric that integrates return-to-go (RTG) quantiles, predictive uncertainty, and stage coverage based on inverse frequency. These two dimensions are integrated into a novel quality-diversity joint kernel that prioritizes the most informative experiences, thereby enabling learning that is both efficient and effective. We evaluate E$^2$DT on challenging robotic manipulation benchmarks in both simulation and real-robot settings. Results show that it consistently outperforms prior methods. These findings demonstrate that coupling policy learning with experience-aware sampling provides a principled path toward robust long-horizon robotic learning.
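Abstract (reference translation): In reinforcement learning (RL) for robotic manipulation, the Decision Transformer (DT) has emerged as an effective framework for addressing long-horizon tasks. However, DT's performance depends heavily on the coverage of the collected experiences. Without an active exploration mechanism, standard DT relies on uniform replay, which leads to poor sample efficiency, limited exploration, and reduced overall effectiveness. At the same time, although excessive exploration helps avoid local optima, it often delays policy convergence and degrades efficiency. To address these limitations, we propose E$^2$DT, a DT-guided k-Determinantal Point Process sampling framework. E$^2$DT is efficient because it prioritizes sampling quality, such as high-return, high-uncertainty, and underrepresented trajectories, and effective because it ensures diversity across trajectory windows to preserve policy optimality. Specifically, DT's internal latent embeddings measure diversity across trajectory windows, while quality is quantified by a composite metric that integrates return-to-go (RTG) quantiles, predictive uncertainty, and stage coverage based on inverse frequency. These two dimensions are integrated into a novel quality-diversity joint kernel that prioritizes the most informative experiences, enabling learning that is both efficient and effective. We evaluate E$^2$DT on challenging robotic manipulation benchmarks in both simulation and real-robot settings. The results show that it consistently outperforms prior methods. These findings suggest that coupling policy learning with experience-aware sampling provides a path toward robust long-horizon robotic learning.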
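The quality-diversity kernel described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the equal weights, the exponential mapping of the composite score, the cosine-similarity diversity term, and the use of greedy determinant maximization as a stand-in for exact k-DPP sampling. It only shows the general shape of the idea, a kernel L = diag(q) S diag(q) built from per-window quality q and embedding similarity S.

```python
import numpy as np


def quality_scores(rtg, uncertainty, stage_ids, weights=(1.0, 1.0, 1.0)):
    """Hypothetical composite quality per trajectory window.

    Combines an RTG quantile, min-max normalized predictive uncertainty,
    and inverse-frequency stage coverage (weights are an assumption).
    """
    rtg = np.asarray(rtg, dtype=float)
    uncertainty = np.asarray(uncertainty, dtype=float)
    stage_ids = np.asarray(stage_ids)
    n = len(rtg)

    # Empirical quantile of each window's return-to-go, in [0, 1].
    ranks = np.argsort(np.argsort(rtg))
    rtg_q = ranks / max(n - 1, 1)

    # Min-max normalized uncertainty (all zeros if every value is equal).
    span = uncertainty.max() - uncertainty.min()
    unc = (uncertainty - uncertainty.min()) / span if span > 0 else np.zeros(n)

    # Rare stages get a higher score: inverse frequency, scaled to (0, 1].
    _, inverse, counts = np.unique(stage_ids, return_inverse=True, return_counts=True)
    inv_freq = 1.0 / counts[inverse]
    inv_freq = inv_freq / inv_freq.max()

    w_r, w_u, w_s = weights
    # exp keeps the quality strictly positive (an illustrative choice).
    return np.exp(w_r * rtg_q + w_u * unc + w_s * inv_freq)


def joint_kernel(embeddings, quality):
    """L = diag(q) @ S @ diag(q), with S the cosine similarity of embeddings."""
    z = np.asarray(embeddings, dtype=float)
    norms = np.maximum(np.linalg.norm(z, axis=1, keepdims=True), 1e-12)
    z = z / norms
    sim = z @ z.T
    q = np.asarray(quality, dtype=float)
    return q[:, None] * sim * q[None, :]


def greedy_select(L, k, jitter=1e-9):
    """Greedily pick k items that maximize log det of the selected submatrix.

    This is a greedy approximation, not exact k-DPP sampling.
    """
    selected = []
    remaining = list(range(L.shape[0]))
    for _ in range(min(k, len(remaining))):
        best, best_logdet = None, -np.inf
        for i in remaining:
            idx = selected + [i]
            sub = L[np.ix_(idx, idx)] + jitter * np.eye(len(idx))
            sign, logdet = np.linalg.slogdet(sub)
            if sign > 0 and logdet > best_logdet:
                best, best_logdet = i, logdet
        if best is None:
            break
        selected.append(best)
        remaining.remove(best)
    return selected
```

In this sketch, two windows with nearly identical embeddings make the selected submatrix close to singular, so the greedy step prefers a dissimilar window unless the quality gap is large. That is the trade-off between quality and diversity that the abstract describes.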

Related papers

Self-Imitated Diffusion Policy for Efficient and Robust Visual Navigation [7.341858898582114]
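SIDP (Self-Imitated Diffusion Policy) is a new framework that learns to improve planning by selectively imitating a set of trajectories sampled from itself. Specifically, SIDP introduces a reward-guided self-imitation mechanism that encourages the policy to consistently and efficiently produce high-quality trajectories.
Paper reference translation (metadata) (2026-01-30T13:27:59Z)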
Human-in-the-loop Online Rejection Sampling for Robotic Manipulation [55.99788088622936]
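Hi-ORS stabilizes value estimation by filtering out negatively rewarded samples during online fine-tuning. Hi-ORS fine-tunes a pi-based policy to master contact-rich manipulation in just 1.5 hours.
Paper reference translation (metadata) (2025-10-30T11:53:08Z)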
Large Language Model-Empowered Decision Transformer for UAV-Enabled Data Collection [71.84636717632206]
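Unmanned aerial vehicles (UAVs) for reliable and energy-efficient data collection from spatially distributed devices hold great promise for supporting Internet of Things (IoT) applications. The paper proposes a large language model (LLM)-empowered approach for learning effective UAV control policies. LLM-CRDT outperforms benchmark online and offline methods, achieving up to 36.7% higher energy efficiency than current state-of-the-art DT approaches.
Paper reference translation (metadata) (2025-09-17T13:05:08Z)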
Is Diversity All You Need for Scalable Robotic Manipulation? [50.747150672933316]
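The paper investigates the role of data diversity in robot learning along three key dimensions: task (what to do), embodiment (which robot to use), and expert (who demonstrates), revisiting the conventional intuition that "more diverse is better". It shows that task diversity matters more than the number of demonstrations per task and benefits transfer from diverse pre-training tasks to novel downstream scenarios. It also proposes a distribution debiasing method to mitigate velocity ambiguity; the resulting GO-1-Pro achieves a 15% performance gain, equivalent to using 2.5 times the pre-training data.
Paper reference translation (metadata) (2025-07-08T17:52:44Z)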
ACT: Empowering Decision Transformer with Dynamic Programming via Advantage Conditioning [19.02836010747026]
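The Decision Transformer (DT) generates actions conditioned on desired future returns. We propose to empower DT with dynamic programming to overcome its weaknesses. The method demonstrates effective trajectory stitching and robust action generation regardless of environmental conditions.
Paper reference translation (metadata) (2023-09-12T02:05:43Z)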
Task-specific experimental design for treatment effect estimation [59.879567967089145]
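The gold standard for causal inference is the large-scale randomized controlled trial (RCT). Recent work has proposed more sample-efficient alternatives to RCTs, but these are not adapted to the downstream applications for which the causal effect is sought. We develop a task-specific approach to experimental design and derive sampling strategies customized to particular downstream applications.
Paper reference translation (metadata) (2023-06-08T18:10:37Z)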
A Transferable and Automatic Tuning of Deep Reinforcement Learning for Cost Effective Phishing Detection [21.481974148873807]
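Many real-world problems require deploying an ensemble of multiple complementary learning models. Deep Reinforcement Learning (DRL) is a cost-effective alternative in which detectors are chosen dynamically based on the output of their predecessors.
Paper reference translation (metadata) (2022-09-19T14:09:07Z)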
Diversity-based Trajectory and Goal Selection with Hindsight Experience Replay [8.259694128526112]
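We propose diversity-based trajectory and goal selection with HER (DTGSH). We show that the proposed method learns faster and reaches higher performance than other state-of-the-art approaches on all tasks.
Paper reference translation (metadata) (2021-08-17T21:34:24Z)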
Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement [56.40587594647692]
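This paper proposes a novel transfer learning algorithm that introduces the idea of Target-awareness REpresentation Disentanglement (TRED). TRED disentangles the knowledge relevant to the target task from the original source model and uses it as a regularizer when fine-tuning the target model. Experiments on various real-world datasets show that the method stably improves on standard fine-tuning by more than 2% on average.
Paper reference translation (metadata) (2020-10-16T17:45:08Z)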

The related papers list is generated automatically from the titles and abstracts of papers on this site.

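This page shows information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from the use of this site (including all information and translations).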