Fugu-MT Paper Translation (Abstract): TimeRewarder: Learning Dense Reward from Passive Videos via Frame-wise Temporal Distance

Paper overview: TimeRewarder: Learning Dense Reward from Passive Videos via Frame-wise Temporal Distance

arxiv url: http://arxiv.org/abs/2509.26627v1
Date: Tue, 30 Sep 2025 17:58:20 GMT
Status: Translation complete
Last updated in system: 2025-10-01 17:09:04.656386
Title: TimeRewarder: Learning Dense Reward from Passive Videos via Frame-wise Temporal Distance
Title (reference translation): TimeRewarder: Dense Reward Learning from Passive Videos via Frame-wise Temporal Distance
Authors: Yuyang Liu, Chuan Wen, Yihang Hu, Dinesh Jayaraman, Yang Gao
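Abstract summary: TimeRewarder is a simple yet effective reward learning method that derives progress estimation signals from passive videos. It substantially improves RL on sparse-reward tasks, achieving nearly perfect success on 9/10 tasks with only 200,000 environment interactions per task.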
Reference score (independently computed attention score): 36.22149703563646
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Designing dense rewards is crucial for reinforcement learning (RL), yet in robotics it often demands extensive manual effort and lacks scalability. One promising solution is to view task progress as a dense reward signal, as it quantifies the degree to which actions advance the system toward task completion over time. We present TimeRewarder, a simple yet effective reward learning method that derives progress estimation signals from passive videos, including robot demonstrations and human videos, by modeling temporal distances between frame pairs. We then demonstrate how TimeRewarder can supply step-wise proxy rewards to guide reinforcement learning. In our comprehensive experiments on ten challenging Meta-World tasks, we show that TimeRewarder dramatically improves RL for sparse-reward tasks, achieving nearly perfect success in 9/10 tasks with only 200,000 environment interactions per task. This approach outperforms previous methods and even the manually designed environment dense reward on both final success rate and sample efficiency. Moreover, we show that TimeRewarder pretraining can exploit real-world human videos, highlighting its potential as a scalable path to rich reward signals from diverse video sources.
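Abstract (reference translation): Designing dense rewards is important for reinforcement learning (RL), but in robotics it often requires extensive manual work and does not scale well. One promising solution is to treat task progress as a dense reward signal, because progress quantifies how far actions move the system toward task completion over time. This paper proposes TimeRewarder, a simple and effective reward learning method that derives progress estimation signals from passive videos, including robot demonstrations and human videos, by modeling the temporal distance between frame pairs. It then shows how TimeRewarder can provide step-wise proxy rewards to guide reinforcement learning. In comprehensive experiments on ten challenging Meta-World tasks, TimeRewarder dramatically improves RL on sparse-reward tasks, achieving nearly perfect success on 9/10 tasks with only 200,000 environment interactions per task. The approach outperforms prior methods, and even the environment's manually designed dense reward, in both final success rate and sample efficiency. In addition, TimeRewarder pretraining can make use of real-world human videos, underscoring its potential as a scalable path to rich reward signals from diverse video sources.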
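The abstract's core idea, modeling temporal distances between frame pairs and using them as step-wise proxy rewards, can be illustrated with a minimal sketch. Everything below is an assumption for illustration and is not taken from the paper: the normalization of distances by video length, the helper names `temporal_distance_targets` and `stepwise_rewards`, and the stand-in predictor that reads progress directly from frame indices rather than from a learned model on images.

```python
from itertools import combinations


def temporal_distance_targets(num_frames):
    """Build training targets for frame pairs of one video.

    Assumption (not from the paper): the target for a pair (i, j), i < j,
    is the frame gap normalized by the video length, so it lies in (0, 1].
    """
    if num_frames < 2:
        raise ValueError("need at least two frames to form a pair")
    span = num_frames - 1
    return {(i, j): (j - i) / span
            for i, j in combinations(range(num_frames), 2)}


def stepwise_rewards(predict_distance, frames):
    """Turn a pairwise distance predictor into per-step proxy rewards.

    Each step's reward is the predicted temporal distance between the
    frame before and the frame after that step (hypothetical scheme).
    """
    return [predict_distance(frames[t], frames[t + 1])
            for t in range(len(frames) - 1)]


# Stand-in predictor: "frames" are just progress indices in a 5-frame task,
# so the signed, normalized index gap plays the role of a learned model.
def oracle(frame_a, frame_b):
    return (frame_b - frame_a) / 4


targets = temporal_distance_targets(5)
rewards = stepwise_rewards(oracle, [0, 1, 2, 1, 4])
```

In this toy rollout the agent regresses from progress index 2 back to 1, so that step receives a negative proxy reward, while steps that advance toward completion receive positive ones. A real system would replace `oracle` with a model trained on video frames.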

Related papers