TimeRewarder: Learning Dense Reward from Passive Videos via Frame-wise Temporal Distance
- URL: http://arxiv.org/abs/2509.26627v1
- Date: Tue, 30 Sep 2025 17:58:20 GMT
- Title: TimeRewarder: Learning Dense Reward from Passive Videos via Frame-wise Temporal Distance
- Authors: Yuyang Liu, Chuan Wen, Yihang Hu, Dinesh Jayaraman, Yang Gao
- Abstract summary: TimeRewarder is a simple yet effective reward learning method that derives progress estimation signals from passive videos. We show that TimeRewarder dramatically improves RL for sparse-reward tasks, achieving nearly perfect success in 9/10 tasks with only 200,000 interactions per task with the environment.
- Score: 36.22149703563646
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Designing dense rewards is crucial for reinforcement learning (RL), yet in robotics it often demands extensive manual effort and lacks scalability. One promising solution is to view task progress as a dense reward signal, as it quantifies the degree to which actions advance the system toward task completion over time. We present TimeRewarder, a simple yet effective reward learning method that derives progress estimation signals from passive videos, including robot demonstrations and human videos, by modeling temporal distances between frame pairs. We then demonstrate how TimeRewarder can supply step-wise proxy rewards to guide reinforcement learning. In comprehensive experiments on ten challenging Meta-World tasks, we show that TimeRewarder dramatically improves RL for sparse-reward tasks, achieving nearly perfect success in 9/10 tasks with only 200,000 environment interactions per task. It outperforms prior methods, and even the manually designed dense environment reward, in both final success rate and sample efficiency. Moreover, we show that TimeRewarder pretraining can exploit real-world human videos, highlighting its potential as a scalable path to rich reward signals from diverse video sources.
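To make the recipe concrete, here is a minimal sketch of the frame-pair temporal-distance idea: sample pairs of frames from a passive video, regress their normalized time gap, and use the predicted frame-to-frame progress as a dense proxy reward. The encoder, pair-sampling scheme, and reward definition are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class TemporalDistancePredictor(nn.Module):
    """Predicts the normalized time gap between two frames of one video."""
    def __init__(self, frame_dim: int = 512):
        super().__init__()
        # Stand-in encoder over precomputed frame features; the paper would
        # encode raw pixels (e.g., with a CNN or ViT).
        self.encoder = nn.Sequential(nn.Linear(frame_dim, 256), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, frame_a, frame_b):
        za, zb = self.encoder(frame_a), self.encoder(frame_b)
        return self.head(torch.cat([za, zb], dim=-1)).squeeze(-1)

def train_step(model, opt, video):
    """video: (T, frame_dim) tensor of frames from one passive demonstration."""
    T = video.shape[0]
    i = torch.randint(0, T, (64,))
    j = torch.randint(0, T, (64,))
    target = (j - i).float() / T  # signed, normalized temporal distance
    loss = nn.functional.mse_loss(model(video[i], video[j]), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def proxy_reward(model, prev_frame, curr_frame):
    """Predicted progress from the previous frame to the current one,
    usable as a step-wise proxy reward during RL."""
    with torch.no_grad():
        return model(prev_frame.unsqueeze(0), curr_frame.unsqueeze(0)).item()
```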
Related papers
- On-Robot Reinforcement Learning with Goal-Contrastive Rewards [24.415607337006968]
Reinforcement Learning (RL) has the potential to enable robots to learn from their own actions in the real world. We propose GCR (Goal-Contrastive Rewards), a dense reward function learning method that can be trained on passive video demonstrations. GCR combines two loss functions: an implicit value loss that models how the reward increases when traversing a successful trajectory, and a goal-contrastive loss that discriminates between successful and failed trajectories.
arXiv Detail & Related papers (2024-10-25T22:11:54Z)
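Read literally from the GCR abstract, the two loss terms could be sketched as below. The margin value, the endpoint-based contrast, and the exact loss forms are assumptions made for illustration, not GCR's actual objectives.

```python
import torch
import torch.nn.functional as F

def implicit_value_loss(rewards_success):
    """Push predicted rewards to increase along a successful trajectory:
    penalize steps where the reward fails to rise by a small margin."""
    deltas = rewards_success[1:] - rewards_success[:-1]
    return F.relu(0.1 - deltas).mean()

def goal_contrastive_loss(success_end, failure_end):
    """Discriminate successful from failed trajectories at their endpoints."""
    logits = torch.stack([success_end, failure_end])
    return F.binary_cross_entropy_with_logits(logits, torch.tensor([1.0, 0.0]))

def gcr_loss(reward_model, success_video, failure_video, beta=1.0):
    r_s = reward_model(success_video)  # (T,) rewards along a successful demo
    r_f = reward_model(failure_video)  # (T,) rewards along a failed rollout
    return implicit_value_loss(r_s) + beta * goal_contrastive_loss(r_s[-1], r_f[-1])
```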
- Affordance-Guided Reinforcement Learning via Visual Prompting [51.361977466993345]
Keypoint-based Affordance Guidance for Improvements (KAGI) is a method leveraging rewards shaped by vision-language models (VLMs) for autonomous RL. On diverse real-world manipulation tasks specified by natural language descriptions, KAGI improves the sample efficiency of autonomous RL and enables successful task completion in 30K online fine-tuning steps.
arXiv Detail & Related papers (2024-07-14T21:41:29Z)
- DrS: Learning Reusable Dense Rewards for Multi-Stage Tasks [26.730889757506915]
We propose DrS (Dense reward learning from Stages), a novel approach for learning reusable dense rewards for multi-stage tasks.
By leveraging the stage structure of the task, DrS learns a high-quality dense reward from sparse rewards and, when available, demonstrations.
Experiments on three physical robot manipulation task families with 1000+ task variants demonstrate that our learned rewards can be reused in unseen tasks.
arXiv Detail & Related papers (2024-04-25T17:28:33Z)
- DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing [60.21269454707625]
DreamSmooth learns to predict a temporally-smoothed reward, instead of the exact reward at the given timestep.
We show that DreamSmooth achieves state-of-the-art performance on long-horizon sparse-reward tasks.
arXiv Detail & Related papers (2023-11-02T17:57:38Z)
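The smoothing behind DreamSmooth is easy to illustrate; the Gaussian kernel below is one plausible choice of smoothing function, not necessarily the paper's, producing the targets a world model's reward head would be trained on.

```python
import numpy as np

def smooth_rewards(rewards: np.ndarray, sigma: float = 3.0) -> np.ndarray:
    """Temporally smooth a per-step reward sequence with a Gaussian kernel."""
    t = np.arange(-3 * int(sigma), 3 * int(sigma) + 1)
    kernel = np.exp(-t**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    # mode="same" keeps the sequence length; a sparse success spike at the end
    # of an episode is spread over nearby timesteps, giving an easier target.
    return np.convolve(rewards, kernel, mode="same")

rewards = np.zeros(100)
rewards[-1] = 1.0                  # sparse success reward at episode end
targets = smooth_rewards(rewards)  # smoothed training targets
```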
- Go Beyond Imagination: Maximizing Episodic Reachability with World Models [68.91647544080097]
In this paper, we introduce a new intrinsic reward design called GoBI - Go Beyond Imagination.
We apply learned world models to generate predicted future states with random actions.
Our method greatly outperforms previous state-of-the-art methods on 12 of the most challenging Minigrid navigation tasks.
arXiv Detail & Related papers (2023-08-25T20:30:20Z)
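A rough sketch of the GoBI idea as stated in the abstract: imagine one-step futures under random actions with the learned world model, and reward states whose imagined neighborhood contains many episodically novel states. Here `world_model.predict`, `action_space.sample`, and the rounding-based novelty check are hypothetical stand-ins, not the paper's implementation.

```python
import numpy as np

def gobi_intrinsic_reward(world_model, state, action_space, visited, n_samples=16):
    """Fraction of imagined one-step futures that land outside the set of
    states already visited this episode (a crude episodic-novelty proxy)."""
    novel = 0
    for _ in range(n_samples):
        action = action_space.sample()                 # random action
        imagined = world_model.predict(state, action)  # hypothetical API
        if tuple(np.round(imagined, 2)) not in visited:
            novel += 1
    return novel / n_samples
```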
- Reward Relabelling for combined Reinforcement and Imitation Learning on sparse-reward tasks [2.0305676256390934]
We present a new method to leverage demonstrations and episodes collected online in any sparse-reward environment with any off-policy algorithm.
Our method is based on a reward bonus given to demonstrations and successful episodes, encouraging expert imitation and self-imitation.
Our experiments focus on manipulation robotics, specifically on three tasks for a 6 degrees-of-freedom robotic arm in simulation.
arXiv Detail & Related papers (2022-01-11T08:35:18Z)
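The relabelling rule is easy to state in code. A sketch under the assumptions that transitions are dicts with a "reward" field and that episodes record success at their final step; the bonus value is illustrative.

```python
def relabel(episode, is_demo, bonus=1.0):
    """Add a constant reward bonus to every transition of a demonstration or
    of an online episode that ended in success, before storing it in the
    replay buffer of any off-policy algorithm."""
    succeeded = episode[-1].get("success", False)
    if is_demo or succeeded:
        for transition in episode:
            transition["reward"] += bonus  # encourages (self-)imitation
    return episode
```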
- Bottom-Up Skill Discovery from Unsegmented Demonstrations for Long-Horizon Robot Manipulation [55.31301153979621]
We tackle real-world long-horizon robot manipulation tasks through skill discovery.
We present a bottom-up approach to learning a library of reusable skills from unsegmented demonstrations.
Our method has shown superior performance over state-of-the-art imitation learning methods in multi-stage manipulation tasks.
arXiv Detail & Related papers (2021-09-28T16:18:54Z)
- A Study on Dense and Sparse (Visual) Rewards in Robot Policy Learning [19.67628391301068]
We study the performance of multiple state-of-the-art deep reinforcement learning algorithms under different types of reward.
Our results show that visual dense rewards are more successful than visual sparse rewards and that there is no single best algorithm for all tasks.
arXiv Detail & Related papers (2021-08-06T17:47:48Z)
- Semi-supervised reward learning for offline reinforcement learning [71.6909757718301]
Training agents usually requires reward functions, but rewards are seldom available in practice and their engineering is challenging and laborious.
We propose semi-supervised learning algorithms that learn from limited annotations and incorporate unlabelled data.
In our experiments with a simulated robotic arm, we greatly improve upon behavioural cloning and closely approach the performance achieved with ground truth rewards.
arXiv Detail & Related papers (2020-12-12T20:06:15Z)
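One common shape for such a semi-supervised objective, sketched loosely here, is supervised regression on the few annotated transitions plus a consistency term on unlabelled ones; the paper's actual algorithm may differ substantially.

```python
import torch
import torch.nn.functional as F

def semi_supervised_reward_loss(reward_model, labelled_obs, labelled_r,
                                unlabelled_obs, weight=0.1):
    # Supervised term on the few annotated observations.
    sup = F.mse_loss(reward_model(labelled_obs), labelled_r)
    # Consistency term: predictions on unlabelled observations should be
    # stable under small input perturbations (pseudo-labels are detached).
    with torch.no_grad():
        pseudo = reward_model(unlabelled_obs)
    noisy = unlabelled_obs + 0.01 * torch.randn_like(unlabelled_obs)
    return sup + weight * F.mse_loss(reward_model(noisy), pseudo)
```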
- Self-Imitation Learning for Robot Tasks with Sparse and Delayed Rewards [1.2691047660244335]
We propose a practical self-imitation learning method named Self-Imitation Learning with Constant Reward (SILCR).
Our method assigns constant immediate rewards at each timestep according to the episode's final return.
We demonstrate the effectiveness of our method on challenging continuous robotic control tasks in MuJoCo simulation.
arXiv Detail & Related papers (2020-10-14T11:12:07Z)
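As described, SILCR's relabelling reduces to a few lines; the sign convention and success threshold below are assumptions for illustration.

```python
def silcr_relabel(episode, episodic_return, threshold=0.0):
    """Replace delayed/sparse rewards with a constant per-step value chosen
    from the episode's final outcome, giving dense immediate feedback."""
    step_reward = 1.0 if episodic_return > threshold else -1.0
    for transition in episode:
        transition["reward"] = step_reward
    return episode
```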
This list is automatically generated from the titles and abstracts of the papers on this site.