GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning
- URL: http://arxiv.org/abs/2508.11049v1
- Date: Thu, 14 Aug 2025 20:19:20 GMT
- Title: GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning
- Authors: Kelin Yu, Sheng Zhang, Harshit Soora, Furong Huang, Heng Huang, Pratap Tokekar, Ruohan Gao
- Abstract summary: We propose GenFlowRL, which derives shaped rewards from object-centric flow produced by a generative model trained on diverse cross-embodiment datasets. Experiments on 10 manipulation tasks, in both simulation and real-world cross-embodiment evaluations, demonstrate that GenFlowRL effectively leverages manipulation features extracted from generated object-centric flow.
- Score: 79.68241687396603
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances have shown that video generation models can enhance robot learning by deriving effective robot actions through inverse dynamics. However, these methods heavily depend on the quality of generated data and struggle with fine-grained manipulation due to the lack of environment feedback. While video-based reinforcement learning improves policy robustness, it remains constrained by the uncertainty of video generation and the challenges of collecting large-scale robot datasets for training diffusion models. To address these limitations, we propose GenFlowRL, which derives shaped rewards from generated flow produced by models trained on diverse cross-embodiment datasets. This enables learning generalizable and robust policies from diverse demonstrations using low-dimensional, object-centric features. Experiments on 10 manipulation tasks, in both simulation and real-world cross-embodiment evaluations, demonstrate that GenFlowRL effectively leverages manipulation features extracted from generated object-centric flow, consistently achieving superior performance across diverse and challenging scenarios. Our Project Page: https://colinyu1.github.io/genflowrl
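Reduced to its simplest form, the reward-shaping idea is: a generative model produces a target object-centric flow (a set of 2D point tracks) for the task, and the policy is rewarded for making the observed flow match it. Below is a minimal sketch; the `flow_generator` and `track_points` interfaces are hypothetical stand-ins, not the paper's actual components.

```python
import numpy as np

def shaped_reward(obs_flow: np.ndarray, target_flow: np.ndarray) -> float:
    """Negative mean L2 distance between observed and generated object flow.

    Both arrays have shape (T, K, 2): T timesteps, K tracked object points,
    2D image coordinates. Tracking the generated flow closely -> higher reward.
    """
    assert obs_flow.shape == target_flow.shape
    per_point_err = np.linalg.norm(obs_flow - target_flow, axis=-1)  # (T, K)
    return -float(per_point_err.mean())

# Hypothetical usage inside an RL loop:
# target_flow = flow_generator(initial_frame, task_prompt)  # generated per episode
# obs_flow    = track_points(rollout_frames, query_points)  # observed during rollout
# r           = shaped_reward(obs_flow, target_flow)
```

Because the reward depends only on low-dimensional point tracks rather than full generated frames, it is less sensitive to pixel-level artifacts in the generated video, which matches the abstract's emphasis on low-dimensional, object-centric features.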
Related papers
- AnchorDream: Repurposing Video Diffusion for Embodiment-Aware Robot Data Synthesis [33.90053396451562]
We introduce AnchorDream, an embodiment-aware world model that repurposes pretrained video diffusion models for robot data synthesis, producing large, diverse, high-quality datasets without requiring explicit environment modeling. Experiments show that the generated data leads to consistent improvements in downstream policy learning, with relative gains of 36.4% on simulator benchmarks and nearly double the performance in real-world studies.
arXiv Detail & Related papers (2025-12-12T18:59:45Z)
- DynaRend: Learning 3D Dynamics via Masked Future Rendering for Robotic Manipulation [52.136378691610524]
We present DynaRend, a representation learning framework that learns 3D-aware and dynamics-informed triplane features. By pretraining on multi-view RGB-D video data, DynaRend jointly captures spatial geometry, future dynamics, and task semantics in a unified triplane representation. We evaluate DynaRend on two challenging benchmarks, RLBench and Colosseum, demonstrating substantial improvements in policy success rate, generalization to environmental perturbations, and real-world applicability across diverse manipulation tasks.
arXiv Detail & Related papers (2025-10-28T10:17:11Z)
- Watch and Learn: Learning to Use Computers from Online Videos [50.10702690339142]
Watch & Learn (W&L) is a framework that converts human demonstration videos readily available on the Internet into executable UI trajectories at scale. We develop an inverse dynamics labeling pipeline with task-aware video retrieval and generate over 53k high-quality trajectories from raw web videos. These results highlight web-scale human demonstration videos as a practical and scalable foundation for advancing computer-use agents (CUAs) toward real-world deployment.
arXiv Detail & Related papers (2025-10-06T10:29:00Z)
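The inverse dynamics labeling idea in Watch & Learn can be sketched in a few lines: an inverse dynamics model (IDM) looks at consecutive frame pairs and predicts the UI action that explains each transition, converting an action-free video into an executable trajectory. The `idm.predict` interface below is an assumed placeholder, not the paper's actual API.

```python
from typing import List, Tuple
import numpy as np

def label_video(frames: List[np.ndarray], idm) -> List[Tuple[int, str]]:
    """Pseudo-label a raw demonstration video with an inverse dynamics model.

    For each consecutive frame pair (f_t, f_{t+1}), the IDM predicts the UI
    action (e.g. a click or keystroke) that explains the transition, turning
    an action-free video into an executable trajectory.
    """
    trajectory = []
    for t in range(len(frames) - 1):
        action = idm.predict(frames[t], frames[t + 1])  # hypothetical interface
        trajectory.append((t, action))
    return trajectory
```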
- EMMA: Generalizing Real-World Robot Manipulation via Generative Visual Transfer [35.27100635173712]
Vision-language-action (VLA) models increasingly rely on diverse training data to achieve robust generalization. We propose Embodied Manipulation Media Adaptation (EMMA), a VLA policy enhancement framework that integrates a generative data engine with an effective training pipeline. DreamTransfer enables text-controlled visual editing of robot videos, transforming foreground, background, and lighting conditions without compromising 3D structure or geometric plausibility. AdaMix is a hard-sample-aware training strategy that dynamically reweights training batches to focus optimization on perceptually or kinematically challenging samples.
arXiv Detail & Related papers (2025-09-26T14:34:44Z)
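AdaMix's hard-sample-aware reweighting can be illustrated with a generic loss-proportional batch weighting; the softmax-over-losses form below is an illustrative choice, not necessarily the paper's exact formulation.

```python
import torch

def adamix_weights(per_sample_loss: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Turn per-sample losses into normalized batch weights.

    Harder samples (higher loss) receive larger weights via a softmax,
    focusing optimization on challenging examples. Generic sketch only.
    """
    w = torch.softmax(per_sample_loss.detach() / temperature, dim=0)
    return w * per_sample_loss.numel()  # rescale so the average weight is 1

# Usage inside a training step (criterion with reduction="none"):
# losses = criterion(model(x), y)                       # shape (B,)
# loss = (adamix_weights(losses) * losses).mean()
# loss.backward()
```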
- Physical Autoregressive Model for Robotic Manipulation without Action Pretraining [65.8971623698511]
We build upon autoregressive video generation models to propose a Physical Autoregressive Model (PAR). PAR leverages the world knowledge embedded in video pretraining to understand physical dynamics without requiring action pretraining. Experiments on the ManiSkill benchmark show that PAR achieves a 100% success rate on the PushCube task.
arXiv Detail & Related papers (2025-08-13T13:54:51Z)
- AMPLIFY: Actionless Motion Priors for Robot Learning from Videos [29.799207502031496]
We introduce AMPLIFY, a novel framework that leverages large-scale video data. We train a forward dynamics model on abundant action-free videos and an inverse dynamics model on a limited set of action-labeled examples. In downstream policy learning, our dynamics predictions enable a 1.2-2.2x improvement in low-data regimes, a 1.4x average improvement when learning from action-free human videos, and the first generalization to LIBERO tasks from zero in-distribution action data.
arXiv Detail & Related papers (2025-06-17T05:31:42Z)
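AMPLIFY's central split, a forward dynamics model learned from action-free video and an inverse dynamics model learned from a small action-labeled set, composes at deployment as sketched below. The module shapes and architectures are placeholder assumptions, not the paper's.

```python
import torch
import torch.nn as nn

LATENT, ACT = 64, 7  # assumed latent-motion and robot-action dimensions

# Forward dynamics: trained on abundant action-free video (state_t -> motion_t).
forward_dyn = nn.Sequential(nn.Linear(LATENT, 128), nn.ReLU(), nn.Linear(128, LATENT))
# Inverse dynamics: trained on a small action-labeled set (motion_t -> action_t).
inverse_dyn = nn.Sequential(nn.Linear(LATENT, 128), nn.ReLU(), nn.Linear(128, ACT))

def predict_action(frame_embedding: torch.Tensor) -> torch.Tensor:
    """At deployment, chain the two models: predict future motion from the
    video-pretrained forward model, then decode it into a robot action."""
    motion = forward_dyn(frame_embedding)  # (B, LATENT)
    return inverse_dyn(motion)             # (B, ACT)
```

The design point is the data asymmetry: the expensive-to-label half (actions) only ever sees a small dataset, while the motion prior absorbs the web-scale video.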
- DreamGen: Unlocking Generalization in Robot Learning through Video World Models [120.25799361925387]
DreamGen is a pipeline for training robot policies that generalize across behaviors and environments through neural trajectories. Our work establishes a promising new axis for scaling robot learning well beyond manual data collection.
arXiv Detail & Related papers (2025-05-19T04:55:39Z)
- VITAL: Interactive Few-Shot Imitation Learning via Visual Human-in-the-Loop Corrections [10.49712834719005]
Imitation Learning (IL) has emerged as a powerful approach in robotics, allowing robots to acquire new skills by mimicking human actions. Despite its potential, the data collection process for IL remains a significant challenge due to the logistical difficulties and high costs of obtaining high-quality demonstrations. We propose large-scale data generation from a handful of demonstrations through data augmentation in simulation.
arXiv Detail & Related papers (2024-07-30T23:29:47Z)
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weight generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning, recasting the latent diffusion paradigm for neural network weight generation.
Our approach scales to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
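The mechanical core of diffusion-based weight generation is straightforward: sample a flat weight vector from a (latent) diffusion model trained over network checkpoints, then reshape it into the target architecture's parameter tensors. The sketch below shows only that reshaping step; `weight_diffusion.sample` is an assumed interface.

```python
import torch
import torch.nn as nn

def load_sampled_weights(model: nn.Module, flat_weights: torch.Tensor) -> None:
    """Reshape one flat weight vector (e.g. sampled from a diffusion model
    over weights) into a model's parameter tensors, in parameter order."""
    offset = 0
    for p in model.parameters():
        n = p.numel()
        with torch.no_grad():
            p.copy_(flat_weights[offset:offset + n].view_as(p))
        offset += n
    assert offset == flat_weights.numel(), "weight vector / architecture mismatch"

# Hypothetical usage:
# flat = weight_diffusion.sample(condition=task_embedding)  # assumed sampler
# load_sampled_weights(target_net, flat)
```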
- Any-point Trajectory Modeling for Policy Learning [64.23861308947852]
We introduce Any-point Trajectory Modeling (ATM) to predict future trajectories of arbitrary points within a video frame.
ATM outperforms strong video pre-training baselines by 80% on average.
We show effective transfer learning of manipulation skills from human videos and videos from a different robot morphology.
arXiv Detail & Related papers (2023-12-28T23:34:43Z)
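ATM's interface can be pinned down by its tensor contract: given a frame and K arbitrary query points, predict each point's future 2D track, which a downstream policy consumes as a motion prior. The stub below holds points still purely to show the shapes involved; a real model would be a learned track predictor.

```python
import numpy as np

def predict_point_trajectories(frame: np.ndarray,
                               query_points: np.ndarray,
                               horizon: int = 16) -> np.ndarray:
    """Placeholder for an ATM-style predictor: maps K query points of shape
    (K, 2) in the current frame to future tracks of shape (horizon, K, 2).
    This stub simply repeats the points to illustrate the contract."""
    return np.repeat(query_points[None], horizon, axis=0)

# A policy can then condition on (frame, predicted_tracks) as a motion prior.
```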
- CLOUD: Contrastive Learning of Unsupervised Dynamics [19.091886595825947]
We propose to learn forward and inverse dynamics in a fully unsupervised manner via contrastive estimation.
We demonstrate the efficacy of our approach across a variety of tasks including goal-directed planning and imitation from observations.
arXiv Detail & Related papers (2020-10-23T15:42:57Z)
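Contrastive estimation of dynamics is commonly implemented as an InfoNCE objective: the forward model's prediction for (s_t, a_t) should match its own next-state embedding, with the other batch elements serving as negatives. A generic sketch, not necessarily CLOUD's exact loss:

```python
import torch
import torch.nn.functional as F

def contrastive_forward_loss(pred_next: torch.Tensor,
                             true_next: torch.Tensor,
                             temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE over a batch: each predicted next-state embedding f(s_t, a_t)
    should match its own s_{t+1} (the diagonal of the similarity matrix)
    against the other batch elements as negatives."""
    pred = F.normalize(pred_next, dim=-1)    # (B, D)
    true = F.normalize(true_next, dim=-1)    # (B, D)
    logits = pred @ true.t() / temperature   # (B, B) similarity matrix
    labels = torch.arange(pred.size(0), device=pred.device)
    return F.cross_entropy(logits, labels)
```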