Related papers: ReLAM: Learning Anticipation Model for Rewarding Visual Robotic Manipulation

URL: http://arxiv.org/abs/2509.22402v1
Date: Fri, 26 Sep 2025 14:28:42 GMT
Title: ReLAM: Learning Anticipation Model for Rewarding Visual Robotic Manipulation
Authors: Nan Tang, Jing-Cheng Pang, Guanlin Li, Chao Qian, Yang Yu
Abstract summary: Reward design remains a critical bottleneck in visual reinforcement learning for robotic manipulation. In this study, we propose a method that implicitly infers spatial distances through keypoints extracted from images. We introduce Reward Learning with Anticipation Model (ReLAM), a novel framework that automatically generates dense, structured rewards from action-free video demonstrations.
Score: 25.115056940401164
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Reward design remains a critical bottleneck in visual reinforcement learning (RL) for robotic manipulation. In simulated environments, rewards are conventionally designed based on the distance to a target position. However, such precise positional information is often unavailable in real-world visual settings due to sensory and perceptual limitations. In this study, we propose a method that implicitly infers spatial distances through keypoints extracted from images. Building on this, we introduce Reward Learning with Anticipation Model (ReLAM), a novel framework that automatically generates dense, structured rewards from action-free video demonstrations. ReLAM first learns an anticipation model that serves as a planner and proposes intermediate keypoint-based subgoals on the optimal path to the final goal, creating a structured learning curriculum directly aligned with the task's geometric objectives. Based on the anticipated subgoals, a continuous reward signal is provided to train a low-level, goal-conditioned policy under the hierarchical reinforcement learning (HRL) framework with a provable sub-optimality bound. Extensive experiments on complex, long-horizon manipulation tasks show that ReLAM significantly accelerates learning and achieves superior performance compared to state-of-the-art methods.
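
The idea of a dense reward driven by keypoint subgoals can be illustrated with a small sketch. This is not the reward used in the paper, whose exact formulation is not given here. It assumes that keypoints are matched 2D image coordinates, that the reward is the negative mean Euclidean distance to the active subgoal, and that the policy moves on to the next subgoal once the distance falls below a threshold. The names `keypoint_distance`, `dense_reward`, `threshold`, and `bonus` are hypothetical.

```python
import numpy as np


def keypoint_distance(current, subgoal):
    """Mean Euclidean distance between matched keypoints of shape (K, 2)."""
    current = np.asarray(current, dtype=float)
    subgoal = np.asarray(subgoal, dtype=float)
    return float(np.linalg.norm(current - subgoal, axis=-1).mean())


def dense_reward(current, subgoals, index, threshold=0.05, bonus=1.0):
    """Return (reward, next_index) for the active subgoal.

    Assumption: the reward is the negative keypoint distance to the
    active subgoal; reaching it (distance < threshold) yields a fixed
    bonus and advances to the next subgoal, stopping at the last one.
    """
    dist = keypoint_distance(current, subgoals[index])
    if dist < threshold:
        next_index = min(index + 1, len(subgoals) - 1)
        return bonus, next_index
    return -dist, index


# Two keypoints, two subgoals (hypothetical pixel coordinates).
subgoals = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 1.0], [2.0, 2.0]],
]
reward, index = dense_reward([[3.0, 4.0], [0.0, 0.0]], subgoals, index=0)
```

In this sketch the reward grows smoothly as the observed keypoints approach the active subgoal, which is what makes the signal dense even though the demonstrations carry no actions.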

Related papers

Self-Correcting VLA: Online Action Refinement via Sparse World Imagination [55.982504915794514]
We propose Self-Correcting VLA (SC-VLA), which achieves self-improvement by intrinsically guiding action refinement through sparse imagination. SC-VLA achieves state-of-the-art performance, yielding the highest task throughput with 16% fewer steps and a 9% higher success rate than the best-performing baselines.
arXiv Detail & Related papers (2026-02-25T06:58:06Z)
ReWorld: Multi-Dimensional Reward Modeling for Embodied World Models [27.729654985554372]
ReWorld is a framework that employs reinforcement learning to align video-based embodied world models with physical realism, task completion capability, embodiment plausibility, and visual quality. We show that ReWorld significantly improves the physical fidelity, logical coherence, embodiment, and visual quality of generated rollouts, outperforming previous methods.
arXiv Detail & Related papers (2026-01-18T14:27:10Z)
Reinforcement Learning with Inverse Rewards for World Model Post-training [29.19830208692156]
We propose Reinforcement Learning with Inverse Rewards (RLIR) to improve action-following in video world models. RLIR derives verifiable reward signals by recovering input actions from generated videos using an Inverse Dynamics Model.
arXiv Detail & Related papers (2025-09-28T16:27:47Z)
Continual Visual Reinforcement Learning with A Life-Long World Model [55.05017177980985]
We present a new continual learning approach for visual dynamics modeling. We first introduce the life-long world model, which learns task-specific latent dynamics. Then, we address the value estimation challenge for previous tasks with the exploratory-conservative behavior learning approach.
arXiv Detail & Related papers (2023-03-12T05:08:03Z)
Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups. We also prove a lower bound on the expected return on out-of-distribution goals, while still allowing goals to be specified with expressive structure.
arXiv Detail & Related papers (2022-11-01T03:31:43Z)
Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective [142.36200080384145]
We propose a single objective which jointly optimizes a latent-space model and policy to achieve high returns while remaining self-consistent. We demonstrate that the resulting algorithm matches or improves the sample-efficiency of the best prior model-based and model-free RL methods.
arXiv Detail & Related papers (2022-09-18T03:51:58Z)
Texture-guided Saliency Distilling for Unsupervised Salient Object Detection [67.10779270290305]
We propose a novel USOD method to mine rich and accurate saliency knowledge from both easy and hard samples. Our method achieves state-of-the-art USOD performance on RGB, RGB-D, RGB-T, and video SOD benchmarks.
arXiv Detail & Related papers (2022-07-13T02:01:07Z)
Imaginary Hindsight Experience Replay: Curious Model-based Learning for Sparse Reward Tasks [9.078290260836706]
We propose a model-based method tailored for sparse-reward tasks that forgoes the need for complicated reward engineering. This approach, termed Imaginary Hindsight Experience Replay, minimises real-world interactions by incorporating imaginary data into policy updates. Upon evaluation, this approach provides an order of magnitude increase in data-efficiency on average versus the state-of-the-art model-free method in the benchmark OpenAI Gym Fetch Robotics tasks.
arXiv Detail & Related papers (2021-10-05T23:38:31Z)
Model-Based Reinforcement Learning via Latent-Space Collocation [110.04005442935828]
We argue that it is easier to solve long-horizon tasks by planning sequences of states rather than just actions. We adapt the idea of collocation, which has shown good results on long-horizon tasks in the optimal control literature, to the image-based setting by utilizing learned latent state space models.
arXiv Detail & Related papers (2021-06-24T17:59:18Z)
Learning Long-term Visual Dynamics with Region Proposal Interaction Networks [75.06423516419862]
We build object representations that can capture inter-object and object-environment interactions over a long range. Thanks to the simple yet effective object representation, our approach outperforms prior methods by a significant margin.
arXiv Detail & Related papers (2020-08-05T17:48:00Z)
Hindsight Expectation Maximization for Goal-conditioned Reinforcement Learning [26.631740480100724]
We propose a graphical model framework for goal-conditioned RL, with an EM algorithm that operates on the lower bound of the RL objective. The E-step provides a natural interpretation of how 'learning in hindsight' techniques, such as HER, handle extremely sparse goal-conditioned rewards. The M-step reduces policy optimization to supervised learning updates, which greatly stabilizes end-to-end training on high-dimensional inputs such as images.
arXiv Detail & Related papers (2020-06-13T03:25:31Z)
Learning View and Target Invariant Visual Servoing for Navigation [9.873635079670093]
We learn viewpoint-invariant and target-invariant visual servoing for local mobile robot navigation. We train a deep convolutional network controller to reach the desired goal.
arXiv Detail & Related papers (2020-03-04T20:36:43Z)

This list is automatically generated from the titles and abstracts of the papers on this site.