Real-World Reinforcement Learning of Active Perception Behaviors
- URL: http://arxiv.org/abs/2512.01188v1
- Date: Mon, 01 Dec 2025 02:05:20 GMT
- Title: Real-World Reinforcement Learning of Active Perception Behaviors
- Authors: Edward S. Hu, Jie Wang, Xingfang Yuan, Fiona Luo, Muyao Li, Gaspard Lambrechts, Oleh Rybkin, Dinesh Jayaraman
- Abstract summary: A robot's instantaneous sensory observations do not always reveal task-relevant state information. We propose a simple real-world robot learning recipe to efficiently train active perception policies. Our approach, asymmetric advantage weighted regression, exploits access to "privileged" extra sensors at training time.
- Score: 27.56548234738969
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A robot's instantaneous sensory observations do not always reveal task-relevant state information. Under such partial observability, optimal behavior typically involves explicitly acting to gain the missing information. Today's standard robot learning techniques struggle to produce such active perception behaviors. We propose a simple real-world robot learning recipe to efficiently train active perception policies. Our approach, asymmetric advantage weighted regression (AAWR), exploits access to "privileged" extra sensors at training time. The privileged sensors enable training high-quality privileged value functions that aid in estimating the advantage of the target policy. Bootstrapping from a small number of potentially suboptimal demonstrations and an easy-to-obtain coarse policy initialization, AAWR quickly acquires active perception behaviors and boosts task performance. In evaluations on 8 manipulation tasks on 3 robots spanning varying degrees of partial observability, AAWR synthesizes reliable active perception behaviors that outperform all prior approaches. When initialized with a "generalist" robot policy that struggles with active perception tasks, AAWR efficiently generates information-gathering behaviors that allow it to operate under severe partial observability for manipulation tasks. Website: https://penn-pal-lab.github.io/aawr/
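The abstract describes AAWR as advantage-weighted regression where the advantage is estimated with a value function trained on privileged sensors available only at training time, while the policy itself sees only the target observations. A minimal sketch of that weighting step, assuming Monte-Carlo returns and a pre-trained privileged critic (the function and variable names here are illustrative, not the authors' implementation):

```python
import numpy as np

def aawr_weights(returns, v_priv, beta=1.0, w_max=20.0):
    """Advantage-weighted regression weights from a privileged value estimate.

    returns: per-transition returns, shape (N,)
    v_priv:  privileged value function evaluated on privileged observations,
             shape (N,); the policy never sees these observations
    beta:    temperature; smaller beta sharpens the weighting
    w_max:   clip value for numerical stability
    """
    advantages = returns - v_priv        # advantage under the privileged critic
    weights = np.exp(advantages / beta)  # exponential advantage weighting
    return np.minimum(weights, w_max)    # clip large weights

# Example: transitions with higher return than the critic's estimate
# receive exponentially larger imitation weight.
w = aawr_weights(returns=np.array([1.0, 0.0]), v_priv=np.array([0.0, 0.0]))
```

The policy update would then be weighted behavior cloning on the non-privileged observations: maximize the sum over transitions of `weights[i] * log pi(a_i | o_i)`, so the asymmetry (privileged critic, unprivileged actor) lives entirely in how the weights are computed.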
Related papers
- Apple: Toward General Active Perception via Reinforcement Learning [17.92494758004686]
APPLE (Active Perception Policy Learning) is a novel framework to address a range of different active perception problems. By design, APPLE is not limited to a specific task and can, in principle, be applied to a wide range of active perception problems. Experiments demonstrate the efficacy of APPLE, achieving high accuracies on both regression and classification tasks.
arXiv Detail & Related papers (2025-05-09T16:49:26Z) - Learning from Active Human Involvement through Proxy Value Propagation [44.144964115275]
Learning from active human involvement enables the human subject to actively intervene and demonstrate to the AI agent during training. We propose a new reward-free active human involvement method called Proxy Value Propagation for policy optimization. Our method can learn to solve continuous and discrete control tasks with various human control devices, including the challenging task of driving in Grand Theft Auto V.
arXiv Detail & Related papers (2025-02-05T17:07:37Z) - Offline Imitation Learning Through Graph Search and Retrieval [57.57306578140857]
Imitation learning is a powerful machine learning approach for a robot to acquire manipulation skills.
We propose GSR, a simple yet effective algorithm that learns from suboptimal demonstrations through Graph Search and Retrieval.
GSR can achieve a 10% to 30% higher success rate and over 30% higher proficiency compared to baselines.
arXiv Detail & Related papers (2024-07-22T06:12:21Z) - Robotic Control via Embodied Chain-of-Thought Reasoning [86.6680905262442]
A key limitation of learned robot control policies is their inability to generalize outside their training data. Recent works on vision-language-action models (VLAs) have shown that the use of large, internet pre-trained vision-language models can substantially improve their robustness and generalization ability. We introduce Embodied Chain-of-Thought Reasoning (ECoT) for VLAs, in which we train VLAs to perform multiple steps of reasoning about plans, sub-tasks, motions, and visually grounded features before predicting the robot action.
arXiv Detail & Related papers (2024-07-11T17:31:01Z) - What Matters to You? Towards Visual Representation Alignment for Robot Learning [81.30964736676103]
When operating in service of people, robots need to optimize rewards aligned with end-user preferences.
We propose Representation-Aligned Preference-based Learning (RAPL), a method for solving the visual representation alignment problem.
arXiv Detail & Related papers (2023-10-11T23:04:07Z) - Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.
arXiv Detail & Related papers (2023-03-02T18:51:38Z) - Training Robots to Evaluate Robots: Example-Based Interactive Reward Functions for Policy Learning [20.565163553170397]
We propose to train robots to acquire such interactive behaviors automatically.
These evaluations in turn serve as "interactive reward functions" (IRFs).
IRFs can be conveniently trained using only examples of successful outcomes.
arXiv Detail & Related papers (2022-12-17T21:44:03Z) - Accelerating Robotic Reinforcement Learning via Parameterized Action Primitives [92.0321404272942]
Reinforcement learning can be used to build general-purpose robotic systems.
However, training RL agents to solve robotics tasks still remains challenging.
In this work, we manually specify a library of robot action primitives (RAPS), parameterized with arguments that are learned by an RL policy.
We find that our simple change to the action interface substantially improves both the learning efficiency and task performance.
arXiv Detail & Related papers (2021-10-28T17:59:30Z) - A Framework for Efficient Robotic Manipulation [79.10407063260473]
We show that, given only 10 demonstrations, a single robotic arm can learn sparse-reward manipulation policies from pixels.
arXiv Detail & Related papers (2020-12-14T22:18:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.