Belief-Grounded Networks for Accelerated Robot Learning under Partial
Observability
- URL: http://arxiv.org/abs/2010.09170v5
- Date: Thu, 21 Oct 2021 00:37:00 GMT
- Title: Belief-Grounded Networks for Accelerated Robot Learning under Partial
Observability
- Authors: Hai Nguyen, Brett Daley, Xinchao Song, Christopher Amato, Robert Platt
- Abstract summary: We propose a method for policy learning under partial observability called the Belief-Grounded Network (BGN).
BGN incentivizes a neural network to concisely summarize its input history.
It outperforms all other tested methods and its learned policies work well when transferred onto a physical robot.
- Score: 13.080765595494213
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many important robotics problems are partially observable in the sense that a
single visual or force-feedback measurement is insufficient to reconstruct the
state. Standard approaches involve learning a policy over beliefs or
observation-action histories. However, both of these have drawbacks; it is
expensive to track the belief online, and it is hard to learn policies directly
over histories. We propose a method for policy learning under partial
observability called the Belief-Grounded Network (BGN) in which an auxiliary
belief-reconstruction loss incentivizes a neural network to concisely summarize
its input history. Since the resulting policy is a function of the history
rather than the belief, it can be executed easily at runtime. We compare BGN
against several baselines on classic benchmark tasks as well as three novel
robotic touch-sensing tasks. BGN outperforms all other tested methods and its
learned policies work well when transferred onto a physical robot.
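The auxiliary belief-reconstruction idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a discrete-state POMDP with known transition and observation tables (`T` and `O`, hypothetical names), uses a KL divergence as the reconstruction loss (the abstract does not specify the loss form), and adds it to a policy loss with a hypothetical weight `aux_weight`. The belief is only needed to compute the training loss; at runtime the policy would consume the history alone.

```python
import math

def belief_update(belief, T, O, action, obs):
    """One discrete Bayes-filter step.

    b'(s') is proportional to O[a][s'][o] * sum_s T[a][s][s'] * b(s).
    T and O are nested lists indexed as T[action][s][s_next] and
    O[action][s_next][obs] (an assumed layout for this sketch).
    """
    n = len(belief)
    predicted = [sum(T[action][s][s2] * belief[s] for s in range(n))
                 for s2 in range(n)]
    unnorm = [O[action][s2][obs] * predicted[s2] for s2 in range(n)]
    z = sum(unnorm)
    if z == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / z for u in unnorm]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def belief_reconstruction_loss(true_belief, predicted_logits, eps=1e-12):
    """KL(true_belief || softmax(predicted_logits)).

    predicted_logits stands in for the output of an auxiliary head that
    reads the network's history summary (assumed architecture).
    """
    pred = softmax(predicted_logits)
    return sum(b * (math.log(b) - math.log(p + eps))
               for b, p in zip(true_belief, pred) if b > 0.0)

def total_loss(policy_loss, true_belief, predicted_logits, aux_weight=1.0):
    """Policy loss plus the weighted auxiliary belief-reconstruction term."""
    aux = belief_reconstruction_loss(true_belief, predicted_logits)
    return policy_loss + aux_weight * aux

if __name__ == "__main__":
    # Two states, one action, two observations (toy numbers).
    T = [[[1.0, 0.0], [0.0, 1.0]]]
    O = [[[0.9, 0.1], [0.2, 0.8]]]
    b = belief_update([0.5, 0.5], T, O, action=0, obs=0)
    print("belief:", b)
    print("loss with uniform prediction:", total_loss(1.0, b, [0.0, 0.0], 0.5))
```

In this sketch the belief tracker runs only during training, where the simulator's model is available, which matches the abstract's point that a history-based policy avoids online belief tracking.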
Related papers
- BPP: Long-Context Robot Imitation Learning by Focusing on Key History Frames [27.70479413079641]
Best-performing robot policies typically condition only on the current observation, limiting their applicability to such tasks. We analyze why policies latch onto spurious correlations and find that this problem stems from limited coverage over the space of possible histories during training. Motivated by these findings, we propose Big Picture Policies (BPP), an approach that conditions on a minimal set of meaningful keyframes detected by a vision-language model.
arXiv Detail & Related papers (2026-02-16T18:49:56Z) - Real-World Reinforcement Learning of Active Perception Behaviors [27.56548234738969]
A robot's instantaneous sensory observations do not always reveal task-relevant state information. We propose a simple real-world robot learning recipe to efficiently train active perception policies. Our approach, asymmetric advantage weighted regression, exploits access to "privileged" extra sensors at training time.
arXiv Detail & Related papers (2025-12-01T02:05:20Z) - Exploiting Policy Idling for Dexterous Manipulation [19.909895138745345]
We investigate how to leverage the detectability of idling behavior to inform exploration and policy improvement. Our approach, Pause-Induced Perturbations (PIP), applies perturbations at detected idling states. On a range of challenging simulated dual-arm tasks, we find that this simple approach can already noticeably improve test-time performance.
arXiv Detail & Related papers (2025-08-21T15:52:45Z) - Success in Humanoid Reinforcement Learning under Partial Observation [4.473337652382325]
This research presents the first successful instance of learning under partial observability in a humanoid locomotion environment. The learned policy achieves performance comparable to state-of-the-art results with full state access. The key to this success is a novel history encoder that processes a fixed-length sequence of past observations in parallel.
arXiv Detail & Related papers (2025-07-25T01:51:12Z) - FACTR: Force-Attending Curriculum Training for Contact-Rich Policy Learning [70.65987250853311]
Force feedback is readily available in most robot arms, but not commonly used in teleoperation and policy learning.
We present a low-cost, intuitive, bilateral teleoperation setup that relays external forces of the follower arm back to the teacher arm.
We then introduce FACTR, a policy learning method that employs a curriculum which corrupts the visual input with decreasing intensity throughout training.
arXiv Detail & Related papers (2025-02-24T18:59:07Z) - Offline Imitation Learning Through Graph Search and Retrieval [57.57306578140857]
Imitation learning is a powerful machine learning algorithm for a robot to acquire manipulation skills.
We propose GSR, a simple yet effective algorithm that learns from suboptimal demonstrations through Graph Search and Retrieval.
GSR can achieve a 10% to 30% higher success rate and over 30% higher proficiency compared to baselines.
arXiv Detail & Related papers (2024-07-22T06:12:21Z) - Robotic Control via Embodied Chain-of-Thought Reasoning [86.6680905262442]
A key limitation of learned robot control policies is their inability to generalize outside their training data.
Recent works on vision-language-action models (VLAs) have shown that the use of large, internet pre-trained vision-language models can substantially improve their robustness and generalization ability.
We introduce Embodied Chain-of-Thought Reasoning (ECoT) for VLAs, in which we train VLAs to perform multiple steps of reasoning about plans, sub-tasks, motions, and visually grounded features before predicting the robot action.
arXiv Detail & Related papers (2024-07-11T17:31:01Z) - Contact Energy Based Hindsight Experience Prioritization [19.42106651692228]
Multi-goal robot manipulation tasks with sparse rewards are difficult for reinforcement learning (RL) algorithms.
Recent algorithms such as Hindsight Experience Replay (HER) expedite learning by taking advantage of failed trajectories.
We propose a novel approach, Contact Energy Based Prioritization (CEBP), to select samples from the replay buffer based on rich information due to contact.
arXiv Detail & Related papers (2023-12-05T11:32:25Z) - Learning Vision-based Pursuit-Evasion Robot Policies [54.52536214251999]
We develop a fully-observable robot policy that generates supervision for a partially-observable one.
We deploy our policy on a physical quadruped robot with an RGB-D camera on pursuit-evasion interactions in the wild.
arXiv Detail & Related papers (2023-08-30T17:59:05Z) - Bridging the Sim-to-Real Gap from the Information Bottleneck Perspective [38.845882541261645]
We propose a novel privileged knowledge distillation method called the Historical Information Bottleneck (HIB).
HIB learns a privileged knowledge representation from historical trajectories by capturing the underlying changeable dynamic information.
Empirical experiments on both simulated and real-world tasks demonstrate that HIB yields improved generalizability compared to previous methods.
arXiv Detail & Related papers (2023-05-29T07:51:00Z) - Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement
Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.
arXiv Detail & Related papers (2023-03-02T18:51:38Z) - Leveraging Fully Observable Policies for Learning under Partial
Observability [14.918197552051929]
We propose a method for partially observable reinforcement learning that uses a fully observable policy during offline training to improve online performance.
Our approach can leverage the fully-observable policy for exploration and parts of the domain that are fully observable while still being able to learn under partial observability.
A successful policy transfer to a physical robot in a manipulation task from pixels shows our approach's practicality in learning interesting policies under partial observability.
arXiv Detail & Related papers (2022-11-03T16:57:45Z) - Verifying Learning-Based Robotic Navigation Systems [61.01217374879221]
We show how modern verification engines can be used for effective model selection.
Specifically, we use verification to detect and rule out policies that may demonstrate suboptimal behavior.
Our work is the first to demonstrate the use of verification backends for recognizing suboptimal DRL policies in real-world robots.
arXiv Detail & Related papers (2022-05-26T17:56:43Z) - Explainability in reinforcement learning: perspective and position [1.299941371793082]
This paper attempts to give a systematic overview of existing methods in the explainable RL area.
It proposes a novel unified taxonomy, building and expanding on the existing ones.
arXiv Detail & Related papers (2022-03-22T09:00:13Z) - Toward Force Estimation in Robot-Assisted Surgery using Deep Learning
with Vision and Robot State [25.121899443298567]
Vision-based deep learning using convolutional neural networks is a promising approach for providing useful force estimates.
We present a force estimation neural network that uses RGB images and robot state as inputs.
It showed comparable accuracy but faster computation times than a baseline recurrent neural network, making it better suited for real-time applications.
arXiv Detail & Related papers (2020-11-04T04:00:07Z) - COG: Connecting New Skills to Past Experience with Offline Reinforcement
Learning [78.13740204156858]
We show that we can reuse prior data to extend new skills simply through dynamic programming.
We demonstrate the effectiveness of our approach by chaining together several behaviors seen in prior datasets for solving a new task.
We train our policies in an end-to-end fashion, mapping high-dimensional image observations to low-level robot control commands.
arXiv Detail & Related papers (2020-10-27T17:57:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.