On-Robot Bayesian Reinforcement Learning for POMDPs
- URL: http://arxiv.org/abs/2307.11954v1
- Date: Sat, 22 Jul 2023 01:16:29 GMT
- Title: On-Robot Bayesian Reinforcement Learning for POMDPs
- Authors: Hai Nguyen, Sammie Katt, Yuchen Xiao, Christopher Amato
- Abstract summary: This paper advances Bayesian reinforcement learning for robotics by proposing a specialized framework for physical systems.
We capture this knowledge in a factored representation, then demonstrate that the posterior factorizes in a similar shape, and ultimately formalize the model in a Bayesian framework.
We then introduce a sample-based online solution method, based on Monte-Carlo tree search and particle filtering, specialized to solve the resulting model.
- Score: 16.667924736270415
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robot learning is often difficult due to the expense of gathering data. The
need for large amounts of data can, and should, be tackled with effective
algorithms and by leveraging expert information on robot dynamics. Bayesian
reinforcement learning (BRL), thanks to its sample efficiency and ability to
exploit prior knowledge, is uniquely positioned as such a solution method.
Unfortunately, the application of BRL has been limited due to the difficulties
of representing expert knowledge as well as solving the subsequent inference
problem. This paper advances BRL for robotics by proposing a specialized
framework for physical systems. In particular, we capture this knowledge in a
factored representation, then demonstrate that the posterior factorizes in a similar
shape, and ultimately formalize the model in a Bayesian framework. We then
introduce a sample-based online solution method, based on Monte-Carlo tree
search and particle filtering, specialized to solve the resulting model. This
approach can, for example, utilize typical low-level robot simulators and
handle uncertainty over unknown dynamics of the environment. We empirically
demonstrate its efficiency by performing on-robot learning in two human-robot
interaction tasks with uncertainty about human behavior, achieving near-optimal
performance after only a handful of real-world episodes. A video of learned
policies is at https://youtu.be/H9xp60ngOes.
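The approach the abstract describes can be pictured with a toy sketch: a particle-filter posterior over an unknown dynamics parameter, combined with sample-based Monte-Carlo planning that samples models from the belief at the root. Everything below (the task, the `simulate` stand-in, the noise level, and the flat Monte-Carlo planner that simplifies full MCTS) is a hypothetical illustration, not the authors' implementation.

```python
import math
import random

# Hypothetical stand-in task: the robot must infer a hidden human
# preference `theta` from noisy observations before committing.
ACTIONS = ["wait", "commit_left", "commit_right"]
THETAS = ["left", "right"]   # unknown dynamics parameter
OBS_NOISE = 0.2              # P(observation contradicts theta)

def simulate(theta, action):
    """Low-level simulator: returns (observation, reward, done)."""
    flip = {"left": "right", "right": "left"}
    obs = theta if random.random() > OBS_NOISE else flip[theta]
    if action == "wait":
        return obs, -0.05, False      # small cost of gathering information
    return obs, (1.0 if action == "commit_" + theta else -1.0), True

def update_belief(particles, obs):
    """Particle-filter posterior update: importance-weight each particle
    by the observation likelihood, then resample."""
    weights = [1 - OBS_NOISE if p == obs else OBS_NOISE for p in particles]
    return random.choices(particles, weights=weights, k=len(particles))

def plan(particles, n_sims=500, depth=5, c=1.0, gamma=0.95):
    """Flat Monte-Carlo planner: each simulation samples a dynamics
    parameter from the belief, as in sample-based POMDP tree search
    (a full MCTS would also branch on observations)."""
    counts = {a: 0 for a in ACTIONS}
    values = {a: 0.0 for a in ACTIONS}
    for i in range(1, n_sims + 1):
        # UCB1 action selection at the root
        a = max(ACTIONS, key=lambda x: float("inf") if counts[x] == 0
                else values[x] + c * math.sqrt(math.log(i) / counts[x]))
        theta = random.choice(particles)   # sample a model from the belief
        ret, disc, action = 0.0, 1.0, a
        for _ in range(depth):
            obs, r, done = simulate(theta, action)
            ret += disc * r
            disc *= gamma
            if done:
                break
            action = random.choice(ACTIONS)    # random rollout policy
        counts[a] += 1
        values[a] += (ret - values[a]) / counts[a]
    return max(ACTIONS, key=values.get)

true_theta = random.choice(THETAS)
particles = [random.choice(THETAS) for _ in range(200)]   # uniform prior
for _ in range(20):
    action = plan(particles)
    obs, reward, done = simulate(true_theta, action)
    particles = update_belief(particles, obs)
    if done:
        break
```

After one or two informative observations the posterior concentrates on the true parameter and the planner commits, which mirrors in miniature the few-episode learning the paper reports.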
Related papers
- Offline Imitation Learning Through Graph Search and Retrieval [57.57306578140857]
Imitation learning is a powerful approach for a robot to acquire manipulation skills.
We propose GSR, a simple yet effective algorithm that learns from suboptimal demonstrations through Graph Search and Retrieval.
GSR can achieve a 10% to 30% higher success rate and over 30% higher proficiency compared to baselines; a toy sketch of the graph-search-and-retrieval idea follows below.
arXiv Detail & Related papers (2024-07-22T06:12:21Z)
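A rough sketch of the graph-search-and-retrieval idea: pool transitions from all demonstrations into one graph, then retrieve the cheapest composite path. The toy demonstrations, state encoding, and cost model are illustrative assumptions, not GSR's actual pipeline.

```python
import heapq
from collections import defaultdict

def build_graph(demos):
    """Pool (state, action, next_state, cost) tuples from all
    demonstrations, including suboptimal ones, into a directed graph."""
    graph = defaultdict(list)      # state -> [(cost, action, next_state)]
    for demo in demos:
        for s, a, s_next, cost in demo:
            graph[s].append((cost, a, s_next))
    return graph

def retrieve_plan(graph, start, goal):
    """Dijkstra search: stitches together the cheapest action sequence
    across demonstrations, so suboptimal demos can still yield a good
    composite plan."""
    frontier = [(0.0, start, [])]
    seen = set()
    while frontier:
        dist, s, plan = heapq.heappop(frontier)
        if s == goal:
            return plan
        if s in seen:
            continue
        seen.add(s)
        for cost, a, s_next in graph[s]:
            if s_next not in seen:
                heapq.heappush(frontier, (dist + cost, s_next, plan + [a]))
    return None

# Hypothetical toy demos over grid states.
demo1 = [("s0", "right", "s1", 1.0), ("s1", "up", "s2", 1.0)]
demo2 = [("s0", "up", "s3", 1.0), ("s3", "right", "s2", 3.0)]  # suboptimal
graph = build_graph([demo1, demo2])
print(retrieve_plan(graph, "s0", "s2"))   # ['right', 'up'] via the cheaper demo
```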
- Coupling Machine Learning with Ontology for Robotics Applications [0.0]
The lack of prior knowledge in dynamic scenarios is without doubt a major barrier for scalable machine intelligence.
My view of the interaction between the two tiers of intelligence is based on the idea that when knowledge is not readily available at the knowledge-base tier, more knowledge can be extracted from the other tier.
arXiv Detail & Related papers (2024-06-08T23:38:03Z)
- Active Exploration in Bayesian Model-based Reinforcement Learning for Robot Manipulation [8.940998315746684]
We propose a model-based reinforcement learning (RL) approach for robotic arm end-tasks.
We employ Bayesian neural network models to represent, in a probabilistic way, both the belief and the information encoded in the dynamics model during exploration.
Our experiments show the advantages of our Bayesian model-based RL approach, achieving results of similar quality to relevant alternatives; an ensemble-based sketch of the exploration idea follows below.
arXiv Detail & Related papers (2024-04-02T11:44:37Z)
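A sketch of the active-exploration idea, with a bootstrap ensemble standing in for the paper's Bayesian neural network dynamics model; the class, data, and task below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

class EnsembleDynamics:
    """Bootstrap ensemble of linear dynamics models (a crude stand-in
    for a Bayesian neural network posterior over dynamics)."""
    def __init__(self, n_models=5):
        self.models = [None] * n_models

    def fit(self, states, actions, next_states):
        X = np.hstack([states, actions])
        for i in range(len(self.models)):
            idx = rng.integers(0, len(X), len(X))    # bootstrap resample
            self.models[i] = np.linalg.lstsq(X[idx], next_states[idx],
                                             rcond=None)[0]

    def predict_all(self, state, action):
        x = np.concatenate([state, action])
        return np.stack([x @ W for W in self.models])

def exploration_bonus(model, state, action):
    """Epistemic uncertainty as disagreement between ensemble members;
    high-disagreement actions are the informative ones to try."""
    preds = model.predict_all(state, action)
    return preds.std(axis=0).mean()

# Toy usage: among candidate actions, prefer the one the model is most
# uncertain about (pure information seeking, for illustration only).
states = rng.normal(size=(100, 3))
actions = rng.normal(size=(100, 2))
next_states = states + 0.1 * np.pad(actions, ((0, 0), (0, 1)))
model = EnsembleDynamics()
model.fit(states, actions, next_states)
candidates = rng.normal(size=(8, 2))
best = max(candidates, key=lambda a: exploration_bonus(model, states[0], a))
```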
- SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning [85.21378553454672]
We develop a library containing a sample-efficient off-policy deep RL method, together with methods for computing rewards and resetting the environment; a generic off-policy replay loop is sketched below.
We find that our implementation can achieve very efficient learning, acquiring policies for PCB assembly, cable routing, and object relocation.
These policies achieve perfect or near-perfect success rates, extreme robustness even under perturbations, and exhibit emergent recovery and correction behaviors.
arXiv Detail & Related papers (2024-01-29T10:01:10Z)
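The sample-efficiency pattern such a suite packages is the off-policy update loop with a replay buffer: every real environment step is reused many times. The toy chain environment and hyperparameters below are illustrative, not SERL's API.

```python
import random
from collections import deque

random.seed(0)
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]

def step(s, a):
    """Toy chain environment: walk left/right, reward at the goal."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
buffer = deque(maxlen=10_000)
eps, alpha, gamma = 0.2, 0.1, 0.95

for episode in range(200):
    s, done = 0, False
    while not done:
        a = random.choice(ACTIONS) if random.random() < eps \
            else max(ACTIONS, key=lambda b: Q[(s, b)])
        s2, r, done = step(s, a)
        buffer.append((s, a, r, s2, done))
        # Off-policy: learn from replayed past transitions, so each
        # environment step contributes to many gradient-style updates.
        for s_, a_, r_, s2_, d_ in random.sample(buffer, min(32, len(buffer))):
            target = r_ + (0 if d_ else gamma * max(Q[(s2_, b)] for b in ACTIONS))
            Q[(s_, a_)] += alpha * (target - Q[(s_, a_)])
        s = s2
```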
- Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning [58.3994826169858]
We introduce RoboFuME, a reset-free fine-tuning system for robotic reinforcement learning.
Our key insight is to utilize offline reinforcement learning techniques to ensure efficient online fine-tuning of a pre-trained policy.
Our method can incorporate data from an existing robot dataset and improve on a target task within as little as 3 hours of autonomous real-world experience.
arXiv Detail & Related papers (2023-10-23T17:50:08Z)
- Quality-Diversity Optimisation on a Physical Robot Through Dynamics-Aware and Reset-Free Learning [4.260312058817663]
We build upon the Reset-Free QD (RF-QD) algorithm to learn controllers directly on a physical robot.
This method uses a dynamics model, learned from interactions between the robot and the environment, to predict the robot's behaviour.
RF-QD also includes a recovery policy that returns the robot to a safe zone whenever it walks outside of it, allowing continuous learning; a minimal sketch of this recovery logic follows below.
arXiv Detail & Related papers (2023-04-24T13:24:00Z)
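A minimal sketch of the reset-free control loop with a recovery policy: when the learned dynamics model predicts (or the robot observes) a state outside the safe zone, a hand-written recovery controller walks it back. The 2-D position state, safe-zone box, and stand-in controllers below are hypothetical, not RF-QD's code.

```python
SAFE_MIN, SAFE_MAX = -1.0, 1.0

def in_safe_zone(pos):
    return all(SAFE_MIN <= x <= SAFE_MAX for x in pos)

def recovery_action(pos):
    """Head back toward the centre of the safe zone."""
    return tuple(-0.1 * x for x in pos)

def control_step(pos, exploration_policy, dynamics_model):
    """One step of reset-free learning with a dynamics-aware safety check."""
    if not in_safe_zone(pos):
        return recovery_action(pos)            # already outside: recover
    action = exploration_policy(pos)
    # Dynamics-aware veto: reject exploratory actions predicted to exit
    # the safe zone, instead of waiting until the robot is already out.
    if not in_safe_zone(dynamics_model(pos, action)):
        return recovery_action(pos)
    return action

# Toy usage with stand-in policy and learned model.
policy = lambda pos: (0.3, 0.0)
model = lambda pos, a: tuple(p + da for p, da in zip(pos, a))
print(control_step((0.9, 0.0), policy, model))   # vetoed -> recovery action
```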
- Hindsight States: Blending Sim and Real Task Elements for Efficient Reinforcement Learning [61.3506230781327]
In robotics, one approach to generating training data builds on simulations based on dynamics models derived from first principles.
Here, we leverage the imbalance in complexity of the dynamics to learn more sample-efficiently.
We validate our method on several challenging simulated tasks and demonstrate that it improves learning both alone and when combined with an existing hindsight algorithm.
arXiv Detail & Related papers (2023-03-03T21:55:04Z)
- Development of a robust cascaded architecture for intelligent robot grasping using limited labelled data [0.0]
In the case of robots, we cannot afford to spend that much time making them learn how to grasp objects effectively.
We propose an efficient learning architecture based on a VQVAE so that robots can be taught with sufficient data corresponding to correct grasping; the core vector-quantization step is sketched below.
We investigate a semi-supervised model that retains much more generalization capability even with a limited labelled data set.
arXiv Detail & Related papers (2021-11-06T11:01:15Z)
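The central operation of a VQ-VAE is mapping each encoder output to its nearest codebook entry. The encoder, decoder, and grasp labels are omitted here; the shapes and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(32, 8))   # 32 discrete codes, 8-dim latents

def quantize(z):
    """Nearest-codebook-entry lookup, the vector-quantization step of a
    VQ-VAE; returns the quantized vectors and their discrete indices."""
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)
    return codebook[idx], idx

z = rng.normal(size=(4, 8))           # a batch of encoder outputs
z_q, idx = quantize(z)
# In a semi-supervised setup, the discrete indices give a compact
# representation to which a small labelled grasp set can be attached.
```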
- Learning of Parameters in Behavior Trees for Movement Skills [0.9562145896371784]
Behavior Trees (BTs) can provide a policy representation that supports modular and composable skills.
We present a novel algorithm that can learn the parameters of a BT policy in simulation and then generalize to the physical robot without any additional training; a toy parameter-search sketch follows below.
arXiv Detail & Related papers (2021-09-27T13:46:39Z)
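One simple way to learn the parameters of a fixed BT structure in simulation is black-box search over the parameter vector. The cross-entropy-style loop, the stand-in simulator, and the parameter meanings below are hypothetical, not the paper's algorithm.

```python
import random
import statistics

random.seed(0)
TARGET = [0.4, -0.2, 0.7]   # unknown "good" parameters of the BT leaves

def simulate_bt(params):
    """Stand-in simulator: reward grows as params approach TARGET."""
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))

# Cross-entropy-style search: sample, keep elites, refit the distribution.
mean, std = [0.0] * 3, [1.0] * 3
for generation in range(30):
    pop = [[random.gauss(m, s) for m, s in zip(mean, std)] for _ in range(64)]
    elites = sorted(pop, key=simulate_bt, reverse=True)[:8]
    mean = [statistics.mean(e[i] for e in elites) for i in range(3)]
    std = [statistics.stdev(e[i] for e in elites) + 1e-3 for i in range(3)]

# `mean` is then deployed on the physical robot without further training.
print([round(m, 2) for m in mean])
```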
- Skill Preferences: Learning to Extract and Execute Robotic Skills from Human Feedback [82.96694147237113]
We present Skill Preferences (SkiP), an algorithm that learns a model over human preferences and uses it to extract human-aligned skills from offline data; a minimal preference-model sketch follows below.
We show that SkiP enables a simulated kitchen robot to solve complex multi-step manipulation tasks.
arXiv Detail & Related papers (2021-08-11T18:04:08Z)
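A minimal sketch of fitting a preference model from pairwise human judgments, the kind of model SkiP learns before using it to weight offline data. The Bradley-Terry form with linear scores, the features, and the toy data are illustrative assumptions.

```python
import math
import random

random.seed(0)

def score(features, w):
    return sum(f * x for f, x in zip(features, w))

def train_preference_model(pairs, dim, lr=0.1, epochs=200):
    """pairs: (features_a, features_b, a_preferred) human judgments.
    Fits w by gradient ascent on the Bradley-Terry log-likelihood."""
    w = [0.0] * dim
    for _ in range(epochs):
        for fa, fb, a_pref in pairs:
            # P(a preferred) = sigmoid(score_a - score_b)
            p = 1.0 / (1.0 + math.exp(score(fb, w) - score(fa, w)))
            grad = (1.0 if a_pref else 0.0) - p
            for i in range(dim):
                w[i] += lr * grad * (fa[i] - fb[i])
    return w

# Toy data: humans prefer trajectories with a higher first feature.
pairs = [([1.0, 0.2], [0.1, 0.9], True), ([0.0, 0.5], [0.8, 0.5], False)]
w = train_preference_model(pairs, dim=2)
```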
- A Framework for Efficient Robotic Manipulation [79.10407063260473]
We show that, given only 10 demonstrations, a single robotic arm can learn sparse-reward manipulation policies from pixels.
arXiv Detail & Related papers (2020-12-14T22:18:39Z)