MSVIPER: Improved Policy Distillation for Reinforcement-Learning-Based
Robot Navigation
- URL: http://arxiv.org/abs/2209.09079v1
- Date: Mon, 19 Sep 2022 15:12:53 GMT
- Authors: Aaron M. Roth, Jing Liang, Ram Sriram, Elham Tabassi, and Dinesh
Manocha
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We present Multiple Scenario Verifiable Reinforcement Learning via Policy
Extraction (MSVIPER), a new method for policy distillation to decision trees
for improved robot navigation. MSVIPER learns an "expert" policy using any
Reinforcement Learning (RL) technique involving learning a state-action mapping
and then uses imitation learning to learn a decision-tree policy from it. We
demonstrate that MSVIPER results in efficient decision trees and can accurately
mimic the behavior of the expert policy. Moreover, we present efficient policy
distillation and tree-modification techniques that take advantage of the
decision tree structure to allow improvements to a policy without retraining.
We use our approach to improve the performance of RL-based robot navigation
algorithms for indoor and outdoor scenes. We demonstrate the benefits in terms
of reduced freezing and oscillation behaviors (by up to a 95% reduction) for
mobile robots navigating among dynamic obstacles and reduced vibrations and
oscillation (by up to 17%) for outdoor robot navigation on complex, uneven
terrains.
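The distillation step the abstract describes — roll out an expert policy, label the visited states with its actions, and fit a decision tree that mimics it — can be sketched in miniature. The following is an illustrative stand-in only, not MSVIPER's actual algorithm: the "expert" here is a hypothetical one-dimensional threshold rule and the "tree" is a single-split stump, whereas MSVIPER uses full decision trees, DAgger-style resampling, and multiple training scenarios.

```python
# Toy VIPER-style distillation sketch (all names hypothetical):
# 1) roll out an expert to collect (state, action) pairs,
# 2) fit a depth-1 decision tree (a stump) to mimic the expert,
# 3) measure fidelity (agreement with the expert on the dataset).
import random

def expert_policy(obstacle_dist):
    """Stand-in expert: stop when an obstacle is closer than 0.5 m."""
    return "stop" if obstacle_dist < 0.5 else "go"

def collect_rollouts(n_samples, seed=0):
    """Sample states and label each with the expert's action."""
    rng = random.Random(seed)
    states = [rng.uniform(0.0, 2.0) for _ in range(n_samples)]
    return [(s, expert_policy(s)) for s in states]

def fit_stump(dataset):
    """Fit a one-split tree by scanning candidate thresholds from the data."""
    best = None
    for thresh, _ in dataset:
        correct = sum(
            ("stop" if s < thresh else "go") == action
            for s, action in dataset
        )
        if best is None or correct > best[1]:
            best = (thresh, correct)
    return best  # (threshold, number of expert actions matched)

data = collect_rollouts(500)
threshold, n_correct = fit_stump(data)
fidelity = n_correct / len(data)  # fraction of states where tree == expert
```

Because the expert itself is a threshold rule, the stump recovers it almost exactly; with a learned neural expert and higher-dimensional states, the same loop would use a deeper tree and iterative re-collection of states visited by the current tree policy.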
Related papers
- Research on Autonomous Robots Navigation based on Reinforcement Learning [13.559881645869632]
We use the Deep Q Network (DQN) and Proximal Policy Optimization (PPO) models to optimize the path planning and decision-making process.
We have verified the effectiveness and robustness of these models in various complex scenarios.
arXiv Detail & Related papers (2024-07-02T00:44:06Z)
- Deep Reinforcement Learning with Enhanced PPO for Safe Mobile Robot Navigation [0.6554326244334868]
This study investigates the application of deep reinforcement learning to train a mobile robot for autonomous navigation in a complex environment.
The robot utilizes LiDAR sensor data and a deep neural network to generate control signals guiding it toward a specified target while avoiding obstacles.
arXiv Detail & Related papers (2024-05-25T15:08:36Z)
- Distilling Reinforcement Learning Policies for Interpretable Robot Locomotion: Gradient Boosting Machines and Symbolic Regression [53.33734159983431]
This paper introduces a novel approach to distill neural RL policies into more interpretable forms.
We train expert neural network policies using RL and distill them into (i) gradient boosting machines (GBMs), (ii) explainable boosting machines (EBMs), and (iii) symbolic policies.
arXiv Detail & Related papers (2024-03-21T11:54:45Z)
- Robot path planning using deep reinforcement learning [0.0]
Reinforcement learning methods offer a map-free alternative for navigation tasks.
Deep reinforcement learning agents are implemented for both the obstacle avoidance and the goal-oriented navigation task.
An analysis of the changes in the behaviour and performance of the agents caused by modifications in the reward function is conducted.
arXiv Detail & Related papers (2023-02-17T20:08:59Z)
- Constrained Reinforcement Learning for Robotics via Scenario-Based Programming [64.07167316957533]
It is crucial to optimize the performance of DRL-based agents while providing guarantees about their behavior.
This paper presents a novel technique for incorporating domain-expert knowledge into a constrained DRL training loop.
Our experiments demonstrate that using our approach to leverage expert knowledge dramatically improves the safety and the performance of the agent.
arXiv Detail & Related papers (2022-06-20T07:19:38Z)
- Human-Aware Robot Navigation via Reinforcement Learning with Hindsight Experience Replay and Curriculum Learning [28.045441768064215]
Reinforcement learning approaches have shown superior ability in solving sequential decision-making problems.
In this work, we consider the task of training an RL agent without employing the demonstration data.
We propose to incorporate the hindsight experience replay (HER) and curriculum learning (CL) techniques with RL to efficiently learn the optimal navigation policy in the dense crowd.
arXiv Detail & Related papers (2021-10-09T13:18:11Z)
- XAI-N: Sensor-based Robot Navigation using Expert Policies and Decision Trees [55.9643422180256]
We present a novel sensor-based learning navigation algorithm to compute a collision-free trajectory for a robot in dense and dynamic environments.
Our approach uses deep reinforcement learning-based expert policy that is trained using a sim2real paradigm.
We highlight the benefits of our algorithm in simulated environments and navigating a Clearpath Jackal robot among moving pedestrians.
arXiv Detail & Related papers (2021-04-22T01:33:10Z)
- Rapidly Adaptable Legged Robots via Evolutionary Meta-Learning [65.88200578485316]
We present a new meta-learning method that allows robots to quickly adapt to changes in dynamics.
Our method significantly improves adaptation to changes in dynamics in high noise settings.
We validate our approach on a quadruped robot that learns to walk while subject to changes in dynamics.
arXiv Detail & Related papers (2020-03-02T22:56:27Z)
- Enhanced Adversarial Strategically-Timed Attacks against Deep Reinforcement Learning [91.13113161754022]
We introduce timing-based adversarial strategies against a DRL-based navigation system by injecting physical noise patterns at selected time frames.
Our experimental results show that the adversarial timing attacks can lead to a significant performance drop.
arXiv Detail & Related papers (2020-02-20T21:39:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.