MSVIPER: Improved Policy Distillation for Reinforcement-Learning-Based
Robot Navigation
- URL: http://arxiv.org/abs/2209.09079v1
- Date: Mon, 19 Sep 2022 15:12:53 GMT
- Authors: Aaron M. Roth, Jing Liang, Ram Sriram, Elham Tabassi, and Dinesh
Manocha
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We present Multiple Scenario Verifiable Reinforcement Learning via Policy
Extraction (MSVIPER), a new method for policy distillation to decision trees
for improved robot navigation. MSVIPER learns an "expert" policy using any
Reinforcement Learning (RL) technique involving learning a state-action mapping
and then uses imitation learning to learn a decision-tree policy from it. We
demonstrate that MSVIPER results in efficient decision trees and can accurately
mimic the behavior of the expert policy. Moreover, we present efficient policy
distillation and tree-modification techniques that take advantage of the
decision tree structure to allow improvements to a policy without retraining.
We use our approach to improve the performance of RL-based robot navigation
algorithms for indoor and outdoor scenes. We demonstrate the benefits in terms
of reduced freezing and oscillation behaviors (by up to a 95% reduction) for
mobile robots navigating among dynamic obstacles and reduced vibrations and
oscillation (by up to 17%) for outdoor robot navigation on complex, uneven
terrains.
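The distillation step the abstract describes — roll out an expert policy, label the visited states with its actions, and fit a decision tree that mimics it — can be sketched in miniature. The following is an illustrative stand-in only, not MSVIPER's actual algorithm: the "expert" here is a hypothetical one-dimensional threshold rule and the "tree" is a single-split stump, whereas MSVIPER uses full decision trees, DAgger-style resampling, and multiple training scenarios.

```python
# Toy VIPER-style distillation sketch (all names hypothetical):
# 1) roll out an expert to collect (state, action) pairs,
# 2) fit a depth-1 decision tree (a stump) to mimic the expert,
# 3) measure fidelity (agreement with the expert on the dataset).
import random

def expert_policy(obstacle_dist):
    """Stand-in expert: stop when an obstacle is closer than 0.5 m."""
    return "stop" if obstacle_dist < 0.5 else "go"

def collect_rollouts(n_samples, seed=0):
    """Sample states and label each with the expert's action."""
    rng = random.Random(seed)
    states = [rng.uniform(0.0, 2.0) for _ in range(n_samples)]
    return [(s, expert_policy(s)) for s in states]

def fit_stump(dataset):
    """Fit a one-split tree by scanning candidate thresholds from the data."""
    best = None
    for thresh, _ in dataset:
        correct = sum(
            ("stop" if s < thresh else "go") == action
            for s, action in dataset
        )
        if best is None or correct > best[1]:
            best = (thresh, correct)
    return best  # (threshold, number of expert actions matched)

data = collect_rollouts(500)
threshold, n_correct = fit_stump(data)
fidelity = n_correct / len(data)  # fraction of states where tree == expert
```

Because the expert itself is a threshold rule, the stump recovers it almost exactly; with a learned neural expert and higher-dimensional states, the same loop would use a deeper tree and iterative re-collection of states visited by the current tree policy.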
Related papers
- Research on Autonomous Robots Navigation based on Reinforcement Learning [13.559881645869632]
We use the Deep Q Network (DQN) and Proximal Policy Optimization (PPO) models to optimize the path planning and decision-making process.
We have verified the effectiveness and robustness of these models in various complex scenarios.
arXiv Detail & Related papers (2024-07-02T00:44:06Z)
- Deep Reinforcement Learning with Enhanced PPO for Safe Mobile Robot Navigation [0.6554326244334868]
This study investigates the application of deep reinforcement learning to train a mobile robot for autonomous navigation in a complex environment.
The robot utilizes LiDAR sensor data and a deep neural network to generate control signals guiding it toward a specified target while avoiding obstacles.
arXiv Detail & Related papers (2024-05-25T15:08:36Z)
- Distilling Reinforcement Learning Policies for Interpretable Robot Locomotion: Gradient Boosting Machines and Symbolic Regression [53.33734159983431]
This paper introduces a novel approach to distill neural RL policies into more interpretable forms.
We train expert neural network policies using RL and distill them into (i) gradient boosting machines (GBMs), (ii) explainable boosting machines (EBMs), and (iii) symbolic policies.
arXiv Detail & Related papers (2024-03-21T11:54:45Z)
- Robot path planning using deep reinforcement learning [0.0]
Reinforcement learning methods offer a map-free alternative for navigation tasks.
Deep reinforcement learning agents are implemented for both the obstacle avoidance and the goal-oriented navigation task.
An analysis of the changes in the behaviour and performance of the agents caused by modifications in the reward function is conducted.
arXiv Detail & Related papers (2023-02-17T20:08:59Z)
- Constrained Reinforcement Learning for Robotics via Scenario-Based Programming [64.07167316957533]
It is crucial to optimize the performance of DRL-based agents while providing guarantees about their behavior.
This paper presents a novel technique for incorporating domain-expert knowledge into a constrained DRL training loop.
Our experiments demonstrate that using our approach to leverage expert knowledge dramatically improves the safety and the performance of the agent.
arXiv Detail & Related papers (2022-06-20T07:19:38Z)
- Human-Aware Robot Navigation via Reinforcement Learning with Hindsight Experience Replay and Curriculum Learning [28.045441768064215]
Reinforcement learning approaches have shown superior ability in solving sequential decision-making problems.
In this work, we consider the task of training an RL agent without employing the demonstration data.
We propose to incorporate the hindsight experience replay (HER) and curriculum learning (CL) techniques with RL to efficiently learn the optimal navigation policy in the dense crowd.
arXiv Detail & Related papers (2021-10-09T13:18:11Z)
- XAI-N: Sensor-based Robot Navigation using Expert Policies and Decision Trees [55.9643422180256]
We present a novel sensor-based learning navigation algorithm to compute a collision-free trajectory for a robot in dense and dynamic environments.
Our approach uses deep reinforcement learning-based expert policy that is trained using a sim2real paradigm.
We highlight the benefits of our algorithm in simulated environments and navigating a Clearpath Jackal robot among moving pedestrians.
arXiv Detail & Related papers (2021-04-22T01:33:10Z)
- Rapidly Adaptable Legged Robots via Evolutionary Meta-Learning [65.88200578485316]
We present a new meta-learning method that allows robots to quickly adapt to changes in dynamics.
Our method significantly improves adaptation to changes in dynamics in high noise settings.
We validate our approach on a quadruped robot that learns to walk while subject to changes in dynamics.
arXiv Detail & Related papers (2020-03-02T22:56:27Z)
- Enhanced Adversarial Strategically-Timed Attacks against Deep Reinforcement Learning [91.13113161754022]
We introduce timing-based adversarial strategies against a DRL-based navigation system by injecting physical noise patterns at selected time frames.
Our experimental results show that the adversarial timing attacks can lead to a significant performance drop.
arXiv Detail & Related papers (2020-02-20T21:39:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.