Robust Deep Reinforcement Learning for Quadcopter Control
- URL: http://arxiv.org/abs/2111.03915v1
- Date: Sat, 6 Nov 2021 16:35:13 GMT
- Title: Robust Deep Reinforcement Learning for Quadcopter Control
- Authors: Aditya M. Deshpande, Ali A. Minai, Manish Kumar
- Abstract summary: In this work, we use Robust Markov Decision Processes (RMDP) to train the drone control policy.
It uses pessimistic optimization to handle the potential performance gap when a policy is transferred from one environment to another.
The trained control policy is tested on the task of quadcopter positional control.
- Score: 0.8687092759073857
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep reinforcement learning (RL) has made it possible to solve complex
robotics problems using neural networks as function approximators. However,
policies trained in stationary environments suffer in terms of generalization
when transferred from one environment to another. In this work, we use Robust
Markov Decision Processes (RMDP), which combine ideas from robust control and
RL, to train the drone control policy. The RMDP uses pessimistic optimization
to handle the potential performance gap when a policy is transferred from one
environment to another. The trained control policy is tested on the task of
quadcopter positional control. RL agents were trained in a MuJoCo simulator.
During testing, environment parameters unseen during training were used to
validate the robustness of the trained policy under transfer from one
environment to another. The robust policy outperformed the standard agents in
these environments, suggesting that the added robustness improves
generalization and enables adaptation to non-stationary environments.
Code: https://github.com/adipandas/gym_multirotor
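
The pessimistic (worst-case) optimization the abstract describes is the core of the RMDP formulation. As a minimal, self-contained illustration (not the paper's implementation, which trains neural policies in MuJoCo), the sketch below runs robust value iteration on a toy tabular MDP whose uncertainty set is a finite family of transition models; the robust Bellman backup minimizes over models before maximizing over actions. All sizes and rewards are illustrative.

```python
import numpy as np

# Minimal robust value iteration on a toy tabular MDP. The uncertainty set is
# a finite family of transition models (standing in for, e.g., different
# masses or wind conditions); the robust Bellman backup takes the worst case
# over models before maximizing over actions.
n_states, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)

def random_model():
    P = rng.random((n_states, n_actions, n_states))
    return P / P.sum(axis=-1, keepdims=True)   # normalize to valid transitions

uncertainty_set = [random_model() for _ in range(5)]
R = rng.random((n_states, n_actions))          # reward for each (state, action)

V = np.zeros(n_states)
for _ in range(500):
    # Q under every model in the set, then pessimistic min over models.
    Q = np.stack([R + gamma * P @ V for P in uncertainty_set]).min(axis=0)
    V_new = Q.max(axis=1)
    if np.abs(V_new - V).max() < 1e-10:
        break
    V = V_new

robust_policy = Q.argmax(axis=1)
print("robust values:", V, "robust policy:", robust_policy)
```
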
Related papers
- Survival of the Fittest: Evolutionary Adaptation of Policies for Environmental Shifts [0.15889427269227555]
We develop an adaptive re-training algorithm (ERPO) inspired by evolutionary game theory (EGT).
ERPO shows faster policy adaptation, higher average rewards, and reduced computational cost.
arXiv Detail & Related papers (2024-10-22T09:29:53Z)
- Task and Domain Adaptive Reinforcement Learning for Robot Control [0.34137115855910755]
We present a novel adaptive agent to dynamically adapt policy in response to different tasks and environmental conditions.
The agent is trained using a custom, highly parallelized simulator built on IsaacGym.
We perform zero-shot transfer to fly the blimp in the real world to solve various tasks.
arXiv Detail & Related papers (2024-04-29T14:02:02Z)
- Robot Fleet Learning via Policy Merging [58.5086287737653]
We propose FLEET-MERGE to efficiently merge policies in the fleet setting.
We show that FLEET-MERGE consolidates the behavior of policies trained on 50 tasks in the Meta-World environment.
We introduce a novel robotic tool-use benchmark, FLEET-TOOLS, for fleet policy learning in compositional and contact-rich robot manipulation tasks.
arXiv Detail & Related papers (2023-10-02T17:23:51Z)
- Dichotomy of Control: Separating What You Can Control from What You Cannot [129.62135987416164]
We propose a future-conditioned supervised learning framework that separates mechanisms within a policy's control (actions) from those beyond a policy's control (environment stochasticity).
We show that DoC yields policies that are consistent with their conditioning inputs, ensuring that conditioning a learned policy on a desired high-return future outcome will correctly induce high-return behavior.
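
For context, the simplest form of future-conditioned supervised learning is return-conditioned behavior cloning, sketched below in PyTorch. DoC goes further, conditioning only on statistics the policy can actually control via a latent future representation; that machinery is omitted here, and all dimensions and data are illustrative.

```python
import torch
import torch.nn as nn

# Return-conditioned behavior cloning: regress logged actions given the
# observation and the realized future return; at test time, condition on a
# desired high return instead. (DoC refines what is conditioned on.)
obs_dim, act_dim = 8, 2
policy = nn.Sequential(nn.Linear(obs_dim + 1, 64), nn.ReLU(),
                       nn.Linear(64, act_dim))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def train_step(obs, act, future_return):
    pred = policy(torch.cat([obs, future_return], dim=-1))
    loss = ((pred - act) ** 2).mean()   # supervised regression onto actions
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

obs, act = torch.randn(32, obs_dim), torch.randn(32, act_dim)
future_return = torch.randn(32, 1)
print("loss:", train_step(obs, act, future_return))
```
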
arXiv Detail & Related papers (2022-10-24T17:49:56Z)
- Teaching a Robot to Walk Using Reinforcement Learning [0.0]
Reinforcement learning can train optimal walking policies with ease.
We teach a simulated two-dimensional bipedal robot how to walk using the OpenAI Gym BipedalWalker-v3 environment.
Augmented random search (ARS) resulted in a better-trained robot and produced an optimal policy which officially "solves" the BipedalWalker-v3 problem.
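
ARS is simple enough to sketch in full: perturb a linear policy's weights in random directions, roll out both the positive and negative perturbations, and step along the reward differences. A minimal sketch follows; the hyperparameters are illustrative, not the paper's, and BipedalWalker-v3 requires gymnasium's box2d extra.

```python
import gymnasium as gym
import numpy as np

# Minimal ARS (augmented random search) with a linear policy.
env = gym.make("BipedalWalker-v3")
obs_dim = env.observation_space.shape[0]
act_dim = env.action_space.shape[0]
theta = np.zeros((act_dim, obs_dim))        # linear policy weights
alpha, nu, n_dirs = 0.02, 0.03, 8
rng = np.random.default_rng(0)

def rollout(W):
    obs, _ = env.reset(seed=0)
    total = 0.0
    while True:
        obs, r, terminated, truncated, _ = env.step(np.clip(W @ obs, -1, 1))
        total += r
        if terminated or truncated:
            return total

for _ in range(100):
    deltas = [rng.standard_normal(theta.shape) for _ in range(n_dirs)]
    r_plus = np.array([rollout(theta + nu * d) for d in deltas])
    r_minus = np.array([rollout(theta - nu * d) for d in deltas])
    sigma = np.concatenate([r_plus, r_minus]).std() + 1e-8
    theta += alpha / (n_dirs * sigma) * sum(
        (rp - rm) * d for rp, rm, d in zip(r_plus, r_minus, deltas))
```
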
arXiv Detail & Related papers (2021-12-13T21:35:45Z)
- Policy Search for Model Predictive Control with Application to Agile Drone Flight [56.24908013905407]
We propose a policy-search-for-model-predictive-control framework.
Specifically, we formulate the MPC as a parameterized controller, where the hard-to-optimize decision variables are represented as high-level policies.
Experiments show that our controller achieves robust and real-time control performance in both simulation and the real world.
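
A toy illustration of the split described above (not the paper's controller): a high-level policy produces the hard-to-optimize decision variable, here the time z at which a 1-D double integrator should pass a "gate", and a random-shooting stand-in for MPC optimizes the controls given z. The episode cost returned would be the signal driving policy search over w.

```python
import numpy as np

rng = np.random.default_rng(0)

def mpc(x0, z, horizon=20, dt=0.1, samples=256):
    # Random-shooting stand-in for an MPC solver, conditioned on z.
    gate_step = int(np.clip(z, 0.0, 1.0) * (horizon - 1))
    best_u0, best_cost = 0.0, np.inf
    for _ in range(samples):
        u = rng.uniform(-1, 1, horizon)
        x, v, cost = x0, 0.0, 0.01 * (u ** 2).sum()
        for k in range(horizon):
            v += dt * u[k]
            x += dt * v
            if k == gate_step:
                cost += 100.0 * (x - 1.0) ** 2   # pass the gate at time z
        if cost < best_cost:
            best_cost, best_u0 = cost, u[0]
    return best_u0, best_cost

def high_level_policy(x0, w):
    # Linear high-level policy mapping state features to the decision variable.
    return float(np.clip(w @ np.array([x0, 1.0]), 0.1, 0.9))

w = np.array([0.0, 0.5])                 # parameters tuned by policy search
u0, cost = mpc(0.0, high_level_policy(0.0, w))
print("first control:", u0, "episode cost:", cost)
```
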
arXiv Detail & Related papers (2021-12-07T17:39:24Z)
- Robustifying Reinforcement Learning Policies with $\mathcal{L}_1$ Adaptive Control [7.025818894763949]
A reinforcement learning (RL) policy could fail in a new/perturbed environment due to the existence of dynamic variations.
We propose an approach to robustifying a pre-trained non-robust RL policy with $\mathcal{L}_1$ adaptive control.
Our approach can significantly improve the robustness of an RL policy trained in a standard (i.e., non-robust) way, either in a simulator or in the real world.
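
A hedged sketch of the idea on a scalar plant x_dot = a*x + b*(u + d): a state predictor estimates the matched disturbance d, and the negative of its low-pass-filtered estimate is added to the pre-trained policy's action. The plant, gains, and stand-in policy below are all illustrative.

```python
import numpy as np

a, b, dt = -1.0, 1.0, 0.001
k_pred, adapt_gain, cutoff = 10.0, 100.0, 5.0

def rl_policy(x):                     # stand-in for a pre-trained RL policy
    return -2.0 * x

x, x_hat, sigma_hat, u_ad = 1.0, 1.0, 0.0, 0.0
for t in range(20000):
    d = 0.5 * np.sin(0.002 * t)                       # unseen dynamic variation
    u = rl_policy(x) + u_ad                           # RL action + adaptive term
    x += dt * (a * x + b * (u + d))                   # true plant
    err = x_hat - x                                   # prediction error
    x_hat += dt * (a * x_hat + b * (u + sigma_hat) - k_pred * err)
    sigma_hat += dt * (-adapt_gain * err)             # adaptation law
    u_ad += dt * cutoff * (-sigma_hat - u_ad)         # low-pass filtered cancel
print("final tracking error:", abs(x))
```
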
arXiv Detail & Related papers (2021-06-04T04:28:46Z)
- Pre-training of Deep RL Agents for Improved Learning under Domain Randomization [63.09932240840656]
We show how to pre-train a perception encoder that already provides an embedding invariant to the randomization.
We demonstrate this yields consistently improved results on a randomized version of DeepMind control suite tasks and a stacking environment on arbitrary backgrounds with zero-shot transfer to a physical robot.
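
One common way to obtain an embedding invariant to visual randomization is a contrastive objective that pulls together two randomized renderings of the same underlying state. The sketch below uses an InfoNCE-style loss; the architecture and augmentations are illustrative, and the paper's exact pre-training objective may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Encoder to be pre-trained for invariance to domain randomization.
encoder = nn.Sequential(nn.Conv2d(3, 32, 3, 2), nn.ReLU(),
                        nn.Conv2d(32, 32, 3, 2), nn.ReLU(),
                        nn.Flatten(), nn.LazyLinear(128))
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

def info_nce(z1, z2, tau=0.1):
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau              # similarity of all pairs
    labels = torch.arange(z1.shape[0])      # positives on the diagonal
    return F.cross_entropy(logits, labels)

# imgs_a, imgs_b: the same states rendered under two random visual domains
# (random noise here stands in for actual randomized renderings).
imgs_a, imgs_b = torch.rand(16, 3, 64, 64), torch.rand(16, 3, 64, 64)
loss = info_nce(encoder(imgs_a), encoder(imgs_b))
opt.zero_grad(); loss.backward(); opt.step()
```
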
arXiv Detail & Related papers (2021-04-29T14:54:11Z)
- Learning a Contact-Adaptive Controller for Robust, Efficient Legged Locomotion [95.1825179206694]
We present a framework that synthesizes robust controllers for a quadruped robot.
A high-level controller learns to choose from a set of primitives in response to changes in the environment.
A low-level controller utilizes an established control method to robustly execute the primitives.
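
The hierarchy described above reduces to a small interface, sketched below with hypothetical names: a learned high-level policy selects a primitive, and a conventional low-level controller turns the chosen primitive into joint targets.

```python
import random

# The primitive set, observation layout, and chooser are illustrative stand-ins.
PRIMITIVES = ["trot", "pace", "brace"]

def high_level_policy(obs):
    # Stand-in for the learned chooser; in the paper this reacts to
    # environment changes (e.g., slips or pushes) inferred from obs.
    return random.randrange(len(PRIMITIVES))

def low_level_controller(primitive, obs):
    # Stand-in for an established model-based controller that converts the
    # chosen primitive into joint targets.
    gain = {"trot": 0.10, "pace": 0.05, "brace": 0.0}[primitive]
    return [gain] * 12   # 12 actuated joints on a typical quadruped

obs = [0.0] * 30
primitive = PRIMITIVES[high_level_policy(obs)]
joint_targets = low_level_controller(primitive, obs)
```
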
arXiv Detail & Related papers (2020-09-21T16:49:26Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum, resulting in optimal yet physically feasible robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.