Robust Deep Reinforcement Learning for Quadcopter Control
- URL: http://arxiv.org/abs/2111.03915v1
- Date: Sat, 6 Nov 2021 16:35:13 GMT
- Title: Robust Deep Reinforcement Learning for Quadcopter Control
- Authors: Aditya M. Deshpande, Ali A. Minai, Manish Kumar
- Abstract summary: In this work, we use Robust Markov Decision Processes (RMDP) to train the drone control policy.
It uses pessimistic optimization to handle the potential performance gap when a policy is transferred from one environment to another.
The trained control policy is tested on the task of quadcopter positional control.
- Score: 0.8687092759073857
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep reinforcement learning (RL) has made it possible to solve complex
robotics problems using neural networks as function approximators. However,
policies trained in stationary environments suffer in terms of generalization
when transferred from one environment to another. In this work, we use Robust
Markov Decision Processes (RMDP), which combine ideas from robust control and
RL, to train the drone control policy. The RMDP uses pessimistic optimization
to handle the potential performance gap when a policy is transferred from one
environment to another. The trained control policy is tested on the task of
quadcopter positional control. RL agents were trained in a MuJoCo simulator.
During testing, environment parameters unseen during training were used to
validate the robustness of the trained policy under transfer from one
environment to another. The robust policy outperformed the standard agents in
these environments, suggesting that the added robustness improves
generalization and enables adaptation to non-stationary environments.
Code: https://github.com/adipandas/gym_multirotor
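
The pessimistic (worst-case) optimization the abstract describes is the core of the RMDP formulation. As a minimal, self-contained illustration (not the paper's implementation, which trains neural policies in MuJoCo), the sketch below runs robust value iteration on a toy tabular MDP whose uncertainty set is a finite family of transition models; the robust Bellman backup minimizes over models before maximizing over actions. All sizes and rewards are illustrative.

```python
import numpy as np

# Minimal robust value iteration on a toy tabular MDP. The uncertainty set is
# a finite family of transition models (standing in for, e.g., different
# masses or wind conditions); the robust Bellman backup takes the worst case
# over models before maximizing over actions.
n_states, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)

def random_model():
    P = rng.random((n_states, n_actions, n_states))
    return P / P.sum(axis=-1, keepdims=True)   # normalize to valid transitions

uncertainty_set = [random_model() for _ in range(5)]
R = rng.random((n_states, n_actions))          # reward for each (state, action)

V = np.zeros(n_states)
for _ in range(500):
    # Q under every model in the set, then pessimistic min over models.
    Q = np.stack([R + gamma * P @ V for P in uncertainty_set]).min(axis=0)
    V_new = Q.max(axis=1)
    if np.abs(V_new - V).max() < 1e-10:
        break
    V = V_new

robust_policy = Q.argmax(axis=1)
print("robust values:", V, "robust policy:", robust_policy)
```
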
Related papers
- Survival of the Fittest: Evolutionary Adaptation of Policies for Environmental Shifts [0.15889427269227555]
We develop an adaptive re-training algorithm (ERPO) inspired by evolutionary game theory (EGT).
ERPO shows faster policy adaptation, higher average rewards, and reduced computational cost.
arXiv Detail & Related papers (2024-10-22T09:29:53Z)
- Task and Domain Adaptive Reinforcement Learning for Robot Control [0.34137115855910755]
We present a novel adaptive agent to dynamically adapt policy in response to different tasks and environmental conditions.
The agent is trained using a custom, highly parallelized simulator built on IsaacGym.
We perform zero-shot transfer to fly the blimp in the real world to solve various tasks.
arXiv Detail & Related papers (2024-04-29T14:02:02Z)
- Robot Fleet Learning via Policy Merging [58.5086287737653]
We propose FLEET-MERGE to efficiently merge policies in the fleet setting.
We show that FLEET-MERGE consolidates the behavior of policies trained on 50 tasks in the Meta-World environment.
We introduce a novel robotic tool-use benchmark, FLEET-TOOLS, for fleet policy learning in compositional and contact-rich robot manipulation tasks.
arXiv Detail & Related papers (2023-10-02T17:23:51Z)
- Dichotomy of Control: Separating What You Can Control from What You Cannot [129.62135987416164]
We propose a future-conditioned supervised learning framework that separates mechanisms within a policy's control (actions) from those beyond a policy's control (environment stochasticity).
We show that DoC yields policies that are consistent with their conditioning inputs, ensuring that conditioning a learned policy on a desired high-return future outcome will correctly induce high-return behavior.
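
For context, the simplest form of future-conditioned supervised learning is return-conditioned behavior cloning, sketched below in PyTorch. DoC goes further, conditioning only on statistics the policy can actually control via a latent future representation; that machinery is omitted here, and all dimensions and data are illustrative.

```python
import torch
import torch.nn as nn

# Return-conditioned behavior cloning: regress logged actions given the
# observation and the realized future return; at test time, condition on a
# desired high return instead. (DoC refines what is conditioned on.)
obs_dim, act_dim = 8, 2
policy = nn.Sequential(nn.Linear(obs_dim + 1, 64), nn.ReLU(),
                       nn.Linear(64, act_dim))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def train_step(obs, act, future_return):
    pred = policy(torch.cat([obs, future_return], dim=-1))
    loss = ((pred - act) ** 2).mean()   # supervised regression onto actions
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

obs, act = torch.randn(32, obs_dim), torch.randn(32, act_dim)
future_return = torch.randn(32, 1)
print("loss:", train_step(obs, act, future_return))
```
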
arXiv Detail & Related papers (2022-10-24T17:49:56Z)
- Teaching a Robot to Walk Using Reinforcement Learning [0.0]
Reinforcement learning can train optimal walking policies with ease.
We teach a simulated two-dimensional bipedal robot how to walk using the OpenAI Gym BipedalWalker-v3 environment.
Augmented random search (ARS) resulted in a better-trained robot and produced an optimal policy which officially "solves" the BipedalWalker-v3 problem.
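
ARS is simple enough to sketch in full: perturb a linear policy's weights in random directions, roll out both the positive and negative perturbations, and step along the reward differences. A minimal sketch follows; the hyperparameters are illustrative, not the paper's, and BipedalWalker-v3 requires gymnasium's box2d extra.

```python
import gymnasium as gym
import numpy as np

# Minimal ARS (augmented random search) with a linear policy.
env = gym.make("BipedalWalker-v3")
obs_dim = env.observation_space.shape[0]
act_dim = env.action_space.shape[0]
theta = np.zeros((act_dim, obs_dim))        # linear policy weights
alpha, nu, n_dirs = 0.02, 0.03, 8
rng = np.random.default_rng(0)

def rollout(W):
    obs, _ = env.reset(seed=0)
    total = 0.0
    while True:
        obs, r, terminated, truncated, _ = env.step(np.clip(W @ obs, -1, 1))
        total += r
        if terminated or truncated:
            return total

for _ in range(100):
    deltas = [rng.standard_normal(theta.shape) for _ in range(n_dirs)]
    r_plus = np.array([rollout(theta + nu * d) for d in deltas])
    r_minus = np.array([rollout(theta - nu * d) for d in deltas])
    sigma = np.concatenate([r_plus, r_minus]).std() + 1e-8
    theta += alpha / (n_dirs * sigma) * sum(
        (rp - rm) * d for rp, rm, d in zip(r_plus, r_minus, deltas))
```
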
arXiv Detail & Related papers (2021-12-13T21:35:45Z)
- Policy Search for Model Predictive Control with Application to Agile Drone Flight [56.24908013905407]
We propose a policy-search-for-model-predictive-control framework.
Specifically, we formulate the MPC as a parameterized controller, where the hard-to-optimize decision variables are represented as high-level policies.
Experiments show that our controller achieves robust and real-time control performance in both simulation and the real world.
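
A toy illustration of the split described above (not the paper's controller): a high-level policy produces the hard-to-optimize decision variable, here the time z at which a 1-D double integrator should pass a "gate", and a random-shooting stand-in for MPC optimizes the controls given z. The episode cost returned would be the signal driving policy search over w.

```python
import numpy as np

rng = np.random.default_rng(0)

def mpc(x0, z, horizon=20, dt=0.1, samples=256):
    # Random-shooting stand-in for an MPC solver, conditioned on z.
    gate_step = int(np.clip(z, 0.0, 1.0) * (horizon - 1))
    best_u0, best_cost = 0.0, np.inf
    for _ in range(samples):
        u = rng.uniform(-1, 1, horizon)
        x, v, cost = x0, 0.0, 0.01 * (u ** 2).sum()
        for k in range(horizon):
            v += dt * u[k]
            x += dt * v
            if k == gate_step:
                cost += 100.0 * (x - 1.0) ** 2   # pass the gate at time z
        if cost < best_cost:
            best_cost, best_u0 = cost, u[0]
    return best_u0, best_cost

def high_level_policy(x0, w):
    # Linear high-level policy mapping state features to the decision variable.
    return float(np.clip(w @ np.array([x0, 1.0]), 0.1, 0.9))

w = np.array([0.0, 0.5])                 # parameters tuned by policy search
u0, cost = mpc(0.0, high_level_policy(0.0, w))
print("first control:", u0, "episode cost:", cost)
```
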
arXiv Detail & Related papers (2021-12-07T17:39:24Z)
- Robustifying Reinforcement Learning Policies with $\mathcal{L}_1$ Adaptive Control [7.025818894763949]
A reinforcement learning (RL) policy could fail in a new/perturbed environment due to the existence of dynamic variations.
We propose an approach to robustifying a pre-trained non-robust RL policy with $\mathcal{L}_1$ adaptive control.
Our approach can significantly improve the robustness of an RL policy trained in a standard (i.e., non-robust) way, either in a simulator or in the real world.
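
A hedged sketch of the idea on a scalar plant x_dot = a*x + b*(u + d): a state predictor estimates the matched disturbance d, and the negative of its low-pass-filtered estimate is added to the pre-trained policy's action. The plant, gains, and stand-in policy below are all illustrative.

```python
import numpy as np

a, b, dt = -1.0, 1.0, 0.001
k_pred, adapt_gain, cutoff = 10.0, 100.0, 5.0

def rl_policy(x):                     # stand-in for a pre-trained RL policy
    return -2.0 * x

x, x_hat, sigma_hat, u_ad = 1.0, 1.0, 0.0, 0.0
for t in range(20000):
    d = 0.5 * np.sin(0.002 * t)                       # unseen dynamic variation
    u = rl_policy(x) + u_ad                           # RL action + adaptive term
    x += dt * (a * x + b * (u + d))                   # true plant
    err = x_hat - x                                   # prediction error
    x_hat += dt * (a * x_hat + b * (u + sigma_hat) - k_pred * err)
    sigma_hat += dt * (-adapt_gain * err)             # adaptation law
    u_ad += dt * cutoff * (-sigma_hat - u_ad)         # low-pass filtered cancel
print("final tracking error:", abs(x))
```
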
arXiv Detail & Related papers (2021-06-04T04:28:46Z)
- Pre-training of Deep RL Agents for Improved Learning under Domain Randomization [63.09932240840656]
We show how to pre-train a perception encoder that already provides an embedding invariant to the randomization.
We demonstrate this yields consistently improved results on a randomized version of DeepMind control suite tasks and a stacking environment on arbitrary backgrounds with zero-shot transfer to a physical robot.
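
One common way to obtain an embedding invariant to visual randomization is a contrastive objective that pulls together two randomized renderings of the same underlying state. The sketch below uses an InfoNCE-style loss; the architecture and augmentations are illustrative, and the paper's exact pre-training objective may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Encoder to be pre-trained for invariance to domain randomization.
encoder = nn.Sequential(nn.Conv2d(3, 32, 3, 2), nn.ReLU(),
                        nn.Conv2d(32, 32, 3, 2), nn.ReLU(),
                        nn.Flatten(), nn.LazyLinear(128))
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

def info_nce(z1, z2, tau=0.1):
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau              # similarity of all pairs
    labels = torch.arange(z1.shape[0])      # positives on the diagonal
    return F.cross_entropy(logits, labels)

# imgs_a, imgs_b: the same states rendered under two random visual domains
# (random noise here stands in for actual randomized renderings).
imgs_a, imgs_b = torch.rand(16, 3, 64, 64), torch.rand(16, 3, 64, 64)
loss = info_nce(encoder(imgs_a), encoder(imgs_b))
opt.zero_grad(); loss.backward(); opt.step()
```
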
arXiv Detail & Related papers (2021-04-29T14:54:11Z)
- Learning a Contact-Adaptive Controller for Robust, Efficient Legged Locomotion [95.1825179206694]
We present a framework that synthesizes robust controllers for a quadruped robot.
A high-level controller learns to choose from a set of primitives in response to changes in the environment.
A low-level controller utilizes an established control method to robustly execute the primitives.
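
The hierarchy described above reduces to a small interface, sketched below with hypothetical names: a learned high-level policy selects a primitive, and a conventional low-level controller turns the chosen primitive into joint targets.

```python
import random

# The primitive set, observation layout, and chooser are illustrative stand-ins.
PRIMITIVES = ["trot", "pace", "brace"]

def high_level_policy(obs):
    # Stand-in for the learned chooser; in the paper this reacts to
    # environment changes (e.g., slips or pushes) inferred from obs.
    return random.randrange(len(PRIMITIVES))

def low_level_controller(primitive, obs):
    # Stand-in for an established model-based controller that converts the
    # chosen primitive into joint targets.
    gain = {"trot": 0.10, "pace": 0.05, "brace": 0.0}[primitive]
    return [gain] * 12   # 12 actuated joints on a typical quadruped

obs = [0.0] * 30
primitive = PRIMITIVES[high_level_policy(obs)]
joint_targets = low_level_controller(primitive, obs)
```
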
arXiv Detail & Related papers (2020-09-21T16:49:26Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum, resulting in optimal yet physically feasible robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.